The EU AI Act and GDPR impose cumulative data governance obligations on organisations operating high-risk AI systems. Article 10 of the AI Act presupposes GDPR compliance, requiring integrated treatment of lawful basis, data subject rights, impact assessments, and retention within the AI System Description Portfolio (AISDP).
The AI Act's data governance requirements and the GDPR are cumulative obligations, not alternative frameworks. An organisation that meets Article 10 of the AI Act but breaches the GDPR is non-compliant with both regulations, because the AI Act's data governance provisions presuppose GDPR compliance. Module 4 of the AISDP must address both frameworks in an integrated manner, documenting how each GDPR obligation is satisfied alongside each AI Act requirement.
This dual compliance structure means that organisations cannot treat data protection as a separate workstream from AI governance. The lawful basis for processing, data subject rights, impact assessments, and retention policies all require coordinated treatment within the AISDP. Data Governance for High-Risk AI Systems covers the broader data governance framework within which GDPR alignment sits.
Selecting a lawful basis under GDPR Article 6 for processing personal data in AI training is one of the most consequential data governance decisions an organisation will make. The available bases carry different implications for operational flexibility, data subject relationships, and long-term compliance burden.
Consent (Article 6(1)(a)) offers the strongest legal footing but is frequently impractical for large-scale training datasets. Consent must be freely given, specific, informed, and unambiguous. For training data, data subjects must be told their data will be used for AI model training, what the model's purpose is, and how the model may affect them or others. Consent can be withdrawn at any time, creating an operational challenge: the organisation must be able to remove data from the training set and either retrain the model or demonstrate that the individual's data cannot be recovered from the model's parameters.
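As an illustration of that operational challenge, the sketch below shows a consent-withdrawal handler that removes a subject's records and queues the model for retraining. It assumes a pandas DataFrame as the training set; the `subject_id` column and the retraining queue are hypothetical stand-ins for the organisation's own pipeline, not a prescribed design.

```python
# Minimal sketch of a consent-withdrawal handler, assuming a pandas
# DataFrame training set. Column and helper names are illustrative.
import pandas as pd

def handle_consent_withdrawal(train_df: pd.DataFrame, subject_id: str,
                              retrain_queue: list) -> pd.DataFrame:
    """Remove a subject's records and flag the model for retraining."""
    affected = train_df["subject_id"] == subject_id
    if affected.any():
        train_df = train_df[~affected].copy()  # drop the subject's rows
        retrain_queue.append({                 # schedule retraining so the
            "reason": "consent_withdrawn",     # deployed model no longer
            "subject_id": subject_id,          # reflects the withdrawn data
        })
    return train_df
```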
Legitimate interest (Article 6(1)(f)) is more commonly relied upon for AI training. It requires a three-part balancing test. The organisation identifies a legitimate interest, such as improving the fairness, accuracy, or safety of a high-risk AI system. It demonstrates that processing is necessary to achieve that interest. It then balances the interest against the data subjects' rights and freedoms, considering the nature of the data, the expectations of data subjects, the impact on them, and the safeguards in place. The Legal and Regulatory Advisor documents the balancing test as a Module 4 artefact. Organisations should not treat legitimate interest as a blanket justification; the test must be conducted for each dataset and each processing purpose.
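One way to make the balancing test a repeatable, per-dataset Module 4 artefact is to capture it as structured data. The sketch below uses a Python dataclass; the field names are assumptions, not a prescribed schema.

```python
# Sketch of a structured balancing-test record; fields are illustrative.
from dataclasses import dataclass, field

@dataclass
class LegitimateInterestAssessment:
    dataset_id: str                # each dataset gets its own test
    processing_purpose: str        # each purpose gets its own test
    legitimate_interest: str       # e.g. "improve fairness of credit model"
    necessity_rationale: str       # why processing is needed for the interest
    data_nature: str               # sensitivity of the personal data involved
    subject_expectations: str      # what data subjects would reasonably expect
    impact_on_subjects: str        # effect of the processing on data subjects
    safeguards: list[str] = field(default_factory=list)  # mitigations in place
    outcome: str = "pending"       # e.g. "proceed", "proceed_with_safeguards"
```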
Public interest (Article 6(1)(e)) is available for AI systems operated by or on behalf of public authorities, where processing is necessary for a task carried out in the public interest. The scope of "public interest" is defined by member state law.
Contract performance (Article 6(1)(b)) is relevant where the AI system processes personal data to fulfil a contract with the data subject, for example a personalised service. Processing must be genuinely necessary for contract performance, not merely useful. The DPO Liaison documents the selected lawful basis in the AISDP together with the supporting analysis, whether consent records, legitimate interest assessment, or statutory basis reference.
Data subjects retain all GDPR rights with respect to personal data used in AI systems, but the practical exercise of these rights in the context of trained models presents specific technical challenges that the AISDP must address.
The right of access (Article 15) allows data subjects to request confirmation of whether their data was used in training and, if so, to obtain a copy. The organisation must be able to identify whether a specific individual's data appears in the training dataset, which requires maintaining a record of training data provenance, and to provide a copy of that data.
The right to rectification (Article 16) requires the organisation to correct inaccurate data in the training set. Depending on the materiality of the correction, this may require dataset modification and model retraining.
The right to erasure (Article 17) is the most technically challenging right in the AI context. If a data subject requests erasure, the organisation must remove their data from the training dataset and either retrain the model without it or demonstrate that the data cannot be recovered from the model's parameters. For large neural networks, demonstrating that an individual's data has no residual influence on the model's behaviour is technically difficult. Current techniques such as machine unlearning are still maturing. The AISDP must document the organisation's technical approach to erasure requests, including any limitations and the compensating controls where full erasure is not technically feasible.
The right not to be subject to solely automated decision-making (GDPR Article 22) intersects directly with the AI Act's Article 14 human oversight requirements. Where an AI system makes decisions that produce legal effects or similarly significantly affect data subjects, and the decision is "solely" automated without meaningful human involvement, data subjects have the right not to be subject to such decisions.
The critical word is "solely." If the human oversight measures documented in AISDP Module 7 constitute meaningful human involvement, the processing may not be "solely" automated, and the Article 22 right may not be triggered. Meaningful involvement requires that the operator independently reviews the system's recommendation, has the information to form their own judgement, and has the authority to override. Module 4 records the analysis of whether Article 22 applies and, if it does, the safeguards in place: the right to obtain human intervention, to express a point of view, and to contest the decision.
Article 86 of the AI Act provides affected persons with a right to an explanation of individual decision-making by high-risk AI systems. This complements the GDPR's Recital 71 reference to the right to obtain an explanation of an automated decision. The AISDP documents how individual explanations are generated, their format, their fidelity to the model's actual reasoning, and the channels through which affected persons can request them. Human Oversight Requirements covers the broader human oversight framework that determines whether Article 22 is triggered.
GDPR Article 35 requires a Data Protection Impact Assessment for processing likely to result in a high risk to individuals, while AI Act Article 27 requires a Fundamental Rights Impact Assessment for deployers of high-risk systems. These are distinct but overlapping obligations that benefit from coordinated execution.
The DPIA focuses on data protection risks: risks to the confidentiality, integrity, and availability of personal data, and risks to data subjects' rights and freedoms arising from processing. The FRIA addresses a broader set of fundamental rights, including non-discrimination, freedom of expression, and the right to an effective remedy.
Findings from the DPIA should inform the FRIA, because data protection risks are a subset of fundamental rights risks. Findings from the FRIA should inform the DPIA, because a fairness concern identified in the FRIA may carry data protection implications. Organisations should ensure that the teams conducting each assessment share their findings and that any mitigation measures proposed in one assessment are reflected in the other.
The AISDP documents how the two assessments are coordinated, cross-references their findings, and confirms that both remain current. Fundamental Rights Impact Assessment provides detailed guidance on FRIA methodology and scope.
The storage limitation principle under GDPR Article 5(1)(e) requires that personal data be kept for no longer than necessary. For AI systems, this creates tensions with the need to retain training data for reproducibility, retraining, and audit purposes.
Module 4 of the AISDP records the retention period for each category of personal data. The categories include training data, validation data, test data, inference inputs, inference outputs, and operator interaction logs. For each category, the AISDP records the justification for the retention period, whether regulatory requirement, reproducibility need, audit trail obligation, or retraining schedule. It also records the deletion or anonymisation process applied at the end of each retention period.
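A minimal sketch of how such a schedule might be recorded follows, using the data categories from the text. The periods, justifications, and end-of-period actions shown are placeholders that each organisation would set for itself.

```python
# Illustrative Module 4 retention schedule; values are placeholders.
RETENTION_SCHEDULE = {
    "training_data":     {"period_days": 3 * 365, "justification": "retraining schedule",
                          "end_of_period": "anonymise"},
    "validation_data":   {"period_days": 3 * 365, "justification": "reproducibility need",
                          "end_of_period": "anonymise"},
    "test_data":         {"period_days": 3 * 365, "justification": "reproducibility need",
                          "end_of_period": "anonymise"},
    "inference_inputs":  {"period_days": 180,     "justification": "audit trail obligation",
                          "end_of_period": "delete"},
    "inference_outputs": {"period_days": 180,     "justification": "audit trail obligation",
                          "end_of_period": "delete"},
    "operator_logs":     {"period_days": 365,     "justification": "regulatory requirement",
                          "end_of_period": "delete"},
}
```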
The ten-year AISDP retention obligation under Article 18 does not override the GDPR's storage limitation. It means the documentation is retained, not necessarily the underlying personal data. Organisations must reconcile these obligations, potentially retaining documentation about the data (metadata, provenance records, quality metrics) after the data itself has been deleted or anonymised.
At system end-of-life, the data retention framework faces its most demanding test. The DPO Liaison should review the data retention schedule against the specific circumstances of the decommission to determine which data categories require deletion, which require anonymisation, and which are retained as documentation.
Three technical approaches exist for addressing erasure requests against trained models, each with different cost and assurance profiles.
Full retraining is the cleanest approach: remove the individual's records from the training dataset, retrain the model from scratch on the reduced dataset, and replace the deployed model. This provides clear compliance but is expensive for large models, where training costs range from thousands to millions of euros per run. Dataset versioning tools such as DVC or Delta Lake time-travel make the dataset manipulation straightforward; the cost lies in the retraining itself.
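A minimal sketch of the full-retraining path is shown below. The training entry point is passed in as a callable because it is organisation-specific, and the plain file write stands in for the dataset versioning step that DVC or Delta Lake would handle in practice.

```python
# Sketch of the full-retraining erasure path; train_fn is the
# organisation's own training entry point, passed in as a callable.
from typing import Callable
import pandas as pd

def erase_and_retrain(dataset_path: str, subject_id: str,
                      train_fn: Callable[[pd.DataFrame], object]):
    df = pd.read_csv(dataset_path)
    reduced = df[df["subject_id"] != subject_id]      # remove the subject's records
    reduced_path = dataset_path.replace(".csv", ".reduced.csv")
    reduced.to_csv(reduced_path, index=False)         # version with DVC/Delta Lake in practice
    model = train_fn(reduced)                         # retrain from scratch on reduced data
    return model, reduced_path                        # deploy model, retire the old one
```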
SISA training (Sharded, Isolated, Sliced, Aggregated) is an architectural approach that makes retraining more efficient. Training data is partitioned into shards, and a separate model is trained on each shard. The final prediction aggregates the shard models' outputs. When a data subject requests erasure, only the shard containing their data needs retraining, reducing compute cost proportionally. The trade-off is that the sharded architecture may achieve slightly lower accuracy than a model trained on the full dataset, and the sharding infrastructure must be in place from the start. SISA is not widely supported in production ML frameworks and requires custom engineering.
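The sketch below illustrates the SISA idea (following Bourtoule et al.) with scikit-learn classifiers: one model per shard, majority-vote aggregation, and erasure that retrains only the shards holding the subject's records. It omits the slicing checkpoints of the full technique, and a production implementation would need persistent shard-to-subject indexing.

```python
# Minimal SISA sketch: per-shard models, majority-vote aggregation,
# and erasure by retraining only the affected shard. Assumes integer
# class labels and that shards stay non-empty after erasure.
import numpy as np
from sklearn.linear_model import LogisticRegression

class SISAEnsemble:
    def __init__(self, n_shards: int = 4):
        self.n_shards = n_shards
        self.shards = {}   # shard index -> (X, y, subject_ids)
        self.models = {}   # shard index -> fitted model

    def fit(self, X, y, subject_ids):
        # Sequential split for brevity; real SISA assigns records
        # to shards randomly or by hashing.
        for s, idx in enumerate(np.array_split(np.arange(len(X)), self.n_shards)):
            self.shards[s] = (X[idx], y[idx], np.asarray(subject_ids)[idx])
            self.models[s] = LogisticRegression().fit(X[idx], y[idx])

    def erase(self, subject_id):
        # Retrain only the shards that contain the subject's records.
        for s, (Xs, ys, ids) in self.shards.items():
            mask = ids != subject_id
            if not mask.all():
                self.shards[s] = (Xs[mask], ys[mask], ids[mask])
                self.models[s] = LogisticRegression().fit(Xs[mask], ys[mask])

    def predict(self, X):
        votes = np.stack([m.predict(X) for m in self.models.values()])
        # Majority vote across shard models for each sample.
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```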
Approximate unlearning methods attempt to remove the influence of specific training records without full retraining, typically by computing gradient updates that reverse the effect of the removed data. These methods are still in the research stage and do not provide formal guarantees that the data's influence has been fully removed. For high-risk systems under the AI Act, the Technical SME treats approximate unlearning as a supplementary control, not a primary erasure mechanism.
The right of access under GDPR Article 15 requires the organisation to confirm whether a specific individual's data was used in training and provide a copy if so. This requires a training data provenance index: a searchable record mapping individual identifiers to the dataset versions in which they appear. The data engineering team maintains the index as part of the data versioning infrastructure. When a data subject makes an access request, the index is queried to determine which dataset versions contain their records, and those records are retrieved from versioned storage. Tools such as BigID and OneTrust can automate the discovery and retrieval workflow for structured data sources.
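For organisations building the index in-house rather than with a commercial tool, the sketch below shows a minimal SQLite-backed provenance index; the schema is an assumption, not a standard.

```python
# Sketch of a training-data provenance index backed by SQLite. Each
# row maps a subject identifier to a dataset version containing their
# records; schema and column names are illustrative.
import sqlite3

conn = sqlite3.connect("provenance.db")
conn.execute("""CREATE TABLE IF NOT EXISTS provenance (
    subject_id TEXT, dataset_version TEXT, record_ref TEXT,
    PRIMARY KEY (subject_id, dataset_version, record_ref))""")

def record_inclusion(subject_id, dataset_version, record_ref):
    """Register that a subject's record appears in a dataset version."""
    conn.execute("INSERT OR IGNORE INTO provenance VALUES (?, ?, ?)",
                 (subject_id, dataset_version, record_ref))
    conn.commit()

def access_request(subject_id):
    """Return dataset versions and record references for a subject."""
    return conn.execute(
        "SELECT dataset_version, record_ref FROM provenance WHERE subject_id = ?",
        (subject_id,)).fetchall()  # records are then pulled from versioned storage
```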
The data retention tension between GDPR and the AI Act requires specific architectural attention. Under the storage limitation principle, personal data must be kept no longer than necessary. Under Article 18, technical documentation including information about training data must be retained for ten years. Reconciling these requirements means retaining metadata about the training data, including provenance records, quality metrics, distributional statistics, and versioning records, after the personal data itself has been deleted or anonymised. Data architecture must be designed so that compliance-relevant information about training data can survive the deletion of the individual records it describes.
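A minimal sketch of that design principle: compute and store aggregate metadata before the personal data is deleted, so the compliance record survives erasure. The statistics chosen here are illustrative, not a prescribed set.

```python
# Sketch of compliance metadata captured before deletion so it can
# outlive the personal data it describes; aggregate statistics only.
import pandas as pd

def snapshot_before_deletion(df: pd.DataFrame, dataset_version: str) -> dict:
    """Capture provenance and distributional metadata prior to erasure."""
    return {
        "dataset_version": dataset_version,
        "row_count": len(df),
        "columns": list(df.columns),
        "numeric_summary": df.describe().to_dict(),  # distributional stats only,
        "null_counts": df.isna().sum().to_dict(),    # no individual records retained
    }
```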
For automated decision-making, the level of human involvement determines whether GDPR Article 22 is triggered. The DPO Liaison documents the analysis in Module 4, specifying the level of human involvement, the criteria for meaningful involvement beyond mere rubber-stamping, and the safeguards in place if Article 22 does apply: the right to human intervention, to express a point of view, and to contest the decision.
No. The legitimate interest balancing test must be conducted for each dataset and each processing purpose, considering the nature of data, data subject expectations, impact, and safeguards.
The organisation must remove their data from the training set and either retrain the model or demonstrate that the individual's data cannot be recovered from the model's parameters.
No. The documentation must be retained for ten years, but the underlying personal data can be deleted or anonymised once no longer necessary. Metadata and provenance records are retained instead.
All GDPR rights apply, including access, rectification, erasure, and objection. The right to erasure is the most technically challenging, potentially requiring model retraining.
If human oversight under AI Act Article 14 constitutes meaningful involvement, GDPR Article 22 restrictions on solely automated decision-making may not be triggered.
GDPR Article 35 DPIA and AI Act Article 27 FRIA are overlapping obligations. Data protection risks inform fundamental rights analysis and vice versa.
Three approaches exist: full retraining, SISA sharded training for efficient partial retraining, and approximate unlearning methods which remain supplementary controls for high-risk systems.
The right to object (Article 21) allows data subjects to object to processing based on legitimate interest. The organisation must cease processing unless it demonstrates compelling legitimate grounds. An upheld objection requires data removal from the training set.