The EU AI Act and GDPR impose cumulative data governance obligations on organisations operating high-risk AI systems. Article 10 of the AI Act presupposes GDPR compliance, requiring integrated treatment of lawful basis, data subject rights, impact assessments, and retention within the AI System Description Portfolio (AISDP).
The AI Act's data governance requirements and the GDPR are cumulative obligations, not alternative frameworks. An organisation that meets Article 10 of the AI Act but breaches the GDPR is non-compliant with both regulations, because the AI Act's data governance provisions presuppose GDPR compliance. Module 4 of the AISDP must address both frameworks in an integrated manner, documenting how each GDPR obligation is satisfied alongside each AI Act requirement.
This dual compliance structure means that organisations cannot treat data protection as a separate workstream from AI governance. The lawful basis for processing, data subject rights, impact assessments, and retention policies all require coordinated treatment within the AISDP. Data Governance for High-Risk AI Systems covers the broader data governance framework within which GDPR alignment sits.
Selecting a lawful basis under GDPR Article 6 for processing personal data in AI training is one of the most consequential data governance decisions an organisation will make. The available bases carry different implications for operational flexibility, data subject relationships, and long-term compliance burden.
Consent (Article 6(1)(a)) offers the strongest legal footing but is frequently impractical for large-scale training datasets. Consent must be freely given, specific, informed, and unambiguous. For training data, data subjects must be told their data will be used for AI model training, what the model's purpose is, and how the model may affect them or others. Consent can be withdrawn at any time, creating an operational challenge: the organisation must be able to remove data from the training set and either retrain the model or demonstrate that the individual's data cannot be recovered from the model's parameters.
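As an illustration of that operational challenge, the sketch below shows a consent-withdrawal handler that removes a subject's records and queues the model for retraining. It assumes a pandas DataFrame as the training set; the `subject_id` column and the retraining queue are hypothetical stand-ins for the organisation's own pipeline, not a prescribed design.

```python
# Minimal sketch of a consent-withdrawal handler, assuming a pandas
# DataFrame training set. Column and helper names are illustrative.
import pandas as pd

def handle_consent_withdrawal(train_df: pd.DataFrame, subject_id: str,
                              retrain_queue: list) -> pd.DataFrame:
    """Remove a subject's records and flag the model for retraining."""
    affected = train_df["subject_id"] == subject_id
    if affected.any():
        train_df = train_df[~affected].copy()  # drop the subject's rows
        retrain_queue.append({                 # schedule retraining so the
            "reason": "consent_withdrawn",     # deployed model no longer
            "subject_id": subject_id,          # reflects the withdrawn data
        })
    return train_df
```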
Legitimate interest (Article 6(1)(f)) is more commonly relied upon for AI training. It requires a three-part balancing test. The organisation identifies a legitimate interest, such as improving the fairness, accuracy, or safety of a high-risk AI system. It demonstrates that processing is necessary to achieve that interest. It then balances the interest against the data subjects' rights and freedoms, considering the nature of the data, the expectations of data subjects, the impact on them, and the safeguards in place. The Legal and Regulatory Advisor documents the balancing test as a Module 4 artefact. Organisations should not treat legitimate interest as a blanket justification; the test must be conducted for each dataset and each processing purpose.
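One way to make the balancing test a repeatable, per-dataset Module 4 artefact is to capture it as structured data. The sketch below uses a Python dataclass; the field names are assumptions, not a prescribed schema.

```python
# Sketch of a structured balancing-test record; fields are illustrative.
from dataclasses import dataclass, field

@dataclass
class LegitimateInterestAssessment:
    dataset_id: str                # each dataset gets its own test
    processing_purpose: str        # each purpose gets its own test
    legitimate_interest: str       # e.g. "improve fairness of credit model"
    necessity_rationale: str       # why processing is needed for the interest
    data_nature: str               # sensitivity of the personal data involved
    subject_expectations: str      # what data subjects would reasonably expect
    impact_on_subjects: str        # effect of the processing on data subjects
    safeguards: list[str] = field(default_factory=list)  # mitigations in place
    outcome: str = "pending"       # e.g. "proceed", "proceed_with_safeguards"
```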
Public interest (Article 6(1)(e)) is available for AI systems operated by or on behalf of public authorities, where processing is necessary for a task carried out in the public interest. The scope of "public interest" is defined by member state law.
Contract performance (Article 6(1)(b)) is relevant where the AI system processes personal data to fulfil a contract with the data subject, for example a personalised service. Processing must be genuinely necessary for contract performance, not merely useful. The DPO Liaison documents the selected lawful basis in the AISDP together with the supporting analysis, whether consent records, legitimate interest assessment, or statutory basis reference.
Data subjects retain all GDPR rights with respect to personal data used in AI systems, but the practical exercise of these rights in the context of trained models presents specific technical challenges that the AISDP must address.
The right of access (Article 15) allows data subjects to request confirmation of whether their data was used in training and, if so, to obtain a copy. The organisation must be able to identify whether a specific individual's data appears in the training dataset, which requires maintaining a record of training data provenance, and to provide a copy of that data.
The right to rectification (Article 16) requires the organisation to correct inaccurate data in the training set. Depending on the materiality of the correction, this may require dataset modification and model retraining.
The right to erasure (Article 17) is the most technically challenging right in the AI context. If a data subject requests erasure, the organisation must remove their data from the training dataset and either retrain the model without it or demonstrate that the data cannot be recovered from the model's parameters. For large neural networks, demonstrating that an individual's data has no residual influence on the model's behaviour is technically difficult. Current techniques such as machine unlearning are still maturing. The AISDP must document the organisation's technical approach to erasure requests, including any limitations and the compensating controls where full erasure is not technically feasible.
The right not to be subject to solely automated decision-making (GDPR Article 22) intersects directly with the AI Act's Article 14 human oversight requirements. Where an AI system makes decisions that produce legal effects or similarly significantly affect data subjects, and the decision is "solely" automated without meaningful human involvement, data subjects have the right not to be subject to such decisions.
The critical word is "solely." If the human oversight measures documented in AISDP Module 7 constitute meaningful human involvement, the processing may not be "solely" automated, and the Article 22 right may not be triggered. Meaningful involvement requires that the operator independently reviews the system's recommendation, has the information to form their own judgement, and has the authority to override. Module 4 records the analysis of whether Article 22 applies and, if it does, the safeguards in place: the right to obtain human intervention, to express a point of view, and to contest the decision.
Article 86 of the AI Act provides affected persons with a right to an explanation of individual decision-making by high-risk AI systems. This complements the GDPR's Recital 71 reference to the right to obtain an explanation of an automated decision. The AISDP documents how individual explanations are generated, their format, their fidelity to the model's actual reasoning, and the channels through which affected persons can request them. Human Oversight Requirements covers the broader human oversight framework that determines whether Article 22 is triggered.
GDPR Article 35 requires a Data Protection Impact Assessment for processing likely to result in a high risk to individuals, while AI Act Article 27 requires a Fundamental Rights Impact Assessment for deployers of high-risk systems. These are distinct but overlapping obligations that benefit from coordinated execution.
The DPIA focuses on data protection risks: risks to the confidentiality, integrity, and availability of personal data, and risks to data subjects' rights and freedoms arising from processing. The FRIA addresses a broader set of fundamental rights, including non-discrimination, freedom of expression, and the right to an effective remedy.
Findings from the DPIA should inform the FRIA, because data protection risks are a subset of fundamental rights risks. Findings from the FRIA should inform the DPIA, because a fairness concern identified in the FRIA may carry data protection implications. Organisations should ensure that the teams conducting each assessment share their findings and that any mitigation measures proposed in one assessment are reflected in the other.
The AISDP documents how the two assessments are coordinated, cross-references their findings, and confirms that both remain current. Fundamental Rights Impact Assessment provides detailed guidance on FRIA methodology and scope.
The storage limitation principle under GDPR Article 5(1)(e) requires that personal data be kept for no longer than necessary. For AI systems, this creates tensions with the need to retain training data for reproducibility, retraining, and audit purposes.
Module 4 of the AISDP records the retention period for each category of personal data. The categories include training data, validation data, test data, inference inputs, inference outputs, and operator interaction logs. For each category, the AISDP records the justification for the retention period, whether regulatory requirement, reproducibility need, audit trail obligation, or retraining schedule. It also records the deletion or anonymisation process applied at the end of each retention period.
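A minimal sketch of how such a schedule might be recorded follows, using the data categories from the text. The periods, justifications, and end-of-period actions shown are placeholders that each organisation would set for itself.

```python
# Illustrative Module 4 retention schedule; values are placeholders.
RETENTION_SCHEDULE = {
    "training_data":     {"period_days": 3 * 365, "justification": "retraining schedule",
                          "end_of_period": "anonymise"},
    "validation_data":   {"period_days": 3 * 365, "justification": "reproducibility need",
                          "end_of_period": "anonymise"},
    "test_data":         {"period_days": 3 * 365, "justification": "reproducibility need",
                          "end_of_period": "anonymise"},
    "inference_inputs":  {"period_days": 180,     "justification": "audit trail obligation",
                          "end_of_period": "delete"},
    "inference_outputs": {"period_days": 180,     "justification": "audit trail obligation",
                          "end_of_period": "delete"},
    "operator_logs":     {"period_days": 365,     "justification": "regulatory requirement",
                          "end_of_period": "delete"},
}
```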
The ten-year AISDP retention obligation under Article 18 does not override the GDPR's storage limitation. It means the documentation is retained, not necessarily the underlying personal data. Organisations must reconcile these obligations, potentially retaining documentation about the data (metadata, provenance records, quality metrics) after the data itself has been deleted or anonymised.
At system end-of-life, the data retention framework faces its most demanding test. The DPO Liaison should review the data retention schedule against the specific circumstances of the decommission to determine which data categories require deletion, which require anonymisation, and which are retained as documentation.
Three technical approaches exist for addressing erasure requests against trained models, each with different cost and assurance profiles.
Full retraining is the cleanest approach: remove the individual's records from the training dataset, retrain the model from scratch on the reduced dataset, and replace the deployed model. This provides clear compliance but is expensive for large models, where training costs range from thousands to millions of euros per run. Dataset versioning tools such as DVC or Delta Lake time-travel make the dataset manipulation straightforward; the cost lies in the retraining itself.
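A minimal sketch of the full-retraining path is shown below. The training entry point is passed in as a callable because it is organisation-specific, and the plain file write stands in for the dataset versioning step that DVC or Delta Lake would handle in practice.

```python
# Sketch of the full-retraining erasure path; train_fn is the
# organisation's own training entry point, passed in as a callable.
from typing import Callable
import pandas as pd

def erase_and_retrain(dataset_path: str, subject_id: str,
                      train_fn: Callable[[pd.DataFrame], object]):
    df = pd.read_csv(dataset_path)
    reduced = df[df["subject_id"] != subject_id]      # remove the subject's records
    reduced_path = dataset_path.replace(".csv", ".reduced.csv")
    reduced.to_csv(reduced_path, index=False)         # version with DVC/Delta Lake in practice
    model = train_fn(reduced)                         # retrain from scratch on reduced data
    return model, reduced_path                        # deploy model, retire the old one
```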
SISA training (Sharded, Isolated, Sliced, Aggregated) is an architectural approach that makes retraining more efficient. Training data is partitioned into shards, and a separate model is trained on each shard. The final prediction aggregates the shard models' outputs. When a data subject requests erasure, only the shard containing their data needs retraining, reducing compute cost proportionally. The trade-off is that the sharded architecture may achieve slightly lower accuracy than a model trained on the full dataset, and the sharding infrastructure must be in place from the start. SISA is not widely supported in production ML frameworks and requires custom engineering.
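The sketch below illustrates the SISA idea (following Bourtoule et al.) with scikit-learn classifiers: one model per shard, majority-vote aggregation, and erasure that retrains only the shards holding the subject's records. It omits the slicing checkpoints of the full technique, and a production implementation would need persistent shard-to-subject indexing.

```python
# Minimal SISA sketch: per-shard models, majority-vote aggregation,
# and erasure by retraining only the affected shard. Assumes integer
# class labels and that shards stay non-empty after erasure.
import numpy as np
from sklearn.linear_model import LogisticRegression

class SISAEnsemble:
    def __init__(self, n_shards: int = 4):
        self.n_shards = n_shards
        self.shards = {}   # shard index -> (X, y, subject_ids)
        self.models = {}   # shard index -> fitted model

    def fit(self, X, y, subject_ids):
        # Sequential split for brevity; real SISA assigns records
        # to shards randomly or by hashing.
        for s, idx in enumerate(np.array_split(np.arange(len(X)), self.n_shards)):
            self.shards[s] = (X[idx], y[idx], np.asarray(subject_ids)[idx])
            self.models[s] = LogisticRegression().fit(X[idx], y[idx])

    def erase(self, subject_id):
        # Retrain only the shards that contain the subject's records.
        for s, (Xs, ys, ids) in self.shards.items():
            mask = ids != subject_id
            if not mask.all():
                self.shards[s] = (Xs[mask], ys[mask], ids[mask])
                self.models[s] = LogisticRegression().fit(Xs[mask], ys[mask])

    def predict(self, X):
        votes = np.stack([m.predict(X) for m in self.models.values()])
        # Majority vote across shard models for each sample.
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)
```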
Approximate unlearning methods attempt to remove the influence of specific training records without full retraining, typically by computing gradient updates that reverse the effect of the removed data. These methods are still in the research stage and do not provide formal guarantees that the data's influence has been fully removed. For high-risk systems under the AI Act, the Technical SME treats approximate unlearning as a supplementary control, not a primary erasure mechanism.
The right of access under GDPR Article 15 requires the organisation to confirm whether a specific individual's data was used in training and provide a copy if so. This requires a training data provenance index: a searchable record mapping individual identifiers to the dataset versions in which they appear. The data engineering team maintains the index as part of the data versioning infrastructure. When a data subject makes an access request, the index is queried to determine which dataset versions contain their records, and those records are retrieved from versioned storage. Tools such as BigID and OneTrust can automate the discovery and retrieval workflow for structured data sources.
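For organisations building the index in-house rather than with a commercial tool, the sketch below shows a minimal SQLite-backed provenance index; the schema is an assumption, not a standard.

```python
# Sketch of a training-data provenance index backed by SQLite. Each
# row maps a subject identifier to a dataset version containing their
# records; schema and column names are illustrative.
import sqlite3

conn = sqlite3.connect("provenance.db")
conn.execute("""CREATE TABLE IF NOT EXISTS provenance (
    subject_id TEXT, dataset_version TEXT, record_ref TEXT,
    PRIMARY KEY (subject_id, dataset_version, record_ref))""")

def record_inclusion(subject_id, dataset_version, record_ref):
    """Register that a subject's record appears in a dataset version."""
    conn.execute("INSERT OR IGNORE INTO provenance VALUES (?, ?, ?)",
                 (subject_id, dataset_version, record_ref))
    conn.commit()

def access_request(subject_id):
    """Return dataset versions and record references for a subject."""
    return conn.execute(
        "SELECT dataset_version, record_ref FROM provenance WHERE subject_id = ?",
        (subject_id,)).fetchall()  # records are then pulled from versioned storage
```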
The data retention tension between GDPR and the AI Act requires specific architectural attention. Under the storage limitation principle, personal data must be kept no longer than necessary. Under Article 18, technical documentation including information about training data must be retained for ten years. Reconciling these requirements means retaining metadata about the training data, including provenance records, quality metrics, distributional statistics, and versioning records, after the personal data itself has been deleted or anonymised. Data architecture must be designed so that compliance-relevant information about training data can survive the deletion of the individual records it describes.
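A minimal sketch of that design principle: compute and store aggregate metadata before the personal data is deleted, so the compliance record survives erasure. The statistics chosen here are illustrative, not a prescribed set.

```python
# Sketch of compliance metadata captured before deletion so it can
# outlive the personal data it describes; aggregate statistics only.
import pandas as pd

def snapshot_before_deletion(df: pd.DataFrame, dataset_version: str) -> dict:
    """Capture provenance and distributional metadata prior to erasure."""
    return {
        "dataset_version": dataset_version,
        "row_count": len(df),
        "columns": list(df.columns),
        "numeric_summary": df.describe().to_dict(),  # distributional stats only,
        "null_counts": df.isna().sum().to_dict(),    # no individual records retained
    }
```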
For automated decision-making, the level of human involvement determines whether GDPR Article 22 is triggered. The DPO Liaison documents the analysis in Module 4, specifying the level of human involvement, the criteria for meaningful involvement beyond mere rubber-stamping, and the safeguards in place if Article 22 does apply: the right to human intervention, to express a point of view, and to contest the decision.
No. The legitimate interest balancing test must be conducted for each dataset and each processing purpose, considering the nature of data, data subject expectations, impact, and safeguards.
The organisation must remove their data from the training set and either retrain the model or demonstrate that the individual's data cannot be recovered from the model's parameters.
No. The documentation must be retained for ten years, but the underlying personal data can be deleted or anonymised once no longer necessary. Metadata and provenance records are retained instead.
All GDPR rights apply, including access, rectification, erasure, and objection. The right to erasure is the most technically challenging, potentially requiring model retraining.
If human oversight under AI Act Article 14 constitutes meaningful involvement, GDPR Article 22 restrictions on solely automated decision-making may not be triggered.
GDPR Article 35 DPIA and AI Act Article 27 FRIA are overlapping obligations. Data protection risks inform fundamental rights analysis and vice versa.
Three approaches exist: full retraining, SISA sharded training for efficient partial retraining, and approximate unlearning methods which remain supplementary controls for high-risk systems.
The right to object (Article 21) allows data subjects to object to processing based on legitimate interest. The organisation must cease processing unless it demonstrates compelling legitimate grounds. An upheld objection requires data removal from the training set.