Article 10(5) of the EU AI Act permits processing special category personal data for bias monitoring and detection. This provision resolves a fundamental tension: meaningful fairness analysis requires demographic data that GDPR restricts, and Article 10(5) creates the controlled pathway for accessing it.
Organisations cannot detect demographic bias without knowing the demographics of the people affected, yet GDPR restricts the processing of exactly that information. The provision covers race, ethnicity, health, sexual orientation, religious belief, trade union membership, genetic data, and biometric data.
This permission is one of the most consequential provisions in the EU AI Act for organisations serious about fairness, giving them access to the demographic data needed to measure whether an AI system treats different groups equitably. Without it, providers of high-risk AI systems would face an unresolvable conflict between their fairness obligations under Article 10 and their data protection obligations under GDPR Article 9.
The provision is subject to specific conditions that constrain how this data may be used. Processing is permitted only for the purpose of bias monitoring and detection, not for any other operational use, and must comply with the technical and organisational safeguards detailed in subsequent sections. These conditions ensure that the exception remains narrow and proportionate to its stated purpose.
Organisations must satisfy a gateway condition before accessing real special category data: they must first demonstrate that synthetic and anonymised alternatives are insufficient for the bias detection task. This structured approach ensures that real sensitive data is a last resort, not a default starting point. The entire process, from alternatives assessment through processing to verified deletion, must be documented as Module 4 evidence within the AI System Data Protocol.
Before processing real special category data, the organisation must demonstrate that bias detection cannot reasonably be carried out using synthetic or anonymised data. This is not a formality; it requires a documented technical assessment that actually attempts both alternatives, evaluates their adequacy, and records the results. The conclusion and its supporting evidence become a Module 4 artefact within the Data Governance documentation framework.
The sufficiency test establishes a gateway condition: real special category data may only be processed when the alternatives have been tried and found wanting. Organisations that skip this step, or treat it as a tick-box exercise, risk both regulatory challenge and erosion of trust with data subjects whose sensitive information is at stake.
The assessment must be structured and documented, covering both synthetic data generation and anonymisation techniques as separate evaluation tracks. Each track requires its own methodology, test results, and reasoned conclusion. Where both alternatives prove insufficient, the combined evidence package authorises progression to real data processing under the Article 10(5) safeguards. The documented assessment is not a one-time exercise; it should be revisited when the bias detection methodology changes, new synthetic data generation techniques become available, or the deployment population shifts significantly.
Synthetic data evaluation requires generating datasets that replicate the protected characteristic distributions of the deployment population, running the full bias detection suite, and assessing whether the results reliably indicate real-world fairness behaviour. Tools such as SDV, Gretel.ai, or MOSTLY AI can generate synthetic datasets, but the critical question is whether they capture the correlational structure between protected characteristics and other features.
Synthetic data frequently falls short because it fails to reproduce conditional distributions that drive real-world disparate impact. A synthetic dataset may replicate marginal distributions correctly, showing the right proportion of each demographic group, without capturing the relationships between group membership and features such as educational attainment, postcode, or employment history. These conditional relationships are particularly difficult to synthesise faithfully.
The evaluation should quantify this gap by comparing bias metrics computed on synthetic data against metrics from a small, carefully governed sample of real data. If the metrics diverge significantly, synthetic data is insufficient. The documentation must record the specific distributional fidelity tests performed, their results, and the conclusion about synthetic data adequacy.
The assessment must go beyond aggregate accuracy metrics. A synthetic dataset that achieves high overall fidelity scores may still misrepresent the intersectional subgroups that are most vulnerable to bias. For example, the interaction between ethnicity and socioeconomic indicators may be poorly represented even when each variable's marginal distribution is accurate. The evaluation report should explicitly address intersectional fidelity, not just single-attribute distributions.
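The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the demographic-parity metric, the field names (`group`, `approved`), and the 0.02 divergence tolerance are all illustrative assumptions.

```python
def selection_rate(outcomes):
    """Proportion of positive outcomes in a list of 0/1 decisions."""
    return sum(outcomes) / len(outcomes)

def demographic_parity_gap(records, group_key, outcome_key):
    """Largest difference in selection rate between any two groups."""
    by_group = {}
    for r in records:
        by_group.setdefault(r[group_key], []).append(r[outcome_key])
    rates = [selection_rate(v) for v in by_group.values()]
    return max(rates) - min(rates)

def synthetic_is_sufficient(synthetic, real_sample, tolerance=0.02):
    """Synthetic data passes only if its bias metric tracks the real sample."""
    gap_synth = demographic_parity_gap(synthetic, "group", "approved")
    gap_real = demographic_parity_gap(real_sample, "group", "approved")
    return abs(gap_synth - gap_real) <= tolerance

def make(group, approved, n):
    """Build n identical illustrative records."""
    return [{"group": group, "approved": approved}] * n

# Synthetic data replicates the marginal distributions (same group sizes)
# but misses the real disparity: both groups score 0.9 in the synthetic
# set, while the real sample shows 0.9 vs 0.7.
synthetic = make("A", 1, 9) + make("A", 0, 1) + make("B", 1, 9) + make("B", 0, 1)
real = make("A", 1, 9) + make("A", 0, 1) + make("B", 1, 7) + make("B", 0, 3)
```

In this toy case the metrics diverge by 0.2, far beyond the tolerance, so the sketch would conclude the synthetic data is insufficient; a production assessment would repeat this across the full metric suite and intersectional subgroups.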
Anonymised data preserves group-level demographic information while removing individual identifiers, but its effectiveness depends on the anonymisation technique, dataset size, and re-identification risk. The key question is whether the technique preserves the subgroup structure needed for disaggregated fairness analysis. Formal privacy models such as k-anonymity, l-diversity, and t-closeness provide frameworks for this assessment; the choice depends on dataset size and data sensitivity.
The assessment should consider three factors: whether the anonymisation technique maintains the subgroup structure required for meaningful fairness metrics, whether the anonymised dataset is large enough for statistically significant subgroup-level analysis, and whether the re-identification risk is acceptable given the sensitivity of the data. The DPO Liaison documents the evaluation with the anonymisation technique, the privacy parameters, the re-identification risk assessment, and the conclusion about suitability.
Where the assessment concludes that both synthetic and anonymised alternatives are insufficient, this documented conclusion authorises the organisation to proceed with real special category data under Article 10(5), subject to the safeguards described in subsequent sections.
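A k-anonymity check of the kind discussed above can be sketched as follows. The quasi-identifier choice, the k threshold, and the minimum subgroup size are illustrative assumptions; a real assessment would select them based on the dataset and its re-identification risk profile.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier tuple.
    A dataset is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())

def fit_for_fairness_analysis(records, quasi_identifiers, group_key,
                              k_required=5, min_subgroup=30):
    """Anonymised data is usable only if it is sufficiently anonymous AND
    each demographic subgroup is large enough for disaggregated metrics."""
    k = k_anonymity(records, quasi_identifiers)
    subgroup_sizes = Counter(r[group_key] for r in records)
    return k >= k_required and min(subgroup_sizes.values()) >= min_subgroup
```

The two-condition check reflects the tension in the text: aggressive generalisation raises k but can collapse exactly the subgroup structure that disaggregated fairness analysis needs.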
Processing special category data under Article 10(5) requires appropriate technical and organisational safeguards, specifically including pseudonymisation and encryption. In practice, the implementation follows five distinct layers that together create a defence-in-depth architecture around the sensitive data.
The first layer is isolation: special category data is stored in a dedicated, physically or logically separated environment that does not sit alongside main development or production data. Access from the main environment to the special category environment is blocked, and the bias computation runs entirely within the isolated environment.
The second layer is pseudonymisation, where direct identifiers such as names, addresses, and national ID numbers are replaced with pseudonymous keys. The mapping table connecting pseudonyms to identities is stored separately under stricter access controls, in a system such as HashiCorp Vault. Only the DPO Liaison and a named data governance officer have access to the mapping table.
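The pseudonymisation layer can be sketched with keyed hashing: identifiers are replaced by HMAC-derived tokens, and the reverse mapping is written to a separate store standing in for a secrets manager such as Vault. The key, field names, and token length here are illustrative assumptions.

```python
import hashlib
import hmac

# Illustrative only: in practice the key lives in a separate secrets
# manager (e.g. HashiCorp Vault), never alongside the data.
SECRET_KEY = b"held-in-separate-secrets-manager"

def pseudonymise(identifier, key=SECRET_KEY):
    """Deterministic keyed pseudonym: the same input always yields the
    same token, so records stay linkable without revealing identity."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()[:16]

def pseudonymise_record(record, id_field, mapping_table):
    """Strip the direct identifier from a record; store the reverse
    mapping separately under stricter access controls."""
    out = dict(record)
    token = pseudonymise(out.pop(id_field))
    out["pseudonym"] = token
    mapping_table[token] = record[id_field]
    return out
```

Determinism matters here: bias metrics need to group repeated records for the same person, which random tokenisation would break; the trade-off is that the key itself becomes the re-identification secret and must be governed accordingly.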
The third layer is encryption: AES-256 at rest and TLS 1.3 in transit, ensuring that no unencrypted special category data exists at any point in the pipeline. The fourth layer is access control and audit: role-based access is restricted to named individuals with a documented business need, and every access event is logged in an immutable audit trail capturing the accessor's identity, timestamp, data accessed, and purpose. Cloud-native audit logging services must be explicitly configured to cover the special category data environment, and the resulting logs retained for the full AISDP evidence period.
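The immutability property of the audit trail can be illustrated with hash chaining, where each entry commits to its predecessor so retroactive edits are detectable. This is a conceptual sketch; production systems would typically use a managed append-only logging service rather than hand-rolled chaining.

```python
import hashlib
import json

def append_access_event(log, accessor, data_ref, purpose, timestamp):
    """Append an audit entry whose hash covers the previous entry's hash,
    chaining the log so tampering with history is detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"accessor": accessor, "data": data_ref, "purpose": purpose,
             "timestamp": timestamp, "prev_hash": prev_hash}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log):
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if recomputed != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```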
A formal governance workflow must precede any processing of special category data, involving three roles in sequence. First, the Technical SME prepares a Special Category Data Processing Request specifying the bias detection purpose, the specific data elements required, the processing methodology, and the expected retention period. The DPO Liaison then reviews the request for GDPR Article 9 compliance, confirming that the Article 10(5) legal basis applies and that the Data Protection Impact Assessment covers this processing.
Finally, the AI Governance Lead approves the request, confirming that only the minimum data necessary for the bias detection purpose will be processed. Once approved, processing executes within the secured environment described above. Results are extracted in aggregate form only; individual records must not leave the secure environment under any circumstances.
This three-stage approval chain ensures that no single role can authorise access to special category data unilaterally. The separation of technical, legal, and governance responsibilities creates accountability at each stage and produces a clear audit trail of the decision-making process. Each approval step generates documentation that becomes part of the Module 4 evidence package.
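The sequential approval chain can be modelled as a small state machine that refuses out-of-order sign-off. The role names mirror those in the text; the class and method names are illustrative assumptions.

```python
# Approvals must arrive in this order; no single role can authorise alone.
APPROVAL_ORDER = ["technical_sme", "dpo_liaison", "ai_governance_lead"]

class ProcessingRequest:
    """Special Category Data Processing Request with ordered approvals."""

    def __init__(self, purpose, data_elements, retention_days):
        self.purpose = purpose
        self.data_elements = data_elements
        self.retention_days = retention_days
        self.approvals = []  # doubles as the audit trail of sign-offs

    def approve(self, role):
        """Accept the next approval only from the expected role."""
        expected = APPROVAL_ORDER[len(self.approvals)]
        if role != expected:
            raise ValueError(f"out-of-order approval: expected {expected}")
        self.approvals.append(role)

    @property
    def authorised(self):
        """Processing may begin only once all three roles have approved."""
        return self.approvals == APPROVAL_ORDER
```

Encoding the order in data rather than prose makes the separation-of-duties requirement mechanically checkable, and the `approvals` list itself becomes part of the Module 4 evidence.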
Following completion of the bias detection purpose, the special category data is deleted or anonymised according to the documented schedule. The deletion or anonymisation must be verified and attested by the DPO Liaison, creating a closed loop from initial request through processing to confirmed disposal. This lifecycle approach ensures that special category data does not persist beyond its authorised purpose.
After completing the bias detection purpose, the DPO Liaison deletes or anonymises the special category data according to the documented schedule. In either case, a simple delete command is not enough: the outcome must be technically verified.
The verification results are signed by the DPO Liaison and retained as Module 4 evidence within the Conformity Assessment documentation. This attestation demonstrates that the organisation's use of special category data was time-bounded and purpose-limited, providing concrete evidence of compliance with the proportionality principle embedded in Article 10(5).
For deletion, the verification must confirm removal from all storage locations, going beyond the primary data store to include backups, caches, and any derived datasets that may contain fragments of the special category data. For anonymisation, the re-identification risk assessment must confirm that individuals cannot reasonably be re-linked to their records given the current state of available re-identification techniques.
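A deletion-verification sweep of this kind might look as follows: each registered storage location is scanned for residual records, and the result is an attestable report rather than an assumption. The location names and scan callables are illustrative assumptions.

```python
def verify_deletion(pseudonyms, storage_locations):
    """Sweep every registered storage location for residual records.

    storage_locations maps a location name (primary store, backup,
    cache, derived dataset) to a callable returning the pseudonyms
    still present there. Returns an attestable verification report.
    """
    findings = {}
    for name, scan in storage_locations.items():
        residual = set(scan()) & set(pseudonyms)
        if residual:
            findings[name] = sorted(residual)
    return {"verified": not findings, "residual_records": findings}
```

The report feeds directly into the DPO Liaison's signed attestation: a `verified: False` result with a named location (say, an overlooked backup) blocks sign-off until remediated.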
Module 4 of the AI System Data Protocol must record a comprehensive set of information about any special category data processing. The minimum documentation requirements include whether special category data was processed, the specific categories of data involved, the bias detection purpose served, the synthetic and anonymised alternative assessment with its conclusion, and the safeguards applied with sufficient detail for a reviewer to assess their adequacy.
Additional required records cover the governance approval chain showing each authorisation step, the processing dates and scope, the deletion or anonymisation schedule and verification results, and the DPO Liaison's attestation of compliance with both GDPR Article 9 and AI Act Article 10(5). This documentation package serves dual purposes: it satisfies the Technical Documentation requirements under the AI Act and provides evidence for GDPR accountability obligations.
The level of detail must be sufficient for an independent reviewer to assess whether the safeguards were adequate and proportionate. Vague descriptions such as "appropriate security measures were applied" are insufficient; the documentation must specify the exact encryption standards, access control mechanisms, and audit logging configurations deployed.
The DPO Liaison's attestation of compliance with both GDPR Article 9 and AI Act Article 10(5) forms the capstone of the Module 4 evidence package, confirming that the organisation has satisfied both regulatory frameworks simultaneously. Organisations should treat this documentation as a living record, updating it whenever the bias detection methodology, safeguards, or data retention practices change. Maintained to a high standard, it provides the strongest available evidence that the organisation has navigated the intersection of data protection and AI fairness requirements responsibly.
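Structuring the Module 4 record as data rather than free text makes completeness mechanically checkable. The field names below mirror the documentation requirements listed above but are an illustrative sketch, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class Module4Record:
    """Illustrative Module 4 evidence record for special category processing."""
    special_category_processed: bool
    data_categories: list       # e.g. ["ethnicity", "health"]
    bias_detection_purpose: str
    alternatives_assessment: dict  # synthetic + anonymised tracks and conclusions
    safeguards: dict               # encryption, access control, audit config
    approvals: list                # ordered governance approval chain
    processing_period: tuple       # (start_date, end_date)
    disposal: dict                 # schedule, method, verification result
    dpo_attestation: str = ""

    def complete(self):
        """Minimal completeness gate before filing as Module 4 evidence."""
        return bool(self.data_categories
                    and self.approvals
                    and self.disposal.get("verified")
                    and self.dpo_attestation)
```

A check like `complete()` catches the most common evidentiary gap the text warns about: processing that finished without a verified disposal or a signed attestation.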
The alternatives assessment cannot be skipped even when the outcome seems predictable: the sufficiency test requires a documented technical assessment of both synthetic and anonymised alternatives, regardless of the expected result. The evaluation must record the methods tried, the results obtained, and the reasoned conclusions.
Special category data may be retained only for the duration specified in the processing request. After the bias detection purpose is complete, the data must be deleted or anonymised according to the documented schedule, with verification by the DPO Liaison.
Confidential computing provides the highest assurance but is not always mandatory. Where it is not feasible, a sandboxed analytics environment that prevents data exfiltration provides a practical alternative. The choice must be documented and justified.
Access is limited to named individuals with a documented business need, approved through the three-stage governance workflow. All access is logged in an immutable audit trail capturing identity, timestamp, data accessed, and purpose.
Evaluate whether synthetic datasets capture correlational structure between protected characteristics and features by comparing bias metrics against a governed sample of real data.
Five layers: data isolation, pseudonymisation, AES-256 encryption, role-based access control with audit trails, and confidential computing enclaves where feasible.
A three-stage approval chain involving the Technical SME, DPO Liaison, and AI Governance Lead ensures no single role can authorise access unilaterally.
Module 4 must record whether data was processed, categories involved, alternatives assessment, safeguards applied, governance approvals, processing dates, and deletion verification.
The fifth layer, confidential computing, provides the highest assurance. The bias computation runs within a hardware-secured enclave such as Azure confidential computing (Intel SGX), AWS Nitro Enclaves, or Google Cloud Confidential VMs, preventing data exfiltration even by system administrators. This carries operational complexity and performance overhead, but it provides a technical guarantee that data cannot leave the computation environment.
Where confidential computing is not feasible, a sandboxed analytics environment that restricts network access and prevents data export provides a practical alternative. The bias detection computations should be designed to produce only aggregate outputs; no individual-level records should leave the processing environment regardless of the isolation mechanism employed. The choice of isolation technology should be documented as part of the safeguards record, with justification for the level of assurance selected.
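The aggregate-only output rule can be enforced at the boundary of the processing environment with a release gate like the following sketch. The minimum group size threshold is an illustrative assumption; in practice it would be set by the re-identification risk assessment.

```python
def release_aggregates(records, group_key, outcome_key, min_group_size=10):
    """Release only group-level selection rates from the secure environment.

    Individual records never cross this boundary, and groups smaller than
    the threshold are suppressed (returned as None) to limit the risk of
    re-identification through small-cell disclosure.
    """
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r[outcome_key])
    released = {}
    for group, outcomes in groups.items():
        if len(outcomes) < min_group_size:
            released[group] = None  # suppressed: too few records to release
        else:
            released[group] = sum(outcomes) / len(outcomes)
    return released
```

Because the gate returns only per-group statistics, it gives the same output contract whether the computation behind it runs in an enclave or a sandboxed analytics environment, which is exactly the property the safeguards record needs to document.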