Article 10(5) of the EU AI Act permits processing special category personal data for bias monitoring and detection. This provision resolves a fundamental tension: meaningful fairness analysis requires demographic data that GDPR restricts, and Article 10(5) creates the controlled pathway for accessing it.
Organisations cannot detect demographic bias without knowing the demographics of the people affected, yet GDPR restricts the processing of exactly that information. The provision covers race, ethnicity, health, sexual orientation, religious belief, trade union membership, genetic data, and biometric data.
This permission is one of the most consequential provisions in the EU AI Act for organisations serious about fairness, giving them access to the demographic data needed to measure whether an AI system treats different groups equitably. Without it, providers of high-risk AI systems would face an unresolvable conflict between their fairness obligations under Article 10 and their data protection obligations under GDPR Article 9.
The provision is subject to specific conditions that constrain how this data may be used. Processing is permitted only for the purpose of bias monitoring and detection, not for any other operational use, and must comply with the technical and organisational safeguards detailed in subsequent sections. These conditions ensure that the exception remains narrow and proportionate to its stated purpose.
Organisations must satisfy a gateway condition before accessing real special category data: they must first demonstrate that synthetic and anonymised alternatives are insufficient for the bias detection task. This structured approach ensures that real sensitive data is a last resort, not a default starting point. The entire process, from alternatives assessment through processing to verified deletion, must be documented as Module 4 evidence within the AI System Data Protocol.
Before processing real special category data, the organisation must demonstrate that bias detection cannot reasonably be carried out using synthetic or anonymised data. This is not a formality; it requires a documented technical assessment that actually attempts both alternatives, evaluates their adequacy, and records the results. The conclusion and its supporting evidence become a Module 4 artefact within the Data Governance documentation framework.
The sufficiency test establishes a gateway condition: real special category data may only be processed when the alternatives have been tried and found wanting. Organisations that skip this step, or treat it as a tick-box exercise, risk both regulatory challenge and erosion of trust with data subjects whose sensitive information is at stake.
The assessment must be structured and documented, covering both synthetic data generation and anonymisation techniques as separate evaluation tracks. Each track requires its own methodology, test results, and reasoned conclusion. Where both alternatives prove insufficient, the combined evidence package authorises progression to real data processing under the Article 10(5) safeguards. The documented assessment is not a one-time exercise; it should be revisited when the bias detection methodology changes, new synthetic data generation techniques become available, or the deployment population shifts significantly.
Synthetic data evaluation requires generating datasets that replicate the protected characteristic distributions of the deployment population, running the full bias detection suite, and assessing whether the results reliably indicate real-world fairness behaviour. Tools such as SDV, Gretel.ai, or MOSTLY AI can generate synthetic datasets, but the critical question is whether they capture the correlational structure between protected characteristics and other features.
Synthetic data frequently falls short because it fails to reproduce conditional distributions that drive real-world disparate impact. A synthetic dataset may replicate marginal distributions correctly, showing the right proportion of each demographic group, without capturing the relationships between group membership and features such as educational attainment, postcode, or employment history. These conditional relationships are particularly difficult to synthesise faithfully.
The evaluation should quantify this gap by comparing bias metrics computed on synthetic data against metrics from a small, carefully governed sample of real data. If the metrics diverge significantly, synthetic data is insufficient. The documentation must record the specific distributional fidelity tests performed, their results, and the conclusion about synthetic data adequacy.
The assessment must go beyond aggregate accuracy metrics. A synthetic dataset that achieves high overall fidelity scores may still misrepresent the intersectional subgroups that are most vulnerable to bias. For example, the interaction between ethnicity and socioeconomic indicators may be poorly represented even when each variable's marginal distribution is accurate. The evaluation report should explicitly address intersectional fidelity, not just single-attribute distributions.
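The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the demographic-parity metric, the field names (`group`, `approved`), and the 0.02 divergence tolerance are all illustrative assumptions.

```python
def selection_rate(outcomes):
    """Proportion of positive outcomes in a list of 0/1 decisions."""
    return sum(outcomes) / len(outcomes)

def demographic_parity_gap(records, group_key, outcome_key):
    """Largest difference in selection rate between any two groups."""
    by_group = {}
    for r in records:
        by_group.setdefault(r[group_key], []).append(r[outcome_key])
    rates = [selection_rate(v) for v in by_group.values()]
    return max(rates) - min(rates)

def synthetic_is_sufficient(synthetic, real_sample, tolerance=0.02):
    """Synthetic data passes only if its bias metric tracks the real sample."""
    gap_synth = demographic_parity_gap(synthetic, "group", "approved")
    gap_real = demographic_parity_gap(real_sample, "group", "approved")
    return abs(gap_synth - gap_real) <= tolerance

def make(group, approved, n):
    """Build n identical illustrative records."""
    return [{"group": group, "approved": approved}] * n

# Synthetic data replicates the marginal distributions (same group sizes)
# but misses the real disparity: both groups score 0.9 in the synthetic
# set, while the real sample shows 0.9 vs 0.7.
synthetic = make("A", 1, 9) + make("A", 0, 1) + make("B", 1, 9) + make("B", 0, 1)
real = make("A", 1, 9) + make("A", 0, 1) + make("B", 1, 7) + make("B", 0, 3)
```

In this toy case the metrics diverge by 0.2, far beyond the tolerance, so the sketch would conclude the synthetic data is insufficient; a production assessment would repeat this across the full metric suite and intersectional subgroups.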
Anonymised data preserves group-level demographic information while removing individual identifiers, but its effectiveness depends on the anonymisation technique, dataset size, and re-identification risk. The key question is whether the technique preserves the subgroup structure needed for disaggregated fairness analysis. Formal privacy models such as k-anonymity, l-diversity, and t-closeness provide frameworks for this assessment; the choice depends on dataset size and data sensitivity.
The assessment should consider three factors: whether the anonymisation technique maintains the subgroup structure required for meaningful fairness metrics, whether the anonymised dataset is large enough for statistically significant subgroup-level analysis, and whether the re-identification risk is acceptable given the sensitivity of the data. The DPO Liaison documents the evaluation with the anonymisation technique, the privacy parameters, the re-identification risk assessment, and the conclusion about suitability.
Where the assessment concludes that both synthetic and anonymised alternatives are insufficient, this documented conclusion authorises the organisation to proceed with real special category data under Article 10(5), subject to the safeguards described in subsequent sections.
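A k-anonymity check of the kind discussed above can be sketched as follows. The quasi-identifier choice, the k threshold, and the minimum subgroup size are illustrative assumptions; a real assessment would select them based on the dataset and its re-identification risk profile.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier tuple.
    A dataset is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())

def fit_for_fairness_analysis(records, quasi_identifiers, group_key,
                              k_required=5, min_subgroup=30):
    """Anonymised data is usable only if it is sufficiently anonymous AND
    each demographic subgroup is large enough for disaggregated metrics."""
    k = k_anonymity(records, quasi_identifiers)
    subgroup_sizes = Counter(r[group_key] for r in records)
    return k >= k_required and min(subgroup_sizes.values()) >= min_subgroup
```

The two-condition check reflects the tension in the text: aggressive generalisation raises k but can collapse exactly the subgroup structure that disaggregated fairness analysis needs.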
Processing special category data under Article 10(5) requires appropriate technical and organisational safeguards, specifically including pseudonymisation and encryption. In practice, the implementation follows five distinct layers that together create a defence-in-depth architecture around the sensitive data.
The first layer is isolation: special category data is stored in a dedicated, physically or logically separated environment that does not sit alongside main development or production data. Access from the main environment to the special category environment is blocked, and the bias computation runs entirely within the isolated environment.
The second layer is pseudonymisation, where direct identifiers such as names, addresses, and national ID numbers are replaced with pseudonymous keys. The mapping table connecting pseudonyms to identities is stored separately under stricter access controls, in a system such as HashiCorp Vault. Only the DPO Liaison and a named data governance officer have access to the mapping table.
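The pseudonymisation layer can be sketched with keyed hashing: identifiers are replaced by HMAC-derived tokens, and the reverse mapping is written to a separate store standing in for a secrets manager such as Vault. The key, field names, and token length here are illustrative assumptions.

```python
import hashlib
import hmac

# Illustrative only: in practice the key lives in a separate secrets
# manager (e.g. HashiCorp Vault), never alongside the data.
SECRET_KEY = b"held-in-separate-secrets-manager"

def pseudonymise(identifier, key=SECRET_KEY):
    """Deterministic keyed pseudonym: the same input always yields the
    same token, so records stay linkable without revealing identity."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()[:16]

def pseudonymise_record(record, id_field, mapping_table):
    """Strip the direct identifier from a record; store the reverse
    mapping separately under stricter access controls."""
    out = dict(record)
    token = pseudonymise(out.pop(id_field))
    out["pseudonym"] = token
    mapping_table[token] = record[id_field]
    return out
```

Determinism matters here: bias metrics need to group repeated records for the same person, which random tokenisation would break; the trade-off is that the key itself becomes the re-identification secret and must be governed accordingly.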
The third layer is encryption: AES-256 at rest and TLS 1.3 in transit, ensuring that no unencrypted special category data exists at any point in the pipeline. The fourth layer is access control and audit: role-based access is restricted to named individuals with a documented business need, and every access event is logged in an immutable audit trail capturing the accessor's identity, timestamp, data accessed, and purpose. Cloud-native audit logging services must be explicitly configured to cover the special category data environment, and the resulting logs retained for the full AISDP evidence period.
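The immutability property of the audit trail can be illustrated with hash chaining, where each entry commits to its predecessor so retroactive edits are detectable. This is a conceptual sketch; production systems would typically use a managed append-only logging service rather than hand-rolled chaining.

```python
import hashlib
import json

def append_access_event(log, accessor, data_ref, purpose, timestamp):
    """Append an audit entry whose hash covers the previous entry's hash,
    chaining the log so tampering with history is detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"accessor": accessor, "data": data_ref, "purpose": purpose,
             "timestamp": timestamp, "prev_hash": prev_hash}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log):
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if recomputed != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```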
A formal governance workflow must precede any processing of special category data, involving three roles in sequence. First, the Technical SME prepares a Special Category Data Processing Request specifying the bias detection purpose, the specific data elements required, the processing methodology, and the expected retention period. The DPO Liaison then reviews the request for GDPR Article 9 compliance, confirming that the Article 10(5) legal basis applies and that the Data Protection Impact Assessment covers this processing.
Finally, the AI Governance Lead approves the request, confirming that only the minimum data necessary for the bias detection purpose will be processed. Once approved, processing executes within the secured environment described above. Results are extracted in aggregate form only; individual records must not leave the secure environment under any circumstances.
This three-stage approval chain ensures that no single role can authorise access to special category data unilaterally. The separation of technical, legal, and governance responsibilities creates accountability at each stage and produces a clear audit trail of the decision-making process. Each approval step generates documentation that becomes part of the Module 4 evidence package.
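The sequential approval chain can be modelled as a small state machine that refuses out-of-order sign-off. The role names mirror those in the text; the class and method names are illustrative assumptions.

```python
# Approvals must arrive in this order; no single role can authorise alone.
APPROVAL_ORDER = ["technical_sme", "dpo_liaison", "ai_governance_lead"]

class ProcessingRequest:
    """Special Category Data Processing Request with ordered approvals."""

    def __init__(self, purpose, data_elements, retention_days):
        self.purpose = purpose
        self.data_elements = data_elements
        self.retention_days = retention_days
        self.approvals = []  # doubles as the audit trail of sign-offs

    def approve(self, role):
        """Accept the next approval only from the expected role."""
        expected = APPROVAL_ORDER[len(self.approvals)]
        if role != expected:
            raise ValueError(f"out-of-order approval: expected {expected}")
        self.approvals.append(role)

    @property
    def authorised(self):
        """Processing may begin only once all three roles have approved."""
        return self.approvals == APPROVAL_ORDER
```

Encoding the order in data rather than prose makes the separation-of-duties requirement mechanically checkable, and the `approvals` list itself becomes part of the Module 4 evidence.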
Following completion of the bias detection purpose, the special category data is deleted or anonymised according to the documented schedule. The deletion or anonymisation must be verified and attested by the DPO Liaison, creating a closed loop from initial request through processing to confirmed disposal. This lifecycle approach ensures that special category data does not persist beyond its authorised purpose.
After completing the bias detection purpose, the DPO Liaison deletes or anonymises the special category data according to the documented schedule. In either case, a simple delete command is not enough: the outcome must be technically verified.
The verification results are signed by the DPO Liaison and retained as Module 4 evidence within the Conformity Assessment documentation. This attestation demonstrates that the organisation's use of special category data was time-bounded and purpose-limited, providing concrete evidence of compliance with the proportionality principle embedded in Article 10(5).
For deletion, the verification must confirm removal from all storage locations, going beyond the primary data store to include backups, caches, and any derived datasets that may contain fragments of the special category data. For anonymisation, the re-identification risk assessment must confirm that individuals cannot reasonably be re-linked to their records given the current state of available re-identification techniques.
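A deletion-verification sweep of this kind might look as follows: each registered storage location is scanned for residual records, and the result is an attestable report rather than an assumption. The location names and scan callables are illustrative assumptions.

```python
def verify_deletion(pseudonyms, storage_locations):
    """Sweep every registered storage location for residual records.

    storage_locations maps a location name (primary store, backup,
    cache, derived dataset) to a callable returning the pseudonyms
    still present there. Returns an attestable verification report.
    """
    findings = {}
    for name, scan in storage_locations.items():
        residual = set(scan()) & set(pseudonyms)
        if residual:
            findings[name] = sorted(residual)
    return {"verified": not findings, "residual_records": findings}
```

The report feeds directly into the DPO Liaison's signed attestation: a `verified: False` result with a named location (say, an overlooked backup) blocks sign-off until remediated.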
Module 4 of the AI System Data Protocol must record a comprehensive set of information about any special category data processing. The minimum documentation requirements include whether special category data was processed, the specific categories of data involved, the bias detection purpose served, the synthetic and anonymised alternative assessment with its conclusion, and the safeguards applied with sufficient detail for a reviewer to assess their adequacy.
Additional required records cover the governance approval chain showing each authorisation step, the processing dates and scope, the deletion or anonymisation schedule and verification results, and the DPO Liaison's attestation of compliance with both GDPR Article 9 and AI Act Article 10(5). This documentation package serves dual purposes: it satisfies the Technical Documentation requirements under the AI Act and provides evidence for GDPR accountability obligations.
The level of detail must be sufficient for an independent reviewer to assess whether the safeguards were adequate and proportionate. Vague descriptions such as "appropriate security measures were applied" are insufficient; the documentation must specify the exact encryption standards, access control mechanisms, and audit logging configurations deployed.
The DPO Liaison's attestation of compliance with both GDPR Article 9 and AI Act Article 10(5) forms the capstone of the Module 4 evidence package, confirming that the organisation has satisfied both regulatory frameworks simultaneously. Organisations should treat this documentation as a living record, updating it whenever the bias detection methodology, safeguards, or data retention practices change. Maintained to a high standard, it provides the strongest available evidence that the organisation has navigated the intersection of data protection and AI fairness requirements responsibly.
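Structuring the Module 4 record as data rather than free text makes completeness mechanically checkable. The field names below mirror the documentation requirements listed above but are an illustrative sketch, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class Module4Record:
    """Illustrative Module 4 evidence record for special category processing."""
    special_category_processed: bool
    data_categories: list       # e.g. ["ethnicity", "health"]
    bias_detection_purpose: str
    alternatives_assessment: dict  # synthetic + anonymised tracks and conclusions
    safeguards: dict               # encryption, access control, audit config
    approvals: list                # ordered governance approval chain
    processing_period: tuple       # (start_date, end_date)
    disposal: dict                 # schedule, method, verification result
    dpo_attestation: str = ""

    def complete(self):
        """Minimal completeness gate before filing as Module 4 evidence."""
        return bool(self.data_categories
                    and self.approvals
                    and self.disposal.get("verified")
                    and self.dpo_attestation)
```

A check like `complete()` catches the most common evidentiary gap the text warns about: processing that finished without a verified disposal or a signed attestation.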
The alternatives assessment cannot be skipped even when the outcome seems predictable: the sufficiency test requires a documented technical assessment of both synthetic and anonymised alternatives, regardless of the expected result. The evaluation must record the methods tried, the results obtained, and the reasoned conclusions.
Special category data may be retained only for the duration specified in the processing request. After the bias detection purpose is complete, the data must be deleted or anonymised according to the documented schedule, with verification by the DPO Liaison.
Confidential computing provides the highest assurance but is not always mandatory. Where it is not feasible, a sandboxed analytics environment that prevents data exfiltration provides a practical alternative. The choice must be documented and justified.
Access is limited to named individuals with a documented business need, approved through the three-stage governance workflow. All access is logged in an immutable audit trail capturing identity, timestamp, data accessed, and purpose.
Evaluate whether synthetic datasets capture correlational structure between protected characteristics and features by comparing bias metrics against a governed sample of real data.
Five layers: data isolation, pseudonymisation, AES-256 encryption, role-based access control with audit trails, and confidential computing enclaves where feasible.
A three-stage approval chain involving the Technical SME, DPO Liaison, and AI Governance Lead ensures no single role can authorise access unilaterally.
Module 4 must record whether data was processed, categories involved, alternatives assessment, safeguards applied, governance approvals, processing dates, and deletion verification.
The fifth layer, confidential computing, provides the highest assurance. The bias computation runs within a hardware-secured enclave such as Azure confidential computing (Intel SGX), AWS Nitro Enclaves, or Google Cloud Confidential VMs, preventing data exfiltration even by system administrators. This carries operational complexity and performance overhead, but it provides a technical guarantee that data cannot leave the computation environment.
Where confidential computing is not feasible, a sandboxed analytics environment that restricts network access and prevents data export provides a practical alternative. The bias detection computations should be designed to produce only aggregate outputs; no individual-level records should leave the processing environment regardless of the isolation mechanism employed. The choice of isolation technology should be documented as part of the safeguards record, with justification for the level of assurance selected.
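The aggregate-only output rule can be enforced at the boundary of the processing environment with a release gate like the following sketch. The minimum group size threshold is an illustrative assumption; in practice it would be set by the re-identification risk assessment.

```python
def release_aggregates(records, group_key, outcome_key, min_group_size=10):
    """Release only group-level selection rates from the secure environment.

    Individual records never cross this boundary, and groups smaller than
    the threshold are suppressed (returned as None) to limit the risk of
    re-identification through small-cell disclosure.
    """
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r[outcome_key])
    released = {}
    for group, outcomes in groups.items():
        if len(outcomes) < min_group_size:
            released[group] = None  # suppressed: too few records to release
        else:
            released[group] = sum(outcomes) / len(outcomes)
    return released
```

Because the gate returns only per-group statistics, it gives the same output contract whether the computation behind it runs in an enclave or a sandboxed analytics environment, which is exactly the property the safeguards record needs to document.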