Article 10 of the EU AI Act imposes detailed data governance requirements on training, validation, and testing datasets for high-risk AI systems. These requirements cover dataset documentation, completeness assessment, fairness and bias evaluation, data lineage, special category data processing under Article 10(5), GDPR alignment, third-party data governance, and embedding model and knowledge base governance. This page translates each obligation into concrete engineering practices documented in AISDP Module 4.
Article 10 of the EU AI Act establishes data governance requirements for training, validation, and testing datasets that are among the most prescriptive in the regulation.
These requirements feed directly into AISDP Module 4, covering dataset documentation, completeness assessment, fairness and bias evaluation, data lineage, special category data processing, GDPR alignment, and third-party data governance.
Data governance under Article 10 is an engineering discipline embedded in every stage of the data lifecycle, from collection through preparation, labelling, training, validation, and ongoing monitoring. The AISDP must demonstrate that governance was designed into the data pipeline from the outset, not documented retrospectively. Training, validation, and testing datasets must satisfy governance practices covering relevance, representativeness, freedom from errors, and completeness, with particular attention to the persons or groups on whom the system is intended to operate. The Technical SME ensures appropriate statistical properties across all datasets.
The scope extends beyond conventional training data. Knowledge bases in retrieval-augmented generation (RAG) architectures, embedding models that encode semantic relationships, and third-party data sources all fall within the governance perimeter. An organisation that governs its primary training data meticulously but neglects the knowledge base feeding its RAG pipeline has a compliance gap that could surface during conformity assessment. Risk assessment establishes the risk profile that determines the depth of data governance required, while model selection decisions constrain the types of data the system will consume. The outputs from data governance feed primarily into AISDP Module 4 (Data Governance and Dataset Documentation) but are cross-referenced by Module 5 (Testing and Validation) and Module 6 (Monitoring).
Every dataset used in the system lifecycle requires structured documentation covering provenance, composition, preparation, quality, annotation, and known limitations. Generic data catalogue entries do not satisfy Article 10; documentation must be specific enough for an assessor to evaluate suitability for training the system in question.
Provenance requires specificity: "data collected from deployer ATS systems between January 2021 and December 2023 under data processing agreements" is acceptable; "data from various sources" is not. The record must state the collection methodology, whether informed consent was obtained or another legal basis under GDPR Article 6 applies, and the licensing terms for any third-party data including whether those terms permit the intended use.
Composition covers dataset size (record count, feature count, storage size), temporal coverage, and geographic and demographic distribution. Statistics must be presented both in aggregate and disaggregated by relevant subgroups, particularly where protected characteristics are represented and in what proportions relative to the deployment population.
Preparation documents every preprocessing, cleaning, transformation, augmentation, and feature engineering step. Records removed must be logged with the reason and count. Missing value handling and imputation methods must be recorded with the assumptions they encode. Quality captures the metrics applied, error rates observed, how errors were detected and corrected, and the automated quality checks enforced. Annotation records annotator qualifications, guidelines, inter-annotator agreement rates, disagreement resolution processes, and whether annotators were compensated fairly under conditions that support quality, since annotation quality directly affects label accuracy, which in turn affects model fairness. Known limitations identifies gaps, biases, underrepresented subgroups, and temporal or geographic skew.
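The documentation fields above lend themselves to a structured, version-controlled record rather than free text. A minimal Python sketch follows; the class and field names are illustrative, not a mandated Article 10 schema:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    # Illustrative minimal documentation record for one dataset version.
    name: str
    provenance: str          # specific source, period, and agreement, per Article 10
    legal_basis: str         # GDPR Article 6 basis for collection
    record_count: int
    removed_records: dict = field(default_factory=dict)   # removal reason -> count
    known_limitations: list = field(default_factory=list)

    def removal_total(self) -> int:
        # Every removed record must be accounted for with a reason.
        return sum(self.removed_records.values())

rec = DatasetRecord(
    name="recruitment_training_v3",
    provenance="deployer ATS exports, 2021-2023, under data processing agreements",
    legal_basis="Article 6(1)(f) legitimate interest",
    record_count=120_000,
    removed_records={"duplicate": 1_240, "corrupted_fields": 310},
    known_limitations=["underrepresentation of applicants aged 55+"],
)
print(rec.removal_total())  # total records logged as removed
```

Storing such records alongside the dataset in the versioning system keeps the documentation and the data in lockstep.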
Article 10(3) requires datasets to be "relevant, sufficiently representative, and to the best extent possible, free of errors and complete." Completeness has three dimensions that the AISDP must address: feature completeness, population completeness, and temporal completeness.
Feature completeness means every feature the model's intended purpose logically requires should be present and populated. Missing features force the model to rely on proxy variables, which may introduce bias. The AISDP must document which features are available, which are missing and why, and what compensating controls are in place.
Population completeness requires the dataset to represent the full range of persons and groups on whom the system will operate. If deployed across the EU/EEA, training data should reflect the demographic diversity of the deployment population. Underrepresentation of specific subgroups degrades the model's performance for those subgroups and creates fairness risk. Temporal completeness requires sufficient coverage of seasonal, cyclical, and trend variations. A model trained on one year of data may not capture multi-year patterns. Module 4 records the temporal coverage and the assessment of whether it is sufficient for the system's intended purpose.
Complete data is rarely achievable in practice. Module 4 must record the compensating controls applied when completeness gaps are identified. Synthetic data augmentation can address underrepresentation of specific subgroups, though the AISDP must document the generation algorithm, validation against real data distributions, the proportion of synthetic data in the final training set, and the trade-off between coverage and the risk that synthetic data fails to capture real-world complexity. Transfer learning from related domains can compensate for limited data in the target domain, provided the domain relevance is justified and performance degradation from domain shift is measured and documented. Stratified sampling ensures small subgroups appear in validation and test sets in sufficient numbers for meaningful performance metrics. Ensemble methods combining predictions from multiple models trained on overlapping but non-identical subsets can improve robustness to completeness gaps, with the ensemble composition and combination logic documented in the AISDP.
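The interpolation at the heart of the SMOTE-style augmentation mentioned above can be illustrated with a heavily simplified sketch: real SMOTE interpolates towards one of the k nearest minority neighbours, while this version, for brevity, picks any other minority record at random.

```python
import random

def smote_like(minority, n_new, seed=0):
    """Generate synthetic numeric records by linear interpolation between
    pairs of minority-subgroup records. Simplified sketch of the SMOTE
    idea, not the full nearest-neighbour algorithm."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = rng.choice(minority)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([x + lam * (y - x) for x, y in zip(a, b)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 1.0]]
synth = smote_like(minority, 5)  # five synthetic records inside the hull of the pair
```

Because every synthetic value lies between two real minority values, the proportion and provenance of such records are exactly what Module 4 must document.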
Data quality validation is the automated gate that prevents corrupted or drifting data from reaching the model. Without it, a single upstream change, such as a source system schema modification, a data provider's silent methodology change, or a pipeline bug introducing null values, can propagate through to the model and degrade performance or introduce bias.
Validation should run at three pipeline checkpoints: at ingestion (before raw data enters the system), after each transformation step (confirming the transformation produced expected output), and before training (confirming the final dataset meets all quality standards). Each checkpoint enforces a different set of expectations.
Schema validation catches structural problems: renamed columns, changed data types, and unexpected formats. Pandera provides lightweight schema validation for Pandas DataFrames using a decorator pattern. For SQL pipelines, dbt's built-in tests (unique, not_null, accepted_values, relationships) provide schema-level assertions. Statistical validation catches distributional problems: sudden feature distribution shifts, correlation structure changes, or anomalous batches. The Kolmogorov-Smirnov test and chi-squared test are the workhorses, with reference distributions captured from a validated baseline dataset and updated periodically (for example, quarterly, aligned with the risk register review). Evidently AI generates data quality reports that include distribution comparisons, correlation analysis, and drift metrics suitable for both pipeline gating and periodic review. Anomaly detection catches individual-record problems: extreme values, multi-feature statistical outliers, duplicates, or corrupted entries. Isolation forests and z-score methods are standard approaches.
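A schema-validation checkpoint of the kind Pandera or dbt tests provide can be sketched in plain Python; the schema format (column name mapped to type and nullability) and the function name are illustrative:

```python
def validate_schema(rows, expected):
    """Check each row (a dict) against an expected schema of
    {column: (type, nullable)}. Returns a list of violation strings;
    an empty list means the batch passes the ingestion gate."""
    violations = []
    for i, row in enumerate(rows):
        for col in set(expected) - set(row):
            violations.append(f"row {i}: missing column {col}")
        for col, (typ, nullable) in expected.items():
            if col not in row:
                continue
            val = row[col]
            if val is None:
                if not nullable:
                    violations.append(f"row {i}: null in non-nullable {col}")
            elif not isinstance(val, typ):
                violations.append(f"row {i}: {col} expected {typ.__name__}")
    return violations

expected = {"age": (int, False), "income": (float, True)}
batch = [{"age": 34, "income": 52000.0}, {"age": None, "income": 48000.0}]
print(validate_schema(batch, expected))  # flags the null age in the second row
```

In a real pipeline the non-empty violation list would halt the run and be retained as a Module 4 evidence artefact.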
Bias detection is the most technically demanding aspect of data governance under Article 10. Article 10(2)(f) requires providers to examine training data for "possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights, or lead to discrimination." The assessment operates in two phases: before training and after training.
Pre-training bias assessment examines the data before any model is trained. Distributional analysis computes the distribution of each feature across protected characteristic subgroups, using chi-squared tests for categorical features and Kolmogorov-Smirnov or Mann-Whitney tests for continuous features. The practical output is a matrix with features on one axis, protected characteristics on the other, and each cell showing the test statistic and p-value. Features with statistically significant distributional differences (p less than 0.05 after Bonferroni correction for multiple comparisons) are flagged for investigation. ydata-profiling automates this for tabular datasets, producing an HTML report with correlation matrices and distribution comparisons.
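The chi-squared half of that matrix can be computed directly. The sketch below returns the Pearson chi-squared statistic for a 2x2 feature-by-subgroup contingency table and compares it against the df=1, alpha=0.05 critical value; a Bonferroni correction across many feature/characteristic pairs would tighten that threshold.

```python
def chi2_2x2(table):
    """Pearson chi-squared statistic for a 2x2 contingency table
    [[a, b], [c, d]] of feature-value counts by subgroup."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, r, c2 in [(a, row1, col1), (b, row1, col2),
                       (c, row2, col1), (d, row2, col2)]:
        expected = r * c2 / n
        stat += (obs - expected) ** 2 / expected
    return stat

CRITICAL_DF1_P05 = 3.841  # chi-squared critical value, df=1, alpha=0.05
stat = chi2_2x2([[90, 10], [60, 40]])
print(stat > CRITICAL_DF1_P05)  # True: flag this feature for investigation
```

Each flagged cell then feeds the investigation workflow described above.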
Label bias analysis examines whether outcome labels themselves reflect historical discrimination. In a recruitment context, the "hired/not hired" label encodes human recruiter decisions that may carry conscious or unconscious bias. Detecting label bias requires stepping outside the data: inter-rater reliability analysis (having multiple independent labellers rate the same instances, measuring agreement via Cohen's kappa or Krippendorff's alpha) reveals the extent to which labels are subjective. Re-labelling by diverse panels provides a corrective dataset for comparison. Where relabelling is infeasible, the AISDP documents the known label bias, its potential effect on the model, and the compensating controls.
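Cohen's kappa, mentioned above for inter-rater reliability, is straightforward to compute for two raters labelling the same instances:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters, corrected
    for the agreement expected by chance from their label frequencies."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

print(cohens_kappa([1, 0, 1, 0], [1, 1, 0, 0]))  # chance-level agreement: 0.0
```

Kappa near 1.0 indicates near-objective labels; kappa near 0 signals labels dominated by rater subjectivity, which the AISDP must record as a label bias risk.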
No bias mitigation technique eliminates bias without side effects. Every technique trades one form of accuracy or fairness for another, and the AISDP must document what technique was used, why it was chosen over alternatives, what trade-off it introduced, and whether that trade-off is acceptable.
Pre-processing mitigations modify the training data before the model sees it. Resampling is the simplest approach: oversampling underrepresented subgroups or undersampling overrepresented subgroups. SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic examples by interpolating between existing minority records, reducing overfitting risk compared to simple duplication. ADASYN focuses synthetic generation on boundary regions where the classifier struggles. Reweighting assigns higher training weights to underrepresented subgroups, with weight for each instance inversely proportional to its subgroup's prevalence. AI Fairness 360 provides a reweighting preprocessor that computes optimal weights automatically. The disparate impact remover (Feldman et al., 2015) modifies feature values to reduce correlations with protected characteristics while preserving predictive value. Learning fair representations (Zemel et al., 2013) learns a new feature space explicitly uninformative about protected characteristics.
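The inverse-prevalence reweighting idea can be sketched as follows. Note this is a simplified version: AI Fairness 360's reweighting preprocessor weights by group and label jointly, whereas this sketch weights by subgroup prevalence only.

```python
from collections import Counter

def subgroup_weights(groups):
    """Training weight per instance, inversely proportional to its
    subgroup's prevalence, normalised so the weights average 1.0."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

groups = ["A", "A", "A", "B"]
print(subgroup_weights(groups))  # minority subgroup B receives the larger weight
```

The resulting weights plug directly into any estimator that accepts per-sample weights (for example scikit-learn's `sample_weight` parameter).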
In-processing mitigations modify the training procedure itself. Fairlearn's ExponentiatedGradient solves a constrained optimisation problem, maximising accuracy subject to a fairness constraint such as demographic parity or equalised odds. It trains many candidate models with different constraint levels and returns the best balance of accuracy and fairness, integrating with scikit-learn estimators. Adversarial debiasing (Zhang et al., 2018) trains an adversary network that tries to predict protected characteristics from the model's internal representations, penalising the model for leaking demographic information. The adversary's learning rate relative to the main model critically affects the trade-off.
Data lineage, the ability to trace every data element from source collection through each transformation to final use in the model, is foundational to AISDP Module 4. Without it, the organisation cannot prove what data the model was trained on, how that data was prepared, or whether preparation steps introduced bias.
Lineage operates at three levels, and most organisations need all three. Pipeline-level lineage captures the macro view: which steps ran, in what order, with what inputs and outputs. DAG-based orchestration tools (Airflow, Prefect, Dagster) provide this automatically because the pipeline definition is itself a directed acyclic graph of steps with declared dependencies. Every pipeline execution must be logged with a unique execution ID, timestamp, input dataset versions, output dataset versions, and execution status.
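A pipeline execution record of the kind just described can be emitted as structured JSON; the field names here are illustrative, not the OpenLineage schema.

```python
import json, uuid
from datetime import datetime, timezone

def log_execution(step, input_versions, output_versions, status):
    """Emit a pipeline-level lineage record: unique execution ID,
    UTC timestamp, dataset versions in and out, and execution status."""
    record = {
        "execution_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "inputs": input_versions,
        "outputs": output_versions,
        "status": status,
    }
    return json.dumps(record)

line = log_execution("clean_applicants",
                     {"raw_applicants": "v12"},
                     {"clean_applicants": "v7"},
                     "succeeded")
print(line)
```

Appending one such line per execution to an immutable log gives an assessor the macro-level audit trail Module 4 requires.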
Transformation-level lineage captures the logic within each step: what the cleaning step actually did, what the feature engineering computed, what the imputation strategy was. This requires each transform to be defined as version-controlled code, not ad hoc SQL queries or Jupyter notebook cells. dbt is the strongest tool for SQL-based transforms, with each model defined as a SQL file in a Git repository with tests, documentation, and automatic lineage graph output. For Python-based transforms, the code provides lineage when version-controlled, but parameters (thresholds, imputation values, normalisation statistics) must also be captured as structured metadata.
Column-level lineage is the finest grain and the most valuable for bias analysis. It tracks how each column in the output dataset relates to columns in source datasets. If the model uses a "risk_score" feature, column-level lineage reveals that it was derived from "annual_income" (source: payroll system) and "postcode" (source: address database), making the proxy relationship with ethnicity visible. OpenLineage provides an open standard for emitting lineage events at all three levels, with Marquez implementing it as a queryable lineage server. DataHub and Apache Atlas offer similar capabilities.
Article 10(5) permits processing of special category personal data strictly for bias monitoring and detection, subject to specific safeguards. This provision resolves a fundamental tension: meaningful bias detection is frequently impossible without access to the demographic data that data protection law restricts.
The gateway condition is the sufficiency test. Before processing real special category data, the organisation must demonstrate that bias detection "cannot reasonably be carried out" using synthetic or anonymised alternatives. This is not a formality; it requires a documented technical assessment. Synthetic data evaluation involves generating datasets replicating protected characteristic distributions (using tools like SDV, Gretel.ai, or MOSTLY AI), running the full bias detection suite, and assessing whether results reliably indicate real-world fairness behaviour. Synthetic data frequently falls short because it fails to capture the correlational structure between protected characteristics and features such as educational attainment, postcode, or employment history. The evaluation should quantify this by comparing bias metrics on synthetic data against metrics on a small, carefully governed sample of real data.
Anonymisation evaluation assesses whether anonymised data preserves the subgroup structure needed for disaggregated fairness analysis. k-anonymity, l-diversity, and t-closeness provide formal privacy models; the choice depends on dataset size and data sensitivity. The DPO Liaison documents the anonymisation technique, privacy parameters, re-identification risk assessment, and suitability conclusion.
The AI Act's data governance requirements operate alongside the GDPR as cumulative obligations. An organisation that satisfies Article 10 but violates the GDPR is non-compliant with both regulations, because the AI Act's data governance provisions presuppose GDPR compliance. Module 4 of the AISDP must address both frameworks in an integrated manner.
Lawful basis selection under GDPR Article 6 is one of the most consequential data governance decisions. Consent (Article 6(1)(a)) offers the strongest legal footing but is often impractical for large-scale datasets, since consent must be freely given, specific, informed, and unambiguous. Withdrawal rights create operational challenges: if a data subject withdraws consent, the organisation must remove their data from the training set and either retrain or demonstrate the data cannot be recovered from model parameters. Legitimate interest (Article 6(1)(f)) is more commonly relied upon, requiring a documented three-part balancing test (legitimate interest identified, processing necessity demonstrated, interests balanced against data subjects' rights) for each dataset and processing purpose. Public interest (Article 6(1)(e)) is available for public authority AI systems, while contract performance (Article 6(1)(b)) applies where the AI system processes personal data to fulfil a contract.
Data subject rights present specific technical challenges in the AI context. The right to erasure (GDPR Article 17) is the most demanding: the organisation must remove data from the training dataset and either retrain the model or demonstrate that the individual's data cannot be recovered from the model's parameters. Three approaches exist with different cost and assurance profiles. Full retraining removes the records and retrains from scratch, which is cleanest but expensive for large models. SISA training (Sharded, Isolated, Sliced, Aggregated) partitions training data into shards with separate models, so only the shard containing the data subject's records requires retraining. Approximate unlearning attempts to reverse the effect of specific records without full retraining but lacks formal guarantees and should be treated as supplementary for high-risk systems.
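The SISA idea can be illustrated with deterministic shard assignment: an erasure request maps to exactly one shard, and only that shard's model needs retraining. The hash scheme below is an illustrative choice, not the SISA paper's prescription.

```python
import hashlib

def shard_of(record_id: str, n_shards: int) -> int:
    """Deterministic shard assignment for a training record. Under SISA,
    erasing record_id only forces retraining of this one shard's model."""
    digest = hashlib.sha256(record_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

def shards_to_retrain(erased_ids, n_shards):
    """Distinct shards touched by a batch of erasure requests."""
    return sorted({shard_of(rid, n_shards) for rid in erased_ids})
```

With, say, 8 shards, a batch of erasure requests concentrated in one shard costs one-eighth of a full retrain; the shard map itself becomes a Module 4 evidence artefact.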
Many high-risk AI systems rely on data the organisation does not collect, curate, or control. Article 10's governance requirements apply regardless of data source, and the organisation bears full compliance responsibility for data it has limited ability to govern directly.
Embedding models and knowledge bases in RAG architectures similarly require governance proportionate to their influence on system outputs.
Third-party data governance operates across three layers. The contractual layer establishes the baseline across five domains: provenance disclosure (collection methodology, GDPR lawful basis, populations represented, known limitations, prior processing); quality specifications (measurable standards for completeness, accuracy, timeliness, and consistency); bias and representativeness warranties (demographic composition statistics where lawful); change notification (30 to 90 days' notice before material methodology changes); and audit rights (risk-proportionate inspection of supplier governance practices, annual for high-sensitivity data, biennial for lower-risk sources).
The technical layer validates every delivery regardless of contractual promises. An automated intake validation pipeline checks schema compliance, completeness against contracted thresholds, statistical distribution against historical baselines, and anomaly detection for unusual records or batches. Great Expectations or Soda Core can define a dedicated expectation suite per supplier encoding contractual quality specifications as automated checks. Deliveries that fail are quarantined and do not enter the training pipeline. Periodic re-assessment, at least annually, evaluates whether data remains representative. The monitoring layer defends against silent changes through statistical comparison of each delivery's distributional profile against the historical baseline.
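The intake gate can be sketched as a set of named checks whose failure quarantines the whole delivery; the check names and delivery format below are illustrative, standing in for a Great Expectations or Soda Core suite:

```python
def intake_gate(delivery, checks):
    """Run each named check against a supplier delivery. Any failure
    quarantines the delivery rather than letting it reach the training
    pipeline. Returns (accepted, failures); the failure list is retained
    as an evidence artefact."""
    failures = [name for name, check in checks.items() if not check(delivery)]
    return (len(failures) == 0, failures)

# Illustrative checks encoding contractual quality specifications.
checks = {
    "min_rows": lambda d: len(d["rows"]) >= 1000,
    "no_null_ids": lambda d: all(r.get("id") is not None for r in d["rows"]),
}

ok, failures = intake_gate({"rows": [{"id": 1}, {"id": 2}]}, checks)
print(ok, failures)  # quarantined: delivery is far below the contracted row count
```

Each check corresponds to a contractual quality specification, so the quarantine log doubles as evidence of supplier non-conformance.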
Whether a knowledge base constitutes 'training, validation and testing data' within Article 10 is an open legal question, since the knowledge base conditions outputs at inference time rather than training model parameters. The recommended compliance approach is to apply Article 10 governance to the knowledge base, adapted for inference-time retrieval. The cost of discovering post-deployment that an ungoverned knowledge base introduced bias or inaccuracy into a high-risk system materially exceeds the cost of applying governance from the outset.
Stored embeddings can themselves constitute personal data. Research demonstrates that original text can be partially or fully reconstructed from embeddings through inversion attacks, so if the embedding model encodes documents containing personal data, the embeddings may qualify as personal data under GDPR Article 4(1) because they relate to an identifiable natural person. The DPO Liaison must assess re-identification feasibility considering embedding dimensionality, available inversion techniques, and whether embeddings are stored alongside identifying metadata.
Fairness metrics can pull in opposite directions: a model achieving equalised odds may fail predictive parity, and a model with good calibration within groups may violate the four-fifths rule. The organisation must decide which fairness concept takes priority for its specific system, document the rationale, and declare the chosen metric as the primary threshold for the deployment blocking gate. Fairlearn's MetricFrame reports all metrics simultaneously so the trade-offs are visible.
Dataset documentation must cover provenance, composition with demographic disaggregation, preparation steps, quality metrics, annotation processes, and known limitations. Documentation depth must be proportionate to the dataset's role in the system.
Pre-training assessment covers distributional analysis, label bias, proxy variable detection, and intersectional analysis. Post-training evaluation uses five complementary metrics: selection rate ratio, equalised odds, predictive parity, calibration within groups, and counterfactual fairness testing.
Article 10(5) permits processing of special category personal data strictly for bias monitoring when synthetic and anonymised alternatives are insufficient, subject to five layers of safeguards: isolation, pseudonymisation, encryption, access control, and confidential computing.
The AI Act and the GDPR operate as cumulative obligations. Lawful basis selection, data subject rights (especially the right to erasure), DPIA/FRIA coordination, and the data retention tension between the GDPR's storage limitation principle and Article 18's ten-year documentation retention all require integrated treatment.
Article 10 requirements apply regardless of data source. Governance operates across contractual controls (provenance, quality, bias warranties, audit rights), technical validation (automated intake pipelines), and monitoring for silent supplier changes.
The Datasheets for Datasets framework (Gebru et al., 2021) provides the most thorough structure, organising documentation into seven sections: motivation, composition, collection process, preprocessing and cleaning, uses, distribution, and maintenance. For EU AI Act compliance, the composition section must include distributional analysis across protected characteristics feeding the bias assessment. The collection process section must document the GDPR lawful basis. The preprocessing section must align with data lineage requirements. The uses section must explicitly state limitations relevant to the system's intended purpose.
Documentation depth should be proportionate to the dataset's role. Training datasets for high-risk systems warrant comprehensive datasheets; static reference datasets warrant lighter treatment. A 50-page datasheet for a simple lookup table adds cost without compliance value. The AI System Assessor must document the standard applied to each dataset category and the rationale for the proportionality decision. Dataset documentation is a living artefact updated whenever the dataset changes, with version bumps triggering corresponding documentation updates. Data catalogue tools such as OpenMetadata and DataHub support attaching structured documentation to dataset versions with change tracking. For lighter tooling, a Markdown file co-located with the dataset in the data versioning system (DVC, Delta Lake) provides version-controlled documentation that evolves alongside the data.
Great Expectations is the most comprehensive tool for this purpose. It uses a declarative approach where expectations (assertions about data) are defined as code, organised into expectation suites (collections of assertions for a specific dataset), and run as part of the pipeline. An expectation might state "the column age should have no null values," "the column income should be between 0 and 10,000,000," or "the distribution of gender should match the reference distribution with a KS test p-value above 0.01." When an expectation fails, the pipeline halts and produces a structured validation result documenting exactly which expectations failed and by how much. The expectation suites serve as executable documentation of data quality standards that an assessor reviewing Module 4 can read to understand exactly what checks were applied. Great Expectations also generates Data Docs, HTML reports summarising validation results, which serve as evidence artefacts.
For third-party data sources, the validation pipeline is particularly important. Contractual quality specifications set the expected standard, and automated validation confirms each delivery meets it. Deliveries that fail are quarantined (not ingested) and flagged for investigation. The quarantine mechanism and investigation process are documented in the AISDP.
Proxy variable detection identifies features that correlate strongly with protected characteristics. Postcode correlates with ethnicity and socioeconomic status. University name correlates with social class. Name correlates with gender and ethnicity. Detection computes correlation between each feature and each protected characteristic using Pearson for continuous-continuous pairs, point-biserial for continuous-binary pairs, and mutual information for any pair. Features with correlation above a defined threshold (typically 0.3, calibrated to the domain) are flagged. The flag does not mean automatic removal; the Technical SME justifies retention where predictive value warrants it and proxy risk can be mitigated through fairness constraints during training.
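Pearson-based proxy screening can be sketched directly; the 0.3 threshold below mirrors the illustrative figure in the text and must be calibrated to the domain, and the protected characteristic is assumed to be numerically encoded.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between a candidate feature and a numerically
    encoded protected characteristic."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

PROXY_THRESHOLD = 0.3  # illustrative; calibrate to the domain

def flag_proxies(features, protected):
    """Feature names whose |correlation| with the protected characteristic
    exceeds the threshold; flagging triggers review, not automatic removal."""
    return [name for name, vals in features.items()
            if abs(pearson(vals, protected)) > PROXY_THRESHOLD]
```

Point-biserial correlation for continuous-binary pairs reduces to the same Pearson computation with the binary variable coded 0/1, so this sketch covers that case too.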
Intersectional analysis examines subgroup combinations, since a dataset may have adequate representation of each characteristic individually but critically small cell sizes for intersectional subgroups (for example, women from ethnic minority backgrounds at 3 to 5% of the dataset). Cell sizes must be reported in the AISDP for all examined intersectional subgroups. Where cell sizes fall below minimum thresholds for meaningful analysis (commonly 30 instances for basic metrics, 100 or more for reliable fairness metrics), the AISDP states this limitation explicitly rather than reporting unreliable metrics. Fairlearn's MetricFrame supports intersectional analysis by accepting multiple sensitive features and computing metrics for every combination with confidence intervals.
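Cell-size screening for intersectional subgroups is a simple counting exercise; the thresholds below are the 30 and 100 instance figures cited above, and the status labels are illustrative:

```python
from collections import Counter

MIN_BASIC, MIN_FAIRNESS = 30, 100  # minimum cell sizes cited in the text

def intersectional_cells(rows, characteristics):
    """Count intersectional subgroup cell sizes and classify each cell
    against the minimum sizes for basic and fairness metrics."""
    cells = Counter(tuple(r[c] for c in characteristics) for r in rows)
    report = {}
    for cell, n in cells.items():
        if n >= MIN_FAIRNESS:
            status = "fairness_metrics_ok"
        elif n >= MIN_BASIC:
            status = "basic_metrics_only"
        else:
            status = "too_small_report_limitation"
        report[cell] = (n, status)
    return report
```

Cells classified as too small are reported in the AISDP as an explicit limitation rather than backed by unreliable metrics.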
Post-training bias evaluation measures whether the trained model produces fair outcomes using five complementary metrics. The selection rate ratio (four-fifths rule) flags adverse impact when the positive outcome rate for any subgroup falls below 80% of the majority group's rate. Equalised odds requires similar true positive and false positive rates across subgroups; differences mean the model makes systematically different types of mistakes for different groups. Predictive parity requires positive predictions to be equally accurate across subgroups. Calibration within groups tests whether predicted probabilities correspond to actual outcomes consistently; reliability diagrams (plotting predicted probability against observed frequency per subgroup) are the standard visual tool. Counterfactual fairness testing measures whether changing a protected characteristic while holding other features constant changes the output.
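The selection rate ratio check is the simplest of the five metrics to compute; this sketch compares each subgroup's positive-outcome rate against the highest-rate subgroup, the usual framing of the four-fifths rule:

```python
def selection_rates(preds, groups):
    """Positive-outcome rate per subgroup."""
    totals, positives = {}, {}
    for p, g in zip(preds, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if p else 0)
    return {g: positives[g] / totals[g] for g in totals}

def four_fifths_check(preds, groups):
    """True per subgroup when its selection rate is at least 80% of the
    highest subgroup's rate; False signals adverse impact."""
    rates = selection_rates(preds, groups)
    top = max(rates.values())
    return {g: r / top >= 0.8 for g, r in rates.items()}
```

The same disaggregated loop generalises to the error-rate comparisons behind equalised odds and predictive parity; Fairlearn's MetricFrame packages exactly that pattern.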
These metrics can conflict: a model achieving equalised odds may fail predictive parity. The organisation must decide which fairness concept takes priority for its specific system and document the rationale.
Post-processing mitigations modify outputs after inference. Threshold adjustment sets different decision thresholds per subgroup to equalise selection rates or error rates. Fairlearn's ThresholdOptimizer automates finding per-subgroup thresholds that satisfy a given fairness constraint while maximising accuracy. Reject option classification routes borderline predictions (where confidence is low) to human review. Calibrated equalised odds (Pleiss et al., 2017) adjusts probability scores per subgroup to achieve both calibration and equalised odds.
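Per-subgroup threshold adjustment, once the thresholds have been chosen offline (for example via Fairlearn's ThresholdOptimizer), reduces to a lookup at inference time; the threshold values below are illustrative:

```python
def apply_group_thresholds(scores, groups, thresholds, default=0.5):
    """Convert model scores to decisions using a per-subgroup threshold,
    falling back to a default for unlisted subgroups."""
    return [score >= thresholds.get(g, default)
            for score, g in zip(scores, groups)]

decisions = apply_group_thresholds(
    scores=[0.55, 0.55], groups=["A", "B"],
    thresholds={"A": 0.60, "B": 0.50})
print(decisions)  # the same score yields different outcomes under adjusted thresholds
```

The example makes the criticism in the next paragraph concrete: identical scores receive different decisions, which is precisely why the choice needs documented justification.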
Post-processing mitigations face legitimate criticism as cosmetic adjustments that mask underlying model bias without addressing root causes. A model requiring aggressive threshold adjustment has learned something problematic, and the adjustment will not fix that learning if the input distribution shifts. For AISDP purposes, valid reasons for choosing post-processing include: root-cause mitigation would require protected characteristic data the organisation cannot lawfully obtain; root-cause mitigation would reduce accuracy below declared performance thresholds; or the bias is an artefact of historical data that cannot be corrected within the training data. The AI Governance Lead must sign off, and residual risk must be documented. Where no technique fully eliminates bias, compensating controls may include mandatory human review for decisions affecting disadvantaged subgroups, enhanced outcome monitoring, or deployment restrictions.
Feature stores (Feast, Tecton, Hopsworks) address a specific lineage gap: the connection between raw data and computed features. They centralise feature definitions, versioned feature values, and metadata, enforcing consistency between training and inference features to eliminate training-serving skew. Each data engineering step should be wrapped in a pre-step/post-step record capturing the input datasets by version identifier, the intended transform and rationale, expected output characteristics, and actual output characteristics for comparison. Data versioning through DVC, Delta Lake, LakeFS, or cloud-native versioning ensures every dataset carries an immutable identifier linked to each model version trained on it, so the AISDP can state precisely which data was used for each model version.
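The pre-step/post-step pattern can be sketched as a thin wrapper around each transform. The step names, field names, and content-hash versioning scheme here are hypothetical, chosen to show the shape of the record rather than any particular tool's API:

```python
import datetime
import hashlib
import json

def dataset_fingerprint(rows):
    """Deterministic content hash serving as an immutable dataset version identifier."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

def run_step(name, rationale, rows, transform, expected):
    """Wrap one data engineering step in a pre-step/post-step lineage record."""
    record = {
        "step": name,
        "rationale": rationale,
        "input_version": dataset_fingerprint(rows),
        "expected": expected,
        "started_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    out = transform(rows)
    record["output_version"] = dataset_fingerprint(out)
    record["actual"] = {"row_count": len(out)}
    record["matches_expectation"] = record["actual"]["row_count"] == expected["row_count"]
    return out, record

rows = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 51}]
cleaned, rec = run_step(
    "drop_missing_age",
    "Age is a mandatory feature; rows without it cannot be imputed reliably.",
    rows,
    lambda rs: [r for r in rs if r["age"] is not None],
    expected={"row_count": 2},
)
```

The record captures intent before execution and outcome after it, so a divergence between expected and actual characteristics is visible in the lineage trail rather than discovered at conformity assessment.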
When alternatives are insufficient, processing requires five layers of safeguards. Isolation stores special category data in a dedicated, physically or logically separated environment with no connectivity to the main development and production data stores. Pseudonymisation replaces direct identifiers with pseudonymous keys, with the mapping table stored separately (for example in HashiCorp Vault) under stricter access controls accessible only to the DPO Liaison and a named data governance officer. Encryption requires AES-256 at rest and TLS 1.3 in transit, with no unencrypted special category data at any pipeline point. Access control and audit restricts access to named individuals with documented business need, with immutable audit trails capturing accessor identity, timestamp, data accessed, and purpose. Confidential computing provides the highest assurance through hardware-secured enclaves (Intel SGX enclaves on Azure Confidential Computing, AWS Nitro Enclaves, Google Confidential VMs) that prevent data exfiltration even by system administrators.
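The pseudonymisation layer might look like the following sketch, using a keyed HMAC so pseudonyms are deterministic across runs. The key, field names, and record shape are illustrative; in practice the key and the re-identification mapping would live in the separate secrets store described above, never alongside the code or the pseudonymised data:

```python
import hashlib
import hmac

# Illustrative only: in production this key is retrieved from the secrets
# store (e.g. HashiCorp Vault) and never hard-coded.
PSEUDONYM_KEY = b"replace-with-vault-managed-secret"

def pseudonymise(records, id_field="patient_id"):
    """Replace direct identifiers with keyed pseudonyms; return the records
    plus the re-identification mapping, which must be stored separately."""
    mapping = {}
    out = []
    for r in records:
        pseudo = hmac.new(
            PSEUDONYM_KEY, str(r[id_field]).encode(), hashlib.sha256
        ).hexdigest()[:16]
        mapping[pseudo] = r[id_field]
        out.append({**r, id_field: pseudo})
    return out, mapping

records = [{"patient_id": "P-001", "ethnicity": "X"},
           {"patient_id": "P-002", "ethnicity": "Y"}]
pseudo_records, reident_map = pseudonymise(records)
```

Keyed hashing means the same individual receives the same pseudonym in every dataset version, preserving joins for bias analysis, while re-identification requires both the mapping table and its access-controlled key.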
A formal governance workflow requires the Technical SME to prepare a Special Category Data Processing Request specifying the bias detection purpose, specific data elements, processing methodology, and expected retention period. The DPO Liaison reviews for GDPR Article 9 compliance, and the AI Governance Lead confirms that only the minimum data necessary will be processed. Results are extracted in aggregate form only; individual records must not leave the secured environment. Following completion, special category data is deleted or anonymised, with the DPO Liaison technically verifying deletion across all storage locations, including backups, caches, and derived datasets.
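The sequential sign-off could be enforced in tooling along these lines. The class, role names, and field names mirror the workflow described above but are otherwise hypothetical:

```python
from dataclasses import dataclass, field

# Review sequence from the workflow: DPO Liaison first, then AI Governance Lead.
APPROVAL_ORDER = ["dpo_liaison", "ai_governance_lead"]

@dataclass
class SpecialCategoryDataRequest:
    purpose: str
    data_elements: list
    methodology: str
    retention_days: int
    approvals: list = field(default_factory=list)

    def approve(self, role):
        """Reject any approval arriving out of the mandated sequence."""
        expected = APPROVAL_ORDER[len(self.approvals)]
        if role != expected:
            raise PermissionError(f"Out-of-order approval: expected {expected}")
        self.approvals.append(role)

    @property
    def processing_permitted(self):
        return self.approvals == APPROVAL_ORDER

req = SpecialCategoryDataRequest(
    purpose="bias detection for credit model v3",
    data_elements=["ethnicity"],
    methodology="aggregate selection-rate comparison",
    retention_days=30,
)
req.approve("dpo_liaison")
req.approve("ai_governance_lead")
```

Encoding the sequence in code means processing cannot begin until both reviews are on record, turning a procedural control into a technical one.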
The right of access (Article 15) requires a training data provenance index mapping individual identifiers to dataset versions. The right to rectification (Article 16) may require dataset modification and model retraining depending on the correction's materiality. The right not to be subject to solely automated decision-making (Article 22) intersects with Article 14 human oversight requirements: if the human oversight documented in AISDP Module 7 constitutes meaningful involvement, processing may not be "solely" automated. Article 86 of the AI Act provides affected persons with a right to explanation of individual decision-making, complementing GDPR Recital 71.
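A training data provenance index can be as simple as an inverted mapping from individual identifier to dataset versions. This sketch assumes identifiers are known when each dataset version is registered; the class and version labels are illustrative:

```python
from collections import defaultdict

class ProvenanceIndex:
    """Maps individual identifiers to the dataset versions containing their
    data, so an Article 15 access request can be answered without rescanning
    raw training data."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, dataset_version, individual_ids):
        for i in individual_ids:
            self._index[i].add(dataset_version)

    def access_request(self, individual_id):
        """Return every dataset version holding this individual's data."""
        return sorted(self._index.get(individual_id, set()))

idx = ProvenanceIndex()
idx.register("train-2024-03-v1", ["subj-17", "subj-42"])
idx.register("train-2024-09-v2", ["subj-42"])
```

The same index answers the Article 16 question of which dataset versions, and hence which model versions, a rectification would touch.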
DPIA and FRIA coordination is required because GDPR Article 35 mandates a Data Protection Impact Assessment while AI Act Article 27 requires a Fundamental Rights Impact Assessment. These are distinct but overlapping; findings from each should inform the other. The data retention tension between GDPR's storage limitation principle and Article 18's ten-year AISDP retention obligation is reconciled by retaining documentation (metadata, provenance records, quality metrics, distributional statistics) after the underlying personal data has been deleted or anonymised. Data architecture must be designed so that compliance-relevant information about training data can survive the deletion of the individual records it describes.
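The reconciliation pattern can be sketched as deriving a retention-safe snapshot before deletion; the feature name and choice of statistics are illustrative:

```python
import datetime
import statistics

def compliance_snapshot(dataset_version, records, feature="age"):
    """Derive retention-safe documentation (counts, distributional statistics)
    from a dataset before the underlying personal data is deleted; the
    snapshot contains no individual-level values."""
    values = [r[feature] for r in records if r.get(feature) is not None]
    return {
        "dataset_version": dataset_version,
        "row_count": len(records),
        f"{feature}_mean": statistics.mean(values),
        f"{feature}_stdev": statistics.pstdev(values),
        "snapshot_at": datetime.date.today().isoformat(),
    }

records = [{"age": 30}, {"age": 40}, {"age": 50}]
snap = compliance_snapshot("train-2024-03-v1", records)
# `snap` is retained in the AISDP for the Article 18 period;
# `records` can now be deleted under GDPR storage limitation.
```

Because the snapshot is keyed to the immutable dataset version identifier, the AISDP can still evidence distributional properties of data that no longer exists.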
Knowledge base governance applies Article 10 requirements adapted for inference-time retrieval. Documentation covers composition (document count, types, source distribution, temporal coverage), completeness against the deployment context, currency (staleness thresholds for time-sensitive domains), provenance for each document, and bias assessment. A medical decision-support RAG system whose knowledge base covers only English-language US guidelines will produce systematically different responses for EU patients where national clinical guidelines differ.
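Currency checks against staleness thresholds might be implemented as a scheduled job along these lines; the document types, threshold values, and field names are illustrative:

```python
import datetime

# Illustrative per-type staleness thresholds, in days.
STALENESS_THRESHOLDS = {"clinical_guideline": 365, "drug_label": 90}

def stale_documents(docs, today):
    """Flag knowledge base documents whose age since last review exceeds
    the threshold defined for their document type."""
    flagged = []
    for d in docs:
        limit = STALENESS_THRESHOLDS.get(d["type"])
        age_days = (today - d["last_reviewed"]).days
        if limit is not None and age_days > limit:
            flagged.append(d["id"])
    return flagged

today = datetime.date(2025, 1, 1)
docs = [
    {"id": "doc-1", "type": "clinical_guideline",
     "last_reviewed": datetime.date(2023, 6, 1)},
    {"id": "doc-2", "type": "drug_label",
     "last_reviewed": datetime.date(2024, 11, 15)},
]
flagged = stale_documents(docs, today)  # doc-1 exceeds its 365-day threshold
```

Flagged documents feed a review queue; for time-sensitive domains the thresholds themselves should be justified in the AISDP rather than chosen by convention.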
Embedding model bias is a structural concern. Research consistently demonstrates that embedding models trained on broad web corpora encode societal biases that manifest as differential retrieval quality across demographic subgroups. Assessment combines intrinsic evaluation (WEAT and sentence-level extensions) and extrinsic evaluation (paired queries differing only in demographic markers). Multilingual performance varies significantly across models; evaluation should use language-specific retrieval benchmarks (MIRACL, MTEB) and domain-specific test queries. Embeddings as personal data require GDPR assessment where original text can be reconstructed through inversion attacks. Version control coordinates the embedding model version with the knowledge base index version; any model change triggers re-indexing since mismatched vector spaces degrade retrieval quality.
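The paired-query extrinsic check can be sketched as a top-k overlap comparison between two queries differing only in a demographic marker. The toy embedder below is purely for illustration; a real assessment would plug in the production embedding model in its place:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Deterministic toy embedder (character-sum hash buckets), for illustration
# only; substitute the production embedding model in a real assessment.
def toy_embed(text, dims=16):
    v = [0.0] * dims
    for tok in text.lower().split():
        v[sum(ord(c) for c in tok) % dims] += 1.0
    return v

def paired_query_overlap(embed, corpus, query_a, query_b, k=3):
    """Retrieve top-k documents for two queries that differ only in a
    demographic marker; low overlap suggests the marker alone steers retrieval."""
    def top_k(q):
        qv = embed(q)
        ranked = sorted(corpus, key=lambda d: cosine(embed(d), qv), reverse=True)
        return set(ranked[:k])
    return len(top_k(query_a) & top_k(query_b)) / k

corpus = ["chest pain guidance", "migraine treatment options",
          "fatigue differential diagnosis", "postpartum care pathway"]
overlap = paired_query_overlap(toy_embed, corpus,
                               "symptoms in women", "symptoms in men")
```

An overlap well below 1.0 across a battery of such pairs is evidence of differential retrieval quality; the pairs should be drawn from the deployment domain rather than generic templates.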
Where a supplier refuses adequate provenance disclosure, quality specifications, or audit rights, the AI System Assessor records the gap in the risk register. If compensating controls are insufficient, the data source must be replaced. Contractual liability allocation does not diminish regulatory obligations; the provider or deployer remains responsible under the AI Act regardless of data source. The contract should specify the supplier's liability for data quality breaches, the obligation to cooperate with regulatory investigations, and indemnification arrangements for losses arising from breach of data quality, provenance, or bias warranties.