Article 10(2)(f) of the EU AI Act requires providers to examine training data for biases likely to affect health, safety, fundamental rights, or lead to discrimination. This cluster covers the practical and technical approaches for bias detection before and after model training, including the mitigation strategies that must be documented in the AISDP.
Article 10(2)(f) requires providers of high-risk AI systems to examine training data for possible biases that are likely to affect health and safety, have a negative impact on fundamental rights, or lead to discrimination. The AISDP must document both the bias detection methods applied before training and the fairness evaluations conducted after training. This obligation covers the full lifecycle: identifying bias in raw data, evaluating whether the trained model produces discriminatory outcomes, and recording the mitigation strategies applied where bias is found. Data Governance and Management provides the broader data governance framework within which fairness evaluation sits.
Bias detection and mitigation is the most technically demanding aspect of data governance under the AI Act. The regulation does not prescribe specific statistical tests or fairness metrics, but it does require that the examination be thorough enough to identify biases that could cause the harms listed in Article 10(2)(f). Providers must therefore select methods appropriate to their system's domain, document the rationale for those choices, and record the results in sufficient detail for conformity assessment.
Distributional analysis compares the distribution of each feature across protected characteristic subgroups to identify historical disparities that a model would learn and perpetuate. The analysis should cover every feature in the dataset, not only those the team suspects are problematic. For a recruitment dataset, if female applicants have systematically lower "years of experience" than male applicants due to historical workforce participation patterns, a model that weights experience heavily will reproduce that disparity.
The Technical SME selects statistical tests appropriate to the feature type. For categorical features, the chi-squared test of independence is standard, testing whether the feature distribution is independent of the protected characteristic. For continuous features, the Kolmogorov-Smirnov test compares cumulative distribution functions across subgroups, while the Mann-Whitney U test provides an alternative that is more sensitive to location shifts where one subgroup's values are systematically higher or lower.
The practical output is a matrix with features on one axis and protected characteristics on the other, each cell showing the test statistic and p-value. Features with statistically significant distributional differences (p < 0.05 after Bonferroni correction for multiple comparisons) are flagged for further investigation. Tools such as ydata-profiling automate this for tabular datasets, producing HTML reports with correlation matrices and distribution comparisons. For larger datasets, Apache Spark-based tools or custom scripts may be necessary. The Technical SME documents the distributional analysis with the statistical tests used and accompanying visualisations.
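A minimal sketch of that matrix for a single binary protected characteristic, using scipy.stats. The function name, column names, and toy recruitment data are illustrative, not a prescribed implementation:

```python
import numpy as np
import pandas as pd
from scipy import stats

def distributional_test_matrix(df, features, protected, alpha=0.05):
    """One statistical test per feature against a binary protected
    characteristic, Bonferroni-corrected for multiple comparisons."""
    groups = df[protected].unique()
    rows = []
    for feat in features:
        if pd.api.types.is_numeric_dtype(df[feat]):
            # Continuous feature: Kolmogorov-Smirnov across the two subgroups
            a = df.loc[df[protected] == groups[0], feat]
            b = df.loc[df[protected] == groups[1], feat]
            statistic, p_value = stats.ks_2samp(a, b)
            test = "KS"
        else:
            # Categorical feature: chi-squared test of independence
            table = pd.crosstab(df[feat], df[protected])
            statistic, p_value, _, _ = stats.chi2_contingency(table)
            test = "chi-squared"
        rows.append({"feature": feat, "test": test,
                     "statistic": statistic, "p_value": p_value})
    report = pd.DataFrame(rows)
    # Bonferroni: divide alpha by the number of comparisons made
    report["flagged"] = report["p_value"] < alpha / len(features)
    return report

rng = np.random.default_rng(0)
n = 200
candidates = pd.DataFrame({
    "gender": ["F"] * n + ["M"] * n,
    # Historical disparity baked into one feature, none in the other
    "years_experience": np.concatenate(
        [rng.normal(5, 2, n), rng.normal(9, 2, n)]),
    "interview_score": rng.normal(70, 10, 2 * n),
})
report = distributional_test_matrix(
    candidates, ["years_experience", "interview_score"], "gender")
```

In practice the same loop runs over every protected characteristic, producing one such table per characteristic; the flagged cells are the starting point for the investigation documented in the AISDP.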
Label bias is subtler and more dangerous than distributional bias because it embeds discrimination directly into the ground truth used for training. If the outcome labels themselves reflect historical discrimination, training on those labels teaches the model to perpetuate that discrimination. In a recruitment context, the label "hired/not hired" encodes the decisions of human recruiters who may have been influenced by conscious or unconscious bias. In criminal justice, the label "re-offended" may reflect differential policing rather than differential behaviour.
Detecting label bias requires stepping outside the data itself. Inter-rater reliability analysis, where multiple independent labellers rate the same instances and agreement is measured via Cohen's kappa or Krippendorff's alpha, reveals the extent to which labels are subjective and potentially biased. Re-labelling by diverse panels provides a corrective dataset against which the original labels can be compared. Where re-labelling is not feasible because the labels reflect real-world outcomes that cannot be re-adjudicated, the AISDP must document the known label bias, its potential effect on the model, and the compensating controls in place.
The AISDP must record the label generation process, any known biases in that process, and the measures taken to mitigate label bias. These measures may include re-labelling by diverse panels, applying bias-aware label smoothing, or using proxy labels that are less susceptible to human bias.
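Cohen's kappa for two labellers can be computed directly (Krippendorff's alpha generalises to more raters and missing labels). A minimal sketch, assuming two complete label sets over the same instances:

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    """Agreement between two labellers beyond what chance predicts:
    kappa = (p_observed - p_expected) / (1 - p_expected)."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    labels = np.union1d(a, b)
    # Raw proportion of instances on which the labellers agree
    p_observed = np.mean(a == b)
    # Agreement expected if the two labellers were independent
    p_expected = sum(np.mean(a == l) * np.mean(b == l) for l in labels)
    return (p_observed - p_expected) / (1 - p_expected)
```

Kappa near 1 indicates near-perfect agreement; kappa near 0 indicates agreement no better than chance, a strong signal that the labels are subjective and the label generation process needs the scrutiny described above.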
Proxy variable detection identifies features that are not themselves protected characteristics but correlate strongly enough with them to serve as indirect discriminators. Postcode correlates with ethnicity and socioeconomic status; university name correlates with social class; name correlates with gender and ethnicity. A model using these features may discriminate even when protected characteristics are excluded from the input data.
The detection method computes correlation between each feature and each protected characteristic using the appropriate measure: Pearson for continuous-continuous pairs, point-biserial for continuous-binary pairs, and mutual information for any pair. Features with correlation above a defined threshold (0.3 is a common starting point, calibrated to the domain by the Technical SME) are flagged for review. A flag does not mean the feature is automatically removed; it means the Technical SME must justify its inclusion.
If a feature has strong predictive value for the legitimate intended purpose and its proxy risk can be mitigated through fairness constraints during training, retention may be defensible. The Technical SME documents the justification for each retained proxy feature in the AISDP, including the predictive value assessment and the specific mitigation measures applied to reduce the proxy risk.
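A minimal sketch of the correlation scan for numeric features against a binary protected characteristic (point-biserial correlation is Pearson correlation with a 0/1 encoding); mutual information would be needed for categorical pairs. The function name, columns, and threshold are illustrative:

```python
import numpy as np
import pandas as pd

def flag_proxy_features(df, features, protected, threshold=0.3):
    """Point-biserial correlation of each numeric feature with a
    binary protected characteristic; flag |r| at or above threshold."""
    z = df[protected].astype("category").cat.codes  # 0/1 encoding
    rows = []
    for feat in features:
        r = np.corrcoef(df[feat], z)[0, 1]
        rows.append({"feature": feat, "correlation": r,
                     "flagged": abs(r) >= threshold})
    return pd.DataFrame(rows)

rng = np.random.default_rng(1)
n = 500
eth = rng.integers(0, 2, n)
applicants = pd.DataFrame({
    "ethnicity": np.where(eth == 1, "B", "A"),
    # A feature that tracks the protected characteristic, and one that doesn't
    "postcode_income": eth * 2 + rng.normal(0, 1, n),
    "typing_speed": rng.normal(60, 8, n),
})
flags = flag_proxy_features(
    applicants, ["postcode_income", "typing_speed"], "ethnicity")
```

Each flagged row then needs the documented justify-or-remove decision described above, not automatic deletion.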
Intersectional analysis examines combinations of protected characteristics rather than each characteristic in isolation, revealing biases that standard single-axis analysis misses. Standard analysis examines each protected characteristic individually, but intersectional analysis looks at combinations such as female applicants over 55 or disabled applicants from ethnic minority backgrounds. A dataset may have adequate representation of women (50%) and adequate representation of ethnic minorities (15%), yet have critically small cell sizes for women from ethnic minority backgrounds (3 to 5%). With small cell sizes, statistical tests lack power, and fairness metrics have wide confidence intervals.
Cell sizes must be reported in the AISDP for all examined intersectional subgroups. Where cell sizes fall below a minimum threshold for meaningful analysis (commonly 30 instances for basic metrics, 100 or more for reliable fairness metrics), the document should state this limitation explicitly rather than reporting unreliable metrics. The data may be adequate for each characteristic individually but have critically insufficient representation for intersectional subgroups, making reliable bias detection impossible for those combinations. Monitoring and Review covers how intersectional fairness metrics are tracked during post-deployment monitoring.
Fairlearn's MetricFrame supports intersectional analysis by accepting multiple sensitive features and computing metrics for every combination; confidence intervals can be added by bootstrapping the per-subgroup metrics. This is the most practical tool for the purpose, producing structured reports that become evidence artefacts for Module 4. The intersectional analysis, the cell sizes observed, and any subgroups for which the data was insufficient for meaningful analysis are all recorded in the AISDP.
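The cell-size check itself is a few lines of pandas, which MetricFrame then extends to arbitrary metrics. A minimal sketch, with function and column names hypothetical:

```python
import pandas as pd

def intersectional_report(df, sensitive, outcome, min_cell=30):
    """Cell size and selection rate for every combination of the
    sensitive features; flag cells too small for reliable metrics."""
    g = df.groupby(sensitive)[outcome]
    report = g.agg(cell_size="size", selection_rate="mean").reset_index()
    report["insufficient"] = report["cell_size"] < min_cell
    return report

applicants = pd.DataFrame({
    "gender": ["F"] * 40 + ["M"] * 60,
    "ethnicity": (["min"] * 5 + ["maj"] * 35) + (["min"] * 10 + ["maj"] * 50),
    "hired": [1, 0] * 50,
})
report = intersectional_report(applicants, ["gender", "ethnicity"], "hired")
```

Adequate single-axis representation here (40% female, 15% minority) still leaves the female-minority cell at five instances, which the report flags as insufficient rather than publishing an unreliable selection rate.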
Five complementary fairness metrics capture different aspects of model fairness, and they can conflict with each other: a model that achieves equalised odds may fail predictive parity, and a model that achieves calibration within groups may violate the four-fifths rule. A decision must be made about which fairness concept takes priority for the specific system, and the rationale must be documented.
The selection rate ratio (the four-fifths rule) is the simplest and most widely understood metric. For a binary classifier, compute the positive outcome rate for each protected subgroup, divide by the majority group's rate, and flag any ratio below 0.80. This metric has regulatory heritage in employment law but measures only outcome rates, not the quality of outcomes. A model that matches outcome rates across subgroups but makes systematically worse predictions for one subgroup would pass this test while remaining unfair.
Equalised odds requires that the model's true positive rate and false positive rate are consistent across subgroups. If a credit scoring model correctly identifies 90% of creditworthy applicants in one ethnic group but only 70% in another, one group is systematically disadvantaged even if the overall approval rate appears balanced. This is primarily a development-time and periodic-review metric because it requires access to ground truth labels.
Predictive parity asks whether the model's positive predictions are equally reliable across subgroups. If positive predictions are correct 85% of the time for one group but only 65% for another, individuals in the second group face a higher risk of being incorrectly subjected to the system's consequences. This metric is particularly important for high-stakes decisions such as credit denial, job rejection, or benefits eligibility.
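As a minimal sketch, the three metrics discussed so far can be computed with numpy alone; Fairlearn's MetricFrame offers the same per-subgroup breakdown with less code. The function name and toy data are illustrative, and a binary classifier is assumed:

```python
import numpy as np

def fairness_metrics(y_true, y_pred, group):
    """Per-subgroup selection rate, TPR/FPR (equalised odds) and
    precision (predictive parity), plus the four-fifths ratio
    against the highest-selecting subgroup."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    out = {}
    for g in np.unique(group):
        m = group == g
        tp = int(np.sum((y_pred == 1) & (y_true == 1) & m))
        fp = int(np.sum((y_pred == 1) & (y_true == 0) & m))
        fn = int(np.sum((y_pred == 0) & (y_true == 1) & m))
        tn = int(np.sum((y_pred == 0) & (y_true == 0) & m))
        out[g] = {
            "selection_rate": float(np.mean(y_pred[m])),
            "tpr": tp / max(tp + fn, 1),        # equalised odds component
            "fpr": fp / max(fp + tn, 1),        # equalised odds component
            "precision": tp / max(tp + fp, 1),  # predictive parity
        }
    top = max(v["selection_rate"] for v in out.values())
    for v in out.values():
        v["four_fifths_ratio"] = v["selection_rate"] / top if top else 0.0
    return out

metrics = fairness_metrics(
    y_true=[1, 1, 1, 0, 0, 1, 0, 1, 0, 0],
    y_pred=[1, 1, 1, 1, 0, 1, 1, 0, 0, 0],
    group=["A"] * 5 + ["B"] * 5,
)
```

In this toy example subgroup B's four-fifths ratio falls below 0.80, so the data would be flagged even before the precision gap between the groups is examined.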
Pre-processing mitigations modify the training data before the model sees it, correcting imbalances and reducing the influence of biased patterns. The simplest approach is resampling: oversampling underrepresented subgroups by creating additional copies of minority records, or undersampling overrepresented subgroups by removing majority records. SMOTE generates synthetic examples by interpolating between existing minority records, which is less prone to overfitting than simple duplication. ADASYN focuses synthetic generation on boundary regions where the classifier struggles most.
Each technique carries trade-offs: oversampling risks overfitting to minority examples; undersampling discards potentially useful data; synthetic augmentation may not capture real-world complexity. All resampling approaches must be validated by comparing model performance on an unaltered holdout set. Reweighting offers a more elegant alternative, assigning higher training weights to underrepresented subgroups so the model pays more attention to them. The weight for each instance is inversely proportional to the prevalence of its subgroup, ensuring that each subgroup contributes equally to the loss function. AI Fairness 360 provides a reweighting preprocessor that computes optimal weights automatically.
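The reweighting idea can be sketched in a few lines, mirroring the w(g, y) = P(g)·P(y) / P(g, y) formulation behind AI Fairness 360's reweighing preprocessor: each subgroup-label cell is weighted relative to what statistical independence of group and label would predict. Function name and data are illustrative:

```python
import numpy as np

def reweighing_weights(group, label):
    """Instance weights w(g, y) = P(g) * P(y) / P(g, y): cells that
    are over-represented relative to independence get weight < 1,
    under-represented cells get weight > 1."""
    group, label = np.asarray(group), np.asarray(label)
    n = len(group)
    w = np.empty(n, dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            cell = (group == g) & (label == y)
            p_joint = cell.sum() / n
            if p_joint > 0:
                w[cell] = (np.mean(group == g) * np.mean(label == y)) / p_joint
    return w

# Group A is hired at a higher rate than group B in this toy data
weights = reweighing_weights(["A", "A", "A", "B"], [1, 1, 0, 1])
```

The resulting weights feed straight into any estimator that accepts `sample_weight`, which is why reweighting is often the least invasive pre-processing option.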
The disparate impact remover modifies feature values to reduce correlations between features and protected characteristics while preserving predictive value. It is effective for tabular data but computationally expensive for high-dimensional data. Learning fair representations takes a more aggressive approach, learning a new feature space that is explicitly uninformative about protected characteristics while remaining predictive for the target variable. Module 4 records which technique was selected, the rationale, and the validation confirming that the technique improved fairness without unacceptable accuracy loss.
In-processing mitigations modify the training procedure itself by incorporating fairness constraints directly into the optimisation process. Fairlearn's ExponentiatedGradient is the most practically accessible approach: it solves a constrained optimisation problem, maximising accuracy subject to a fairness constraint such as demographic parity or equalised odds. The algorithm trains multiple candidate models with different constraint levels and returns the one that best balances accuracy and fairness. It integrates with scikit-learn estimators and requires minimal additional code.
Adversarial debiasing trains an adversary network that tries to predict the protected characteristic from the model's internal representations; the main model is penalised for leaking information about protected characteristics. This is effective for deep learning models but requires careful hyperparameter tuning, as the adversary's learning rate relative to the main model critically affects the fairness-accuracy trade-off.
In-processing techniques modify the model's training process itself and typically require more technical sophistication than pre-processing approaches. The AISDP must document the specific technique selected, the mathematical formulation of the fairness constraint, the observed trade-off between fairness and accuracy, and the hyperparameter choices that balance these objectives. These records enable conformity assessors to evaluate whether the chosen approach was appropriate for the system's risk level.
Post-processing mitigations modify the model's outputs after inference without requiring retraining, making them the least disruptive category of intervention. Threshold adjustment is the simplest approach: setting a different decision threshold for each subgroup to equalise selection rates or error rates. Fairlearn's ThresholdOptimizer automates this process, finding per-subgroup thresholds that satisfy a given fairness constraint while maximising accuracy. Reject option classification routes borderline predictions where the model's confidence is low to human review, reducing the chance that uncertain predictions disproportionately harm one subgroup. Calibrated equalised odds adjusts the model's probability scores per subgroup to achieve both calibration and equalised odds simultaneously.
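A minimal sketch of per-subgroup threshold selection for equalising selection rates (demographic parity); Fairlearn's ThresholdOptimizer generalises this to constraints such as equalised odds and chooses the thresholds against ground truth. Names and data are illustrative:

```python
import numpy as np

def per_group_thresholds(scores, group, target_rate):
    """One decision threshold per subgroup so that each subgroup's
    selection rate is approximately target_rate."""
    scores, group = np.asarray(scores), np.asarray(group)
    thresholds = {}
    for g in np.unique(group):
        s = scores[group == g]
        # The (1 - target_rate) quantile admits ~target_rate of the scores
        thresholds[g] = np.quantile(s, 1 - target_rate)
    return thresholds

# Group B's raw scores are systematically lower than group A's
scores = np.concatenate([np.linspace(0.1, 1.0, 10), np.linspace(0.05, 0.5, 10)])
group = np.array(["A"] * 10 + ["B"] * 10)
thresholds = per_group_thresholds(scores, group, target_rate=0.5)
rates = {g: float(np.mean(scores[group == g] > t))
         for g, t in thresholds.items()}
```

A single shared threshold here would select far fewer members of group B; the per-group thresholds equalise the selection rates without touching the model.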
Post-processing approaches may be criticised as cosmetic adjustments that mask underlying model bias rather than addressing root causes. A model requiring aggressive threshold adjustment to achieve fairness has learned something problematic, and threshold adjustment will not correct that learning if the input distribution shifts. For AISDP purposes, the critical requirement is to document why post-processing was chosen over root-cause mitigation. Valid reasons include: the root-cause mitigation would require additional protected characteristic data that the organisation cannot lawfully obtain; root-cause mitigation would reduce accuracy below the system's declared performance thresholds; or the bias is an artefact of the deployment context reflecting past discrimination that cannot be corrected within the training data. The AI Governance Lead must sign off on the choice, and the residual risk must be documented.
Where no available mitigation technique fully eliminates bias, the AISDP must document the residual bias and the compensating controls applied to manage the remaining risk. The AI Governance Lead's signed acceptance of the residual bias risk is recorded alongside the technical documentation.
Compensating controls may include mandatory human review for all decisions affecting members of the disadvantaged subgroup, enhanced monitoring of outcomes for that subgroup, or deployment restrictions that limit the system's use to contexts where the residual bias is acceptable. Each compensating control must be specific and auditable, not a vague commitment to "monitor outcomes" but a defined process with responsible owners, review cadences, and escalation triggers. The controls form part of the evidence chain recorded in Module 4 and Module 5 of the AISDP.
The choice between different mitigation strategies and their combinations requires explicit trade-off documentation. No technique eliminates bias without side effects; every technique trades one form of accuracy or fairness for another. The AISDP must record what technique was used, why it was chosen over alternatives, what trade-off it introduced, and whether that trade-off is acceptable for the system's intended purpose and risk level.
Calibration within groups tests whether the model's confidence scores are meaningful across subgroups. If a 70% predicted probability corresponds to 70% actual success in one group but only 55% in another, operators relying on confidence scores are systematically misled about the second group's likely outcomes. Reliability diagrams plotting predicted probability against observed frequency per subgroup are the standard visualisation tool. Both Fairlearn and AI Fairness 360 support per-subgroup calibration analysis.
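The per-subgroup reliability curve behind such a diagram can be sketched with numpy binning; both libraries mentioned wrap the same computation. Function name, bin count, and toy data are illustrative:

```python
import numpy as np

def calibration_by_group(probs, y_true, group, n_bins=10):
    """Per-subgroup reliability curve: (mean predicted probability,
    observed positive frequency) for each occupied probability bin."""
    probs, y_true, group = map(np.asarray, (probs, y_true, group))
    edges = np.linspace(0, 1, n_bins + 1)
    curves = {}
    for g in np.unique(group):
        m = group == g
        idx = np.clip(np.digitize(probs[m], edges) - 1, 0, n_bins - 1)
        curve = []
        for b in range(n_bins):
            in_bin = idx == b
            if in_bin.any():
                curve.append((float(probs[m][in_bin].mean()),
                              float(y_true[m][in_bin].mean())))
        curves[g] = curve
    return curves

# Group A is well calibrated at 0.7; group B's 0.6 scores overstate reality
curves = calibration_by_group(
    probs=np.array([0.7] * 10 + [0.6] * 10),
    y_true=np.array([1] * 7 + [0] * 3 + [1] * 3 + [0] * 7),
    group=np.array(["A"] * 10 + ["B"] * 10),
)
```

Plotting each subgroup's curve against the diagonal produces the reliability diagram; a subgroup whose curve sits below the diagonal, as group B's does here, receives systematically overconfident scores.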
Counterfactual fairness testing directly measures whether changing the protected characteristic for a given individual, while holding all other features constant, changes the model's output. The Technical SME applies this to a representative sample and reports the proportion of predictions that changed. Counterfactual testing is computationally tractable for tabular models but more complex for models where the protected characteristic is entangled with other features, such as in text or image data where gender may be expressed through language patterns rather than a discrete input.
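For tabular data with a binary protected characteristic, the flip test is short; the function name and toy models below are illustrative:

```python
import numpy as np
import pandas as pd

def counterfactual_flip_rate(predict, df, protected, values):
    """Proportion of predictions that change when the protected
    characteristic is swapped and every other feature held constant
    (binary protected characteristic assumed)."""
    base = np.asarray(predict(df))
    a, b = values
    flipped = df.copy()
    flipped[protected] = flipped[protected].map({a: b, b: a})
    return float(np.mean(np.asarray(predict(flipped)) != base))

applicants = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "score": [0.9, 0.9, 0.2, 0.2],
})
# A model that ignores gender is counterfactually fair ...
fair_rate = counterfactual_flip_rate(
    lambda d: (d["score"] > 0.5).astype(int), applicants, "gender", ("F", "M"))
# ... one that keys on gender flips on every counterfactual
unfair_rate = counterfactual_flip_rate(
    lambda d: (d["gender"] == "M").astype(int), applicants, "gender", ("F", "M"))
```

The reported proportion is the headline number for the AISDP; the entanglement caveat above means this direct swap is only valid where the protected characteristic is a discrete input feature.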
The practical workflow integrates all five metrics into a single evaluation report that runs as part of the CI pipeline fairness gate. Fairlearn's MetricFrame is the most flexible tool for this purpose: the Technical SME defines the metrics, the sensitive features, and the dataset, and it produces a structured report with per-subgroup values. The report is stored as a Module 4 and Module 5 evidence artefact and compared against declared thresholds. Any threshold breach blocks deployment. Continuous Integration for AI Systems details how fairness gates integrate into the deployment pipeline.