The third governance gate provides dedicated fairness evaluation beyond engineering-level metrics. It assesses three dimensions: absolute thresholds across all protected subgroups, comparative fairness against the production model, and intersectional analysis crossing multiple characteristics. This page covers the gate's evaluation criteria, intersectional analysis methodology, and subgroup coverage requirements.
The third governance gate is dedicated to fairness evaluation, assessing the candidate model against a richer set of criteria than the engineering pipeline's fairness metrics alone. The gate evaluates three fairness dimensions.
First, absolute thresholds: does the model meet the declared minimum selection rate ratio, equalised odds tolerance, and calibration requirements across all protected characteristic subgroups, as documented in the AISDP Module 6 risk register? Second, comparative thresholds: does the candidate model's fairness profile represent a degradation relative to the current production model? A candidate that meets absolute thresholds but is materially less fair than the incumbent warrants scrutiny, because the organisation must justify deploying a less fair system. Third, intersectional analysis: do fairness metrics hold when subgroups are intersected, such as gender by ethnicity or age by disability? Intersectional failures are frequently invisible in single-axis fairness evaluations.
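The absolute-threshold check can be illustrated with a minimum selection rate ratio computed per subgroup. This is a sketch only: the function name, the parallel-list input format, and the 0.8 default (the familiar four-fifths rule) are illustrative assumptions; the binding threshold is the one declared in the AISDP Module 6 risk register.

```python
from collections import defaultdict

def selection_rate_ratio(outcomes, groups, min_ratio=0.8):
    """Illustrative absolute-threshold check on the selection rate ratio.

    outcomes: parallel list of 0/1 decisions.
    groups:   parallel list of subgroup labels.
    min_ratio: illustrative default (four-fifths rule); in practice the
    declared threshold comes from the AISDP Module 6 risk register.
    Assumes at least one subgroup has a non-zero selection rate.
    """
    selected = defaultdict(int)
    total = defaultdict(int)
    for y, g in zip(outcomes, groups):
        total[g] += 1
        selected[g] += y
    # Per-subgroup selection rates, then the worst-case ratio across groups.
    rates = {g: selected[g] / total[g] for g in total}
    worst = min(rates.values()) / max(rates.values())
    return rates, worst, worst >= min_ratio
```

A ratio below the declared minimum for any subgroup pair fails the absolute-threshold dimension; the full disaggregated rates feed the Fairness Gate Record.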
The gate fails when any subgroup metric crosses the declared threshold, when intersectional analysis reveals compound disadvantage for a subgroup, or when version-to-version fairness regresses beyond the defined tolerance without documented justification. The gate produces a Fairness Gate Record containing the full disaggregated evaluation across all subgroups and intersections, the threshold comparison, and the gate decision. This record is deposited in the governance artefact registry and referenced by AISDP Modules 4, 5, and 6.
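The gate's failure conditions above can be sketched as a single decision function. All names, the record fields, and the regression tolerance value are hypothetical; the sketch assumes higher metric values mean fairer, and that version-to-version regressions beyond tolerance fail the gate absent documented justification.

```python
def fairness_gate_decision(absolute_ok, candidate_metrics, production_metrics,
                           intersectional_ok, regression_tolerance=0.02):
    """Illustrative combination of the three fairness gate checks.

    absolute_ok / intersectional_ok: booleans from the threshold and
    intersectional evaluations. Metric dicts map metric name -> value,
    assuming higher is fairer. Tolerance value is illustrative only.
    """
    # Comparative check: flag metrics that regress beyond tolerance
    # relative to the current production model.
    regressions = {
        m: production_metrics[m] - candidate_metrics[m]
        for m in candidate_metrics
        if production_metrics[m] - candidate_metrics[m] > regression_tolerance
    }
    passed = absolute_ok and intersectional_ok and not regressions
    # Minimal stand-in for the Fairness Gate Record fields.
    return {
        "decision": "pass" if passed else "fail",
        "absolute_thresholds_met": absolute_ok,
        "comparative_regressions": regressions,
        "intersectional_ok": intersectional_ok,
    }
```

In practice the record would also carry the full disaggregated subgroup evaluation and any documented justification for an accepted regression.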
The Technical SME executes the fairness evaluation. The AI System Assessor reviews the results and approves the gate decision. The Legal and Regulatory Advisor reviews any cases where a threshold breach is proposed for acceptance with compensating justification.
The organisation must define, before the fairness gate is configured, the complete set of protected characteristic subgroups against which the system will be evaluated. The subgroup set must be derived from applicable equality legislation, informed by the FRIA, and documented in the AISDP. The subgroup set should include single-axis groups covering gender, ethnicity, age, and disability, and intersectional groups where the FRIA identifies elevated risk.
A common failure mode is evaluating fairness only for subgroups where data is abundant while omitting subgroups where data is sparse. The fairness gate should flag subgroups where the evaluation sample is below a statistical reliability threshold, typically 30 observations per subgroup, and report those subgroups as having insufficient data for evaluation rather than silently omitting them. This transparency is essential: a competent authority reviewing the AISDP will note the absence of evaluation for a subgroup and draw adverse inferences.
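The flag-don't-omit behaviour can be sketched as a partition over subgroup sample sizes. The function name is hypothetical; the 30-observation default mirrors the illustrative reliability threshold mentioned above.

```python
from collections import Counter

def partition_by_reliability(group_labels, min_n=30):
    """Split subgroups into evaluable vs insufficient-data sets by sample size.

    group_labels: one subgroup label per evaluation record.
    min_n: illustrative reliability threshold (30 observations per subgroup).
    Insufficient subgroups are returned for explicit reporting,
    never silently dropped.
    """
    counts = Counter(group_labels)
    evaluable = {g: n for g, n in counts.items() if n >= min_n}
    insufficient = {g: n for g, n in counts.items() if n < min_n}
    return evaluable, insufficient
```

Both dictionaries would be written into the Fairness Gate Record so the coverage gap is documented rather than hidden.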
No single fairness metric captures all dimensions of fairness, as the impossibility theorems in the fairness literature establish that certain desirable fairness properties cannot be simultaneously satisfied. The AI Governance Lead, in consultation with the Legal and Regulatory Advisor, selects the fairness metrics appropriate to the system's use case and documents the selection rationale in the AISDP. The rationale addresses why the chosen metrics are appropriate for the system's context, what trade-offs were accepted, and how the organisation will monitor for forms of unfairness that the chosen metrics do not capture.
Intersectional subgroups are formed by crossing two or more single-axis characteristics. The number of subgroups grows combinatorially: for a system evaluating four characteristics with three categories each, the intersectional space contains 81 subgroups. Many will have sample sizes too small for reliable evaluation. The Technical SME addresses this by evaluating all intersectional subgroups where the sample size meets the reliability threshold and reporting the coverage explicitly, for example noting that fairness was evaluated for 34 of 81 intersectional subgroups with the remainder having fewer than 30 observations. This explicit coverage reporting ensures the gate's limitations are documented rather than hidden.
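The coverage reporting described above can be sketched by enumerating the full intersectional space and counting which cells meet the reliability threshold. The function name and the dict-of-records input format are illustrative assumptions.

```python
from collections import Counter
from itertools import product

def intersectional_coverage(records, axes, min_n=30):
    """Enumerate the intersectional subgroup space and report coverage.

    records: list of dicts mapping axis name -> category for one individual.
    axes:    dict mapping axis name -> list of categories; four axes with
             three categories each yield 3**4 = 81 cells.
    Returns (number of evaluable cells, total cells, per-cell sample counts).
    """
    # Count observations per intersectional cell, in a fixed axis order.
    counts = Counter(tuple(r[a] for a in axes) for r in records)
    # The full combinatorial space, including cells with zero observations.
    all_cells = list(product(*axes.values()))
    evaluable = [c for c in all_cells if counts.get(c, 0) >= min_n]
    return len(evaluable), len(all_cells), counts
```

Reporting "evaluated for N of M cells" then follows directly from the first two return values, matching the explicit-coverage statement the gate requires.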
Can the gate fail even when all absolute thresholds are met? Yes. If the candidate is materially less fair than the production model (comparative degradation), or if intersectional analysis reveals below-minimum subgroups, the gate fails even with absolute thresholds met.
Which intersectional subgroups must be evaluated? All subgroups where the sample size meets the reliability threshold, typically 30 observations. Coverage must be reported explicitly, such as 'fairness evaluated for 34 of 81 intersectional subgroups; 47 had fewer than 30 observations.'
What are the fairness impossibility theorems? Mathematical results establishing that certain desirable fairness properties cannot be simultaneously satisfied. The organisation must choose which properties to prioritise, document the rationale, and monitor for forms of unfairness the chosen metrics do not capture.
How is the subgroup set defined? Subgroups must be derived from equality legislation and the FRIA. Subgroups with fewer than 30 observations are flagged as having insufficient data rather than silently omitted.
Who selects the fairness metrics? The AI Governance Lead, in consultation with the Legal and Regulatory Advisor. The rationale must document why the metrics suit the context, what trade-offs were accepted, and how unmonitored unfairness is addressed.