The third governance gate provides dedicated fairness evaluation beyond engineering-level metrics. It assesses three dimensions: absolute thresholds across all protected subgroups, comparative fairness against the production model, and intersectional analysis crossing multiple characteristics. This page covers the gate's evaluation criteria, intersectional analysis methodology, and subgroup coverage requirements.
The third governance gate is dedicated to fairness evaluation, assessing the candidate model against a richer set of criteria than the engineering pipeline's fairness metrics alone. The gate evaluates three fairness dimensions.
First, absolute thresholds: does the model meet the declared minimum selection rate ratio, equalised odds tolerance, and calibration requirements across all protected characteristic subgroups, as documented in the AISDP Module 6 risk register? Second, comparative thresholds: does the candidate model's fairness profile represent a degradation relative to the current production model? A candidate that meets absolute thresholds but is materially less fair than the incumbent warrants scrutiny, because the organisation must justify deploying a less fair system. Third, intersectional analysis: do fairness metrics hold when subgroups are intersected, such as gender by ethnicity or age by disability? Intersectional failures are frequently invisible in single-axis fairness evaluations.
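The absolute-threshold check can be illustrated with a minimum selection rate ratio computed per subgroup. This is a sketch only: the function name, the parallel-list input format, and the 0.8 default (the familiar four-fifths rule) are illustrative assumptions; the binding threshold is the one declared in the AISDP Module 6 risk register.

```python
from collections import defaultdict

def selection_rate_ratio(outcomes, groups, min_ratio=0.8):
    """Illustrative absolute-threshold check on the selection rate ratio.

    outcomes: parallel list of 0/1 decisions.
    groups:   parallel list of subgroup labels.
    min_ratio: illustrative default (four-fifths rule); in practice the
    declared threshold comes from the AISDP Module 6 risk register.
    Assumes at least one subgroup has a non-zero selection rate.
    """
    selected = defaultdict(int)
    total = defaultdict(int)
    for y, g in zip(outcomes, groups):
        total[g] += 1
        selected[g] += y
    # Per-subgroup selection rates, then the worst-case ratio across groups.
    rates = {g: selected[g] / total[g] for g in total}
    worst = min(rates.values()) / max(rates.values())
    return rates, worst, worst >= min_ratio
```

A ratio below the declared minimum for any subgroup pair fails the absolute-threshold dimension; the full disaggregated rates feed the Fairness Gate Record.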
The gate fails when any subgroup metric crosses the declared threshold, when intersectional analysis reveals compound disadvantage for a subgroup, or when version-to-version fairness regresses beyond the defined tolerance without documented justification. The gate produces a Fairness Gate Record containing the full disaggregated evaluation across all subgroups and intersections, the threshold comparison, and the gate decision. This record is deposited in the governance artefact registry and referenced by AISDP Modules 4, 5, and 6.
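The gate's failure conditions above can be sketched as a single decision function. All names, the record fields, and the regression tolerance value are hypothetical; the sketch assumes higher metric values mean fairer, and that version-to-version regressions beyond tolerance fail the gate absent documented justification.

```python
def fairness_gate_decision(absolute_ok, candidate_metrics, production_metrics,
                           intersectional_ok, regression_tolerance=0.02):
    """Illustrative combination of the three fairness gate checks.

    absolute_ok / intersectional_ok: booleans from the threshold and
    intersectional evaluations. Metric dicts map metric name -> value,
    assuming higher is fairer. Tolerance value is illustrative only.
    """
    # Comparative check: flag metrics that regress beyond tolerance
    # relative to the current production model.
    regressions = {
        m: production_metrics[m] - candidate_metrics[m]
        for m in candidate_metrics
        if production_metrics[m] - candidate_metrics[m] > regression_tolerance
    }
    passed = absolute_ok and intersectional_ok and not regressions
    # Minimal stand-in for the Fairness Gate Record fields.
    return {
        "decision": "pass" if passed else "fail",
        "absolute_thresholds_met": absolute_ok,
        "comparative_regressions": regressions,
        "intersectional_ok": intersectional_ok,
    }
```

In practice the record would also carry the full disaggregated subgroup evaluation and any documented justification for an accepted regression.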
The Technical SME executes the fairness evaluation. The AI System Assessor reviews the results and approves the gate decision. The Legal and Regulatory Advisor reviews any cases where a threshold breach is proposed for acceptance with compensating justification.
The organisation must define, before the fairness gate is configured, the complete set of protected characteristic subgroups against which the system will be evaluated. The subgroup set must be derived from applicable equality legislation, informed by the FRIA, and documented in the AISDP. The subgroup set should include single-axis groups covering gender, ethnicity, age, and disability, and intersectional groups where the FRIA identifies elevated risk.
A common failure mode is evaluating fairness only for subgroups where data is abundant while omitting subgroups where data is sparse. The fairness gate should flag subgroups where the evaluation sample is below a statistical reliability threshold, typically 30 observations per subgroup, and report those subgroups as having insufficient data for evaluation rather than silently omitting them. This transparency is essential: a competent authority reviewing the AISDP will note the absence of evaluation for a subgroup and draw adverse inferences.
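The flag-don't-omit behaviour can be sketched as a partition over subgroup sample sizes. The function name is hypothetical; the 30-observation default mirrors the illustrative reliability threshold mentioned above.

```python
from collections import Counter

def partition_by_reliability(group_labels, min_n=30):
    """Split subgroups into evaluable vs insufficient-data sets by sample size.

    group_labels: one subgroup label per evaluation record.
    min_n: illustrative reliability threshold (30 observations per subgroup).
    Insufficient subgroups are returned for explicit reporting,
    never silently dropped.
    """
    counts = Counter(group_labels)
    evaluable = {g: n for g, n in counts.items() if n >= min_n}
    insufficient = {g: n for g, n in counts.items() if n < min_n}
    return evaluable, insufficient
```

Both dictionaries would be written into the Fairness Gate Record so the coverage gap is documented rather than hidden.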
No single fairness metric captures all dimensions of fairness, as the impossibility theorems in the fairness literature establish that certain desirable fairness properties cannot be simultaneously satisfied. The AI Governance Lead, in consultation with the Legal and Regulatory Advisor, selects the fairness metrics appropriate to the system's use case and documents the selection rationale in the AISDP. The rationale addresses why the chosen metrics are appropriate for the system's context, what trade-offs were accepted, and how the organisation will monitor for forms of unfairness that the chosen metrics do not capture.
Intersectional subgroups are formed by crossing two or more single-axis characteristics. The number of subgroups grows combinatorially: for a system evaluating four characteristics with three categories each, the intersectional space contains 81 subgroups. Many will have sample sizes too small for reliable evaluation. The Technical SME addresses this by evaluating all intersectional subgroups where the sample size meets the reliability threshold and reporting the coverage explicitly, for example noting that fairness was evaluated for 34 of 81 intersectional subgroups with the remainder having fewer than 30 observations. This explicit coverage reporting ensures the gate's limitations are documented rather than hidden.
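The coverage reporting described above can be sketched by enumerating the full intersectional space and counting which cells meet the reliability threshold. The function name and the dict-of-records input format are illustrative assumptions.

```python
from collections import Counter
from itertools import product

def intersectional_coverage(records, axes, min_n=30):
    """Enumerate the intersectional subgroup space and report coverage.

    records: list of dicts mapping axis name -> category for one individual.
    axes:    dict mapping axis name -> list of categories; four axes with
             three categories each yield 3**4 = 81 cells.
    Returns (number of evaluable cells, total cells, per-cell sample counts).
    """
    # Count observations per intersectional cell, in a fixed axis order.
    counts = Counter(tuple(r[a] for a in axes) for r in records)
    # The full combinatorial space, including cells with zero observations.
    all_cells = list(product(*axes.values()))
    evaluable = [c for c in all_cells if counts.get(c, 0) >= min_n]
    return len(evaluable), len(all_cells), counts
```

Reporting "evaluated for N of M cells" then follows directly from the first two return values, matching the explicit-coverage statement the gate requires.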
Can the gate fail even when all absolute thresholds are met? Yes. If the candidate is materially less fair than the production model (comparative degradation), or if intersectional analysis reveals below-minimum subgroups, the gate fails even with absolute thresholds met.
Which intersectional subgroups must be evaluated? All subgroups where the sample size meets the reliability threshold, typically 30 observations. Coverage must be reported explicitly, such as 'fairness evaluated for 34 of 81 intersectional subgroups; 47 had fewer than 30 observations.'
What are the fairness impossibility theorems? Mathematical results establishing that certain desirable fairness properties cannot be simultaneously satisfied. The organisation must choose which properties to prioritise, document the rationale, and monitor for forms of unfairness the chosen metrics do not capture.
How is the subgroup set defined? Subgroups must be derived from equality legislation and the FRIA. Subgroups with fewer than 30 observations are flagged as having insufficient data rather than silently omitted.
Who selects the fairness metrics? The AI Governance Lead, in consultation with the Legal and Regulatory Advisor. The rationale must document why the metrics suit the context, what trade-offs were accepted, and how unmonitored unfairness is addressed.