A standard CI/CD pipeline builds, tests, and deploys software. A governance CI/CD pipeline builds, tests, deploys, and proves that every compliance obligation attached to the system was satisfied at every stage, by the right person, with the right evidence, at the right time. A conventional ML pipeline can pass every test and still produce a system that fails conformity assessment if the evidence is not traceable, the approval chain is not documented, the version assessed is not the version deployed, or the AISDP modules have not been updated.
The governance pipeline eliminates these gaps by treating compliance evidence as a first-class artefact, produced at every stage, immutably stored, and automatically reconciled against the AISDP. It extends the standard ML pipeline with five additional layers: a policy enforcement layer, a governance gate layer, an evidence generation layer, an AISDP synchronisation layer, and an audit persistence layer. These layers operate alongside existing stages rather than replacing them.
The governance pipeline is not explicitly required by any single Article but is the engineering infrastructure through which multiple obligations are satisfied simultaneously. Article 9 risk management is enforced through pipeline gates. Article 10 data governance is evidenced through data validation stages. Article 11 and Annex IV technical documentation is generated as a byproduct. Article 12 recording is provided by immutable execution logs. Article 15 robustness is tested through embedded security gates. Article 17 QMS is evidenced by the pipeline definition and execution history. Article 3(23) substantial modification is evaluated at the change classification gate before deployment proceeds.
The five governance layers each serve a distinct function. The policy enforcement layer evaluates every change against the organisation's compliance policies before any pipeline stage executes. Policies are defined as code using Open Policy Agent or a similar policy engine, stored in the repository alongside application code, and version-controlled. Each policy maps to a specific regulatory requirement.
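The shape of a policy enforcement check can be sketched in a few lines. The rules below are illustrative stand-ins, not real OPA/Rego policies, and the policy identifiers and change fields are hypothetical; the point is the structure: each rule maps to a regulatory requirement and any failure blocks the pipeline before later stages run.

```python
# Illustrative sketch of the policy enforcement layer. In practice these rules
# would be expressed in a policy engine such as OPA; the policy identifiers and
# change fields here are hypothetical.
POLICIES = {
    "art10-data-provenance": lambda chg: chg.get("data_provenance_recorded", False),
    "art9-risk-register-ref": lambda chg: bool(chg.get("risk_register_entry")),
    "art11-docs-updated": lambda chg: not chg.get("aisdp_modules_stale", True),
}

def enforce_policies(change: dict) -> list[str]:
    """Return the identifiers of every policy the change violates."""
    return [pid for pid, rule in POLICIES.items() if not rule(change)]

violations = enforce_policies({
    "data_provenance_recorded": True,
    "risk_register_entry": "RR-042",
    "aisdp_modules_stale": False,
})
```

An empty violations list lets the pipeline proceed; any entry halts it before the first stage executes.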
The governance gate layer adds five compliance-specific gates that supplement the engineering pipeline's existing test and validation stages. These gates are blocking: a failure at any gate halts the pipeline and requires remediation before deployment can proceed.
The evidence generation layer captures structured compliance evidence from every pipeline stage. Evidence is tagged with the pipeline execution identifier, the composite version, the stage that produced it, and the timestamp. Evidence is immutable once generated.
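A minimal sketch of an evidence artefact record, with illustrative field names, might look as follows. Using a frozen dataclass mirrors the rule that evidence is immutable once generated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Sketch of an evidence artefact (field names are illustrative assumptions).
# frozen=True makes instances immutable after creation.
@dataclass(frozen=True)
class EvidenceArtifact:
    pipeline_execution_id: str  # which pipeline run produced it
    composite_version: str      # model + data + config version it relates to
    stage: str                  # pipeline stage that produced the evidence
    payload: str                # e.g. serialised metrics or a gate decision
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

artifact = EvidenceArtifact("run-123", "m2.1+d7+c4", "evaluation", '{"auc": 0.91}')
```

Any attempt to modify a field after creation raises an error, so supersession has to happen by creating a new artefact rather than editing an old one.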
The AISDP synchronisation layer automatically updates AISDP module content when the pipeline produces new evidence that supersedes existing content. When a new model passes evaluation, the synchronisation layer updates Module 5 with the new metrics. When a new deployment completes, Module 3 is updated with the current architecture. This eliminates the documentation drift where the AISDP describes a historical version rather than the deployed version.
The audit persistence layer stores the complete execution record in tamper-evident, immutable storage. Every gate decision, every approval, every evidence artefact, and every AISDP update is recorded with cryptographic integrity guarantees. The audit record is the primary evidence for Annex VI conformity assessment.
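One common way to provide tamper evidence is a hash chain: each record embeds the hash of its predecessor, so altering any entry invalidates every later hash. The sketch below illustrates the idea under that assumption; a production system would add signing and append-only storage.

```python
import hashlib
import json

# Sketch of a hash-chained audit log (a simplified illustration of
# tamper-evident persistence, not a full audit subsystem).
def append_record(log: list[dict], event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    log.append({"event": event, "prev": prev_hash,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited record breaks verification."""
    prev = "0" * 64
    for rec in log:
        body = json.dumps({"event": rec["event"], "prev": prev}, sort_keys=True)
        if rec["prev"] != prev or rec["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True

audit_log: list[dict] = []
append_record(audit_log, {"gate": 5, "decision": "approved", "by": "governance-lead"})
append_record(audit_log, {"stage": "deploy", "version": "m2.1+d7+c4"})
```

Verification recomputes the chain from the start, which is what an assessor would do to confirm the execution record has not been altered.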
Gate 1 (Classification and Scope) verifies that the system's risk classification is current and that the change falls within the scope of the existing conformity assessment. If the change introduces a new use case or affects a new population, the gate flags a potential scope change for human review.
Gate 2 (Risk and Fundamental Rights) evaluates the change against the risk register and the FRIA. For data changes, it assesses whether the new data introduces risks not covered by the current risk assessment. For model changes, it evaluates performance and fairness metrics against the thresholds declared in the risk register. For configuration changes, it assesses whether the new settings alter the system's risk profile. The gate produces a structured risk delta report.
Gate 3 (Fairness and Bias) is a dedicated fairness evaluation that goes beyond the engineering pipeline's fairness metrics. It computes fairness measures across all declared protected characteristic subgroups, evaluates intersectional fairness where cell sizes permit, and compares results against both absolute thresholds and version-to-version deltas. A fairness regression, where any subgroup's metrics worsen beyond the defined tolerance, blocks the pipeline.
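The dual check at Gate 3, an absolute floor plus a version-to-version delta, can be sketched as below. The threshold values and subgroup names are illustrative assumptions, not prescribed figures.

```python
# Sketch of the Gate 3 fairness-regression check. Thresholds and subgroup
# names are illustrative assumptions.
ABS_FLOOR = 0.80       # minimum acceptable subgroup metric (e.g. TPR)
MAX_REGRESSION = 0.02  # largest tolerated version-to-version drop

def fairness_gate(current: dict[str, float], previous: dict[str, float]) -> list[str]:
    """Return the subgroups that block the pipeline."""
    failures = []
    for group, value in current.items():
        # Fail on an absolute breach or a regression beyond tolerance.
        if value < ABS_FLOOR or previous.get(group, value) - value > MAX_REGRESSION:
            failures.append(group)
    return failures

blocked = fairness_gate(
    current={"group_a": 0.88, "group_b": 0.84},
    previous={"group_a": 0.89, "group_b": 0.90},
)
```

In this example group_b passes the absolute floor but regresses by more than the tolerance, so it alone blocks the pipeline; a non-empty list is a blocking result.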
Gate 4 (Documentation Currency) verifies that all AISDP modules affected by the change have been updated to reflect the new state. It cross-references the change's scope against a mapping of pipeline stages to AISDP modules, flagging any module that should have been updated but was not.
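The cross-reference at Gate 4 reduces to a set difference between the modules a change should have touched and the modules it actually updated. The stage-to-module mapping below is an illustrative assumption.

```python
# Sketch of the Gate 4 currency check. The stage-to-module mapping is an
# illustrative assumption; each organisation defines its own.
STAGE_TO_MODULES = {
    "data_ingestion": {"module_4"},
    "training": {"module_5"},
    "deployment": {"module_3"},
}

def stale_modules(stages_run: set[str], modules_updated: set[str]) -> set[str]:
    """Return modules that should have been updated but were not."""
    required = set().union(*(STAGE_TO_MODULES.get(s, set()) for s in stages_run))
    return required - modules_updated

flagged = stale_modules({"training", "deployment"}, {"module_5"})
```

A non-empty result blocks the pipeline until the flagged modules are brought up to date.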
Gate 5 (Deployment Authorisation) requires explicit human approval before production deployment. The approver reviews the gate results from Gates 1 through 4, the evidence summary, and the change classification. For changes classified as substantial modifications, Gate 5 requires AI Governance Lead approval in addition to Technical Owner approval.
The governance artefact registry is the structured store that holds every compliance evidence artefact produced by the governance pipeline. Each artefact is indexed by the composite version it relates to, the AISDP module it supports, the pipeline execution that produced it, and the regulatory requirement it evidences.
The registry provides two critical capabilities for conformity assessment. First, given any composite version, the registry can produce the complete set of compliance evidence that supports the Declaration of Conformity for that version. Second, given any AISDP module, the registry can produce the most recent evidence artefact for that module, ensuring the AISDP always references current evidence.
The registry distinguishes between artefact types: evaluation results, gate decisions, approval records, configuration snapshots, deployment records, and AISDP module content. Each type has its own schema, retention policy, and access controls. The registry's contents are immutable; artefacts are never modified or deleted within the retention period. New evidence supersedes old evidence by creating a new version, not by modifying the existing version.
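The registry's two query paths, the full evidence set for a composite version and the most recent artefact for a module, can be sketched as simple lookups over the index. The record schema here is an illustrative assumption.

```python
# Sketch of the two registry queries used in conformity assessment.
# The record schema and contents are illustrative assumptions.
registry = [
    {"version": "v1", "module": "module_5", "run": 1, "requirement": "art9"},
    {"version": "v2", "module": "module_5", "run": 2, "requirement": "art9"},
    {"version": "v2", "module": "module_3", "run": 2, "requirement": "art12"},
]

def evidence_for_version(version: str) -> list[dict]:
    """All evidence supporting the Declaration of Conformity for a version."""
    return [a for a in registry if a["version"] == version]

def latest_for_module(module: str) -> dict:
    """Most recent artefact for an AISDP module (later runs supersede earlier)."""
    return max((a for a in registry if a["module"] == module), key=lambda a: a["run"])
```

Note that supersession is expressed as a newer record, never a mutation: both versions of the Module 5 evidence remain in the store.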
The change classification gate evaluates every change against the substantial modification criteria defined in Article 3(23) before deployment proceeds. The classification operates on the composite version delta: the difference between the current production composite version and the proposed new composite version.
The classification algorithm evaluates quantitative thresholds: performance delta, fairness delta, output distribution shift, and cumulative baseline drift. It evaluates qualitative flags: model architecture change, intended purpose change, new data source introduction, and protected characteristic handling change. It also evaluates regulatory triggers: new deployment jurisdiction, new deployer category, and new affected population.
Changes are classified into three tiers. Tier 1 (routine) changes fall within all quantitative thresholds and trigger no qualitative flags; they proceed through the pipeline with standard approval. Tier 2 (significant) changes approach quantitative thresholds or trigger qualitative flags; they require focused review of affected AISDP modules by the AI Governance Lead. Tier 3 (substantial) changes cross quantitative thresholds, change the intended purpose, or trigger regulatory flags; they require full conformity re-assessment before deployment.
The cumulative baseline comparison is the most important check. It compares the proposed version against the version that was assessed at the last conformity assessment, not just against the current production version. This catches gradual drift where each individual change is within thresholds but the aggregate of many changes has moved the system far from its assessed state.
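The tiering logic above can be sketched as a single function. The threshold values and flag names are illustrative assumptions; note how the cumulative delta against the assessed baseline can force Tier 3 even when the per-change delta is small.

```python
# Sketch of the three-tier change classification. Thresholds and flag names
# are illustrative assumptions, not values prescribed by the Act.
PERF_DELTA_LIMIT = 0.05   # quantitative threshold on performance delta
CUMULATIVE_LIMIT = 0.05   # drift allowed from the last assessed baseline

def classify_change(perf_delta: float, cumulative_delta: float,
                    qualitative_flags: set[str], regulatory_flags: set[str]) -> int:
    """Return the tier: 1 routine, 2 significant, 3 substantial."""
    if (abs(cumulative_delta) > CUMULATIVE_LIMIT
            or abs(perf_delta) > PERF_DELTA_LIMIT
            or "intended_purpose_change" in qualitative_flags
            or regulatory_flags):
        return 3  # substantial: full conformity re-assessment required
    if qualitative_flags or abs(perf_delta) > 0.8 * PERF_DELTA_LIMIT:
        return 2  # significant: focused AISDP review required
    return 1      # routine: standard approval

# Small per-change delta, but cumulative drift past the assessed baseline.
tier = classify_change(perf_delta=0.01, cumulative_delta=0.07,
                       qualitative_flags=set(), regulatory_flags=set())
```

Here the individual change is well within its threshold, yet the cumulative comparison classifies it as Tier 3: exactly the gradual-drift case the baseline check exists to catch.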
Organisations with multiple high-risk AI systems need pipeline orchestration that coordinates governance across the portfolio. A shared governance pipeline template provides consistent gate definitions, threshold structures, and evidence formats across all systems. System-specific configurations customise the template for each system's risk profile, fairness requirements, and AISDP structure.
Cross-system dependency tracking identifies cases where a change to one system affects another. A shared feature store, a common model component, or a shared infrastructure service may create dependencies that the pipeline must account for. When a shared component changes, the pipeline evaluates the impact on every system that depends on it.
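Dependency tracking reduces to a reverse lookup over a declared dependency graph. The systems and components below are hypothetical examples.

```python
# Sketch of cross-system dependency tracking. The systems and shared
# components are hypothetical examples.
DEPENDS_ON = {
    "credit_scoring": {"feature_store", "id_verification_model"},
    "fraud_detection": {"feature_store"},
    "chat_triage": set(),
}

def systems_affected_by(component: str) -> set[str]:
    """Every system whose governance pipeline must re-run when a shared component changes."""
    return {system for system, deps in DEPENDS_ON.items() if component in deps}

affected = systems_affected_by("feature_store")
```

When the shared feature store changes, both dependent systems are queued for impact evaluation through their own governance pipelines.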
Portfolio-level governance reporting aggregates pipeline results across all systems, showing the AI Governance Lead the overall compliance posture: how many systems have passing gates, how many have open gate failures, and how the compliance posture is trending over time. This reporting feeds into the quarterly governance review and the AISDP Module 10 evidence for each system.
Governance pipeline monitoring tracks the health and effectiveness of the governance infrastructure itself. Key metrics include gate pass rates by gate type and by system; mean time from change to deployment, which shows whether governance is creating unacceptable delays; evidence generation completeness, which verifies that every required artefact was produced; and AISDP synchronisation lag, which measures the time between a pipeline producing new evidence and the AISDP reflecting it.
Alert conditions include: a gate failure rate exceeding the historical baseline, which may indicate either a quality problem or a threshold calibration issue; evidence generation failure, where a pipeline completed but did not produce all expected artefacts; and AISDP synchronisation failure, where the AISDP does not reflect the current deployed state.
For the AISDP itself, the governance pipeline is documented as part of the QMS. The documentation includes the pipeline architecture showing the five governance layers and their interaction with the engineering pipeline, gate definitions with threshold values and rationale, approval workflow definitions, evidence retention policies, change classification criteria and thresholds, and the governance artefact registry structure.
For organisations at earlier maturity levels, the governance pipeline's functions can be performed through manual checklists and spreadsheet-based registries. A change classification checklist walks through the Article 3(23) criteria for each proposed deployment. A compliance gate checklist verifies fairness, risk, documentation currency, and authorisation before deployment. An evidence log spreadsheet records what was checked, by whom, when, and the result. The loss is automation and enforcement: a manual process depends on the team's discipline, and a missed step creates a compliance gap with no automatic detection.