Deployment of high-risk AI systems cannot be fully automated. Article 14's human oversight requirement extends to the deployment decision itself. This page covers staging environment validation, progressive delivery through canary and shadow deployment, compliance-grade rollback mechanisms, the immutable deployment ledger, and how human approval gates are implemented in CI/CD pipelines.
Deployment of high-risk AI systems cannot be fully automated because Article 14's human oversight requirement extends to the deployment decision itself. The decision to expose affected persons to a modified system is consequential and must be taken by a qualified human with full visibility into the validation evidence. The deployment controls must ensure that what was tested is what gets deployed, that the deployment is reversible if problems emerge, and that the deployment event is recorded with sufficient detail for audit and incident investigation.
The new version is deployed to a staging environment that mirrors production in all material respects: same infrastructure specification provisioned from the same IaC templates, same model version loaded from the same registry entry, same configuration, and same data sources. The full test suite runs against production-representative data, and performance, fairness, and robustness metrics are computed and compared to the currently deployed version. The Technical SME documents the staging environment in the AISDP as part of the quality management system.
The staging test suite should include the end-to-end inference path test covering the complete processing chain, a subset of the fairness evaluation confirming that fairness metrics in the staging environment match those from the CI evaluation, and a latency test under simulated load confirming that the deployed model meets the declared latency threshold in the actual serving environment. Staging validation must pass before any production deployment can proceed.
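The check that staging fairness metrics match the CI evaluation can be sketched as a simple tolerance comparison. This is an illustration only; the metric names, reference values, and tolerances below are hypothetical, and real values would come from the evaluation configuration:

```python
# Hypothetical staging gate: confirm staging metrics match the CI
# evaluation within tolerance before production deployment proceeds.

CI_METRICS = {"demographic_parity_diff": 0.021, "p95_latency_ms": 180.0}
TOLERANCE = {"demographic_parity_diff": 0.005, "p95_latency_ms": 20.0}

def staging_matches_ci(staging_metrics: dict) -> list[str]:
    """Return failure reasons; an empty list means the gate passes."""
    failures = []
    for name, ci_value in CI_METRICS.items():
        staged = staging_metrics.get(name)
        if staged is None:
            failures.append(f"metric missing in staging run: {name}")
        elif abs(staged - ci_value) > TOLERANCE[name]:
            failures.append(
                f"{name}: staging value {staged} deviates from CI value "
                f"{ci_value} by more than {TOLERANCE[name]}"
            )
    return failures
```

Returning structured failure reasons, rather than a bare boolean, makes the gate's output usable as audit evidence for why a deployment was blocked.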
Progressive delivery reduces the blast radius of a deployment that causes problems despite passing staging validation. In a canary deployment, the new version receives a small percentage of production traffic, typically 1 to 5 per cent, while the existing version handles the remainder. Automated analysis compares the canary's error rate, latency, and prediction distribution against the existing version's. If the metrics diverge beyond a defined threshold, the canary is automatically rolled back. If the metrics are acceptable, the canary's traffic share is gradually increased until the new version handles all traffic. Argo Rollouts and Flagger automate this process on Kubernetes with configurable analysis steps and automatic rollback.
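The core of the analysis step can be illustrated with a minimal divergence check. This is not the actual Argo Rollouts or Flagger implementation; the metrics and thresholds are invented for illustration, and in practice they come from the deployment policy:

```python
# Illustrative canary analysis: compare canary metrics against the
# stable version and decide whether to promote or roll back.
# Thresholds are hypothetical placeholders for the deployment policy.

def analyse_canary(stable: dict, canary: dict) -> str:
    """Return 'rollback' if any metric degrades beyond its allowed
    divergence from the stable version, otherwise 'promote'."""
    thresholds = {
        "error_rate": 0.01,      # max absolute increase allowed
        "p95_latency_ms": 50.0,  # max absolute increase allowed
    }
    for metric, allowed in thresholds.items():
        if canary[metric] - stable[metric] > allowed:
            return "rollback"
    return "promote"
```

In Argo Rollouts, the equivalent logic is declared in an AnalysisTemplate whose queries run against the metrics provider, so the comparison happens without human involvement at each traffic step.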
Shadow deployment is more conservative: the new version receives production traffic but its outputs are not delivered to users. Instead, the shadow outputs are logged alongside the production outputs for comparison. This allows the organisation to evaluate the new version's behaviour on real production data without any risk to affected persons. Shadow deployment is particularly valuable for initial deployments of high-risk systems, where the consequences of an error are severe and confidence in staging validation is limited.
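A minimal sketch of the shadow comparison, assuming each request's production and shadow outputs are logged side by side (only the production output is ever returned to users):

```python
def shadow_agreement(pairs: list[tuple]) -> float:
    """Fraction of requests where the shadow version's output equals
    the production version's. `pairs` holds (production_output,
    shadow_output) tuples captured from the comparison log."""
    if not pairs:
        return 0.0
    matches = sum(1 for prod, shadow in pairs if prod == shadow)
    return matches / len(pairs)
```

An agreement rate, broken down by segment where relevant, gives the approver concrete evidence of how the new version would have behaved on real traffic before it is ever exposed to affected persons.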
The canary percentage, the duration of the canary phase, and the metrics thresholds for automatic rollback are defined by the Technical Owner in the deployment policy and documented in the AISDP.
A designated approver reviews the validation results and authorises deployment. The Technical SME approves routine releases. The AI Governance Lead approves releases affecting fairness metrics, the model architecture, or the intended purpose. At the approval step, the deployment pipeline pauses and presents the deployment's metadata, including the model version, validation gate results, and staging test results, to the designated approver. Approval resumes the pipeline; rejection halts it. Each approval event is logged with the approver's identity, timestamp, decision, and any comments. GitHub Actions environment protection rules, GitLab manual jobs, and Jenkins input steps all support this pattern.
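The validity check the pipeline applies to a recorded approval event before resuming can be sketched as follows. The field names and the set of authorised roles are illustrative assumptions; the 48-hour freshness window mirrors the policy prerequisite described later on this page:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical pre-resume check: the recorded approval must be an
# affirmative decision by an authorised role, must match the model
# version being deployed, and must be recent.

AUTHORISED_ROLES = {"technical_sme", "ai_governance_lead"}

def approval_is_valid(event: dict, model_version: str,
                      now: datetime, max_age_hours: int = 48) -> bool:
    return (
        event.get("decision") == "approved"
        and event.get("approver_role") in AUTHORISED_ROLES
        and event.get("model_version") == model_version
        and now - event["timestamp"] <= timedelta(hours=max_age_hours)
    )
```

Binding the approval to a specific model version prevents a stale approval for one release from authorising a different artifact.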
The pipeline must support rapid rollback to the previous version if post-deployment monitoring detects problems. Kubernetes rolling updates with Helm provide a single-command rollback that restores the previous chart release, including the previous model version and configuration. Argo Rollouts provides automatic rollback triggered by canary analysis failure. The engineering team tests rollback capability through quarterly rollback drills. The rollback procedure, covering decision criteria, authorised personnel, the expected timeline, and the step-by-step process, is documented in the AISDP by the Technical Owner. Each rollback is itself logged in the deployment ledger as a deployment event, with the reason for the rollback and the version restored.
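The decision criteria portion of the rollback procedure can be expressed as a simple threshold check over post-deployment monitoring data. The metric names and limits below are hypothetical; the declared values would live in the AISDP-documented policy:

```python
def should_roll_back(post_deploy_metrics: dict, criteria: dict) -> bool:
    """Roll back when any monitored metric breaches its declared limit.
    A metric missing from the monitoring feed is treated as a breach,
    so a broken monitoring pipeline fails safe toward rollback."""
    return any(
        post_deploy_metrics.get(metric, float("inf")) > limit
        for metric, limit in criteria.items()
    )
```

Encoding the criteria this way keeps the drill and the production decision aligned: the quarterly exercise can run against recorded metrics from past incidents.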
The deployment ledger is the immutable record of every deployment event, providing the authoritative history for AISDP Module 12 change management. Each entry records the deployment timestamp, the model version deployed, the configuration version, the deployer's identity, the approval evidence covering who approved and when, the staging test results, the canary analysis results where applicable, and the deployment outcome as success, rollback, or failure.
The engineering team stores the ledger in append-only storage such as S3 Object Lock or Azure Immutable Blob Storage and retains it for the AISDP evidence period. ArgoCD and Flux both produce audit logs that serve as deployment ledgers. For non-GitOps deployments, the engineering team implements a custom append-only log.
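For the custom append-only log, one common design is to hash-chain entries so that tampering with history is detectable even before the records reach WORM storage. This is a minimal sketch under that assumption, not a substitute for S3 Object Lock or equivalent:

```python
import hashlib
import json

# Minimal hash-chained ledger sketch: each entry embeds the hash of the
# previous entry, so any alteration of past records breaks the chain.

def append_entry(ledger: list[dict], entry: dict) -> list[dict]:
    """Append a deployment event, chaining it to the previous entry."""
    prev_hash = ledger[-1]["entry_hash"] if ledger else "genesis"
    record = dict(entry, prev_hash=prev_hash)
    payload = json.dumps(record, sort_keys=True).encode()
    record["entry_hash"] = hashlib.sha256(payload).hexdigest()
    ledger.append(record)
    return ledger

def verify_chain(ledger: list[dict]) -> bool:
    """Recompute every hash and link; False means the ledger was altered."""
    prev = "genesis"
    for record in ledger:
        if record["prev_hash"] != prev:
            return False
        body = {k: v for k, v in record.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(payload).hexdigest() != record["entry_hash"]:
            return False
        prev = record["entry_hash"]
    return True
```

Entries would carry the fields listed above (timestamp, model version, approval evidence, outcome, and so on); the chain adds tamper-evidence on top of the append-only storage guarantee.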
An OPA policy can enforce compliance prerequisites before deployment proceeds, verifying that the AISDP version matches the model's assessed version, that all four validation gates have passed, that human approval has been recorded within the preceding 48 hours, and that staging tests passed on the exact version being deployed. The policy produces structured deny reasons for debugging failed deployments.
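The same checks can be sketched in Python to show the shape of the logic; the real control would be a Rego policy evaluated by OPA, and the field names here are illustrative assumptions:

```python
# Python sketch of the deployment admission policy described above.
# Input mirrors a hypothetical deployment request document.

def deploy_deny_reasons(req: dict) -> list[str]:
    """Structured deny reasons; an empty list allows the deployment."""
    reasons = []
    if req["aisdp_version"] != req["assessed_aisdp_version"]:
        reasons.append("AISDP version does not match assessed version")
    failed = [g for g, ok in req["validation_gates"].items() if not ok]
    if failed:
        reasons.append("validation gates failed: " + ", ".join(sorted(failed)))
    if req["approval_age_hours"] > 48:
        reasons.append("human approval older than 48 hours")
    if req["staging_tested_version"] != req["model_version"]:
        reasons.append("staging tests ran on a different model version")
    return reasons
```

As in the OPA version, accumulating every deny reason rather than stopping at the first one makes a blocked deployment straightforward to debug.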
Staging validation, human approval, and the deployment ledger can all be implemented procedurally without specialised tooling. Before each production deployment, the team executes a defined test suite in the staging environment and records the results. The AI Governance Lead signs a deployment approval form confirming that staging tests passed, that evaluation metrics meet declared thresholds, and that deployment is authorised; the form records the model version, configuration version, and date.
A deployment ledger in spreadsheet or log file form records every deployment event with timestamp, version deployed, approver, staging test result reference, and outcome. The Technical Owner tests the manual rollback procedure quarterly through simulated exercises.
The procedural approach loses canary deployment with progressive traffic shifting and automated analysis, shadow deployment with parallel execution and comparison, and automated rollback on metric breach. These capabilities require tooling such as Argo Rollouts or Flagger, both of which are open-source. For high-risk systems where the consequences of a faulty deployment are significant, the investment in progressive delivery tooling is strongly recommended.
A substantial modification requires joint approval from the AI Governance Lead and the Legal and Regulatory Advisor, because it may trigger re-assessment under the conformity assessment framework and require a new Declaration of Conformity.
Canary and shadow deployment are not mandatory; they are recommended controls that reduce deployment risk. The mandatory requirements are staging validation, human approval, and deployment logging. Progressive delivery requires tooling such as Argo Rollouts or Flagger.
If no designated approver is available, the pipeline remains paused until an authorised approver acts. The approval authority matrix should define delegates for each role to avoid bottlenecks, and approval latency is monitored as a governance health metric.
A rollback must restore the previous version within the declared RTO, without data loss, and be logged as a deployment event. Rollback capability must be tested quarterly through drills.
Each ledger entry records the timestamp, model version, configuration version, deployer identity, approval evidence, staging results, canary results where applicable, and the deployment outcome, held in append-only storage for the full retention period.