Deployment of high-risk AI systems cannot be fully automated. Article 14's human oversight requirement extends to the deployment decision itself. This page covers staging environment validation, progressive delivery through canary and shadow deployment, compliance-grade rollback mechanisms, the immutable deployment ledger, and how human approval gates are implemented in CI/CD pipelines.
Deployment of high-risk AI systems cannot be fully automated because Article 14's human oversight requirement extends to the deployment decision itself. The decision to expose affected persons to a modified system is consequential and must be taken by a qualified human with full visibility into the validation evidence. The deployment controls must ensure that what was tested is what gets deployed, that the deployment is reversible if problems emerge, and that the deployment event is recorded with sufficient detail for audit and incident investigation.
The new version is deployed to a staging environment that mirrors production in all material respects: same infrastructure specification provisioned from the same IaC templates, same model version loaded from the same registry entry, same configuration, and same data sources. The full test suite runs against production-representative data, and performance, fairness, and robustness metrics are computed and compared to the currently deployed version. The Technical SME documents the staging environment in the AISDP as part of the quality management system.
The staging test suite should include the end-to-end inference path test covering the complete processing chain, a subset of the fairness evaluation confirming that fairness metrics in the staging environment match those from the CI evaluation, and a latency test under simulated load confirming that the deployed model meets the declared latency threshold in the actual serving environment. Staging validation must pass before any production deployment can proceed.
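The check that staging fairness metrics match the CI evaluation can be sketched as a simple tolerance comparison. This is an illustration only; the metric names, reference values, and tolerances below are hypothetical, and real values would come from the evaluation configuration:

```python
# Hypothetical staging gate: confirm staging metrics match the CI
# evaluation within tolerance before production deployment proceeds.

CI_METRICS = {"demographic_parity_diff": 0.021, "p95_latency_ms": 180.0}
TOLERANCE = {"demographic_parity_diff": 0.005, "p95_latency_ms": 20.0}

def staging_matches_ci(staging_metrics: dict) -> list[str]:
    """Return failure reasons; an empty list means the gate passes."""
    failures = []
    for name, ci_value in CI_METRICS.items():
        staged = staging_metrics.get(name)
        if staged is None:
            failures.append(f"metric missing in staging run: {name}")
        elif abs(staged - ci_value) > TOLERANCE[name]:
            failures.append(
                f"{name}: staging value {staged} deviates from CI value "
                f"{ci_value} by more than {TOLERANCE[name]}"
            )
    return failures
```

Returning structured failure reasons, rather than a bare boolean, makes the gate's output usable as audit evidence for why a deployment was blocked.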
Progressive delivery reduces the blast radius of a deployment that causes problems despite passing staging validation. In a canary deployment, the new version receives a small percentage of production traffic, typically 1 to 5 per cent, while the existing version handles the remainder. Automated analysis compares the canary's error rate, latency, and prediction distribution against the existing version's. If the metrics diverge beyond a defined threshold, the canary is automatically rolled back. If the metrics are acceptable, the canary's traffic share is gradually increased until the new version handles all traffic. Argo Rollouts and Flagger automate this process on Kubernetes with configurable analysis steps and automatic rollback.
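The core of the analysis step can be illustrated with a minimal divergence check. This is not the actual Argo Rollouts or Flagger implementation; the metrics and thresholds are invented for illustration, and in practice they come from the deployment policy:

```python
# Illustrative canary analysis: compare canary metrics against the
# stable version and decide whether to promote or roll back.
# Thresholds are hypothetical placeholders for the deployment policy.

def analyse_canary(stable: dict, canary: dict) -> str:
    """Return 'rollback' if any metric degrades beyond its allowed
    divergence from the stable version, otherwise 'promote'."""
    thresholds = {
        "error_rate": 0.01,      # max absolute increase allowed
        "p95_latency_ms": 50.0,  # max absolute increase allowed
    }
    for metric, allowed in thresholds.items():
        if canary[metric] - stable[metric] > allowed:
            return "rollback"
    return "promote"
```

In Argo Rollouts, the equivalent logic is declared in an AnalysisTemplate whose queries run against the metrics provider, so the comparison happens without human involvement at each traffic step.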
Shadow deployment is more conservative: the new version receives production traffic but its outputs are not delivered to users. Instead, the shadow outputs are logged alongside the production outputs for comparison. This allows the organisation to evaluate the new version's behaviour on real production data without any risk to affected persons. Shadow deployment is particularly valuable for initial deployments of high-risk systems, where the consequences of an error are severe and confidence in staging validation is limited.
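A minimal sketch of the shadow comparison, assuming each request's production and shadow outputs are logged side by side (only the production output is ever returned to users):

```python
def shadow_agreement(pairs: list[tuple]) -> float:
    """Fraction of requests where the shadow version's output equals
    the production version's. `pairs` holds (production_output,
    shadow_output) tuples captured from the comparison log."""
    if not pairs:
        return 0.0
    matches = sum(1 for prod, shadow in pairs if prod == shadow)
    return matches / len(pairs)
```

An agreement rate, broken down by segment where relevant, gives the approver concrete evidence of how the new version would have behaved on real traffic before it is ever exposed to affected persons.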
The canary percentage, the duration of the canary phase, and the metrics thresholds for automatic rollback are defined by the Technical Owner in the deployment policy and documented in the AISDP.
A designated approver reviews the validation results and authorises deployment. The Technical SME approves routine releases. The AI Governance Lead approves releases affecting fairness metrics, the model architecture, or the intended purpose. At the approval step, the deployment pipeline pauses and presents the deployment's metadata, including the model version, validation gate results, and staging test results, to the designated approver. Approval resumes the pipeline; rejection halts it. Each approval event is logged with the approver's identity, timestamp, decision, and any comments. GitHub Actions environment protection rules, GitLab manual jobs, and Jenkins input steps all support this pattern.
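The validity check the pipeline applies to a recorded approval event before resuming can be sketched as follows. The field names and the set of authorised roles are illustrative assumptions; the 48-hour freshness window mirrors the policy prerequisite described later on this page:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical pre-resume check: the recorded approval must be an
# affirmative decision by an authorised role, must match the model
# version being deployed, and must be recent.

AUTHORISED_ROLES = {"technical_sme", "ai_governance_lead"}

def approval_is_valid(event: dict, model_version: str,
                      now: datetime, max_age_hours: int = 48) -> bool:
    return (
        event.get("decision") == "approved"
        and event.get("approver_role") in AUTHORISED_ROLES
        and event.get("model_version") == model_version
        and now - event["timestamp"] <= timedelta(hours=max_age_hours)
    )
```

Binding the approval to a specific model version prevents a stale approval for one release from authorising a different artifact.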
The pipeline must support rapid rollback to the previous version if post-deployment monitoring detects problems. Kubernetes rolling updates with Helm provide a single-command rollback that restores the previous chart release, including the previous model version and configuration. Argo Rollouts provides automatic rollback triggered by canary analysis failure. The engineering team tests rollback capability through quarterly rollback drills. The rollback procedure, covering decision criteria, authorised personnel, the expected timeline, and the step-by-step process, is documented in the AISDP by the Technical Owner. Each rollback is itself logged in the deployment ledger as a deployment event, with the reason for the rollback and the version restored.
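The decision criteria portion of the rollback procedure can be expressed as a simple threshold check over post-deployment monitoring data. The metric names and limits below are hypothetical; the declared values would live in the AISDP-documented policy:

```python
def should_roll_back(post_deploy_metrics: dict, criteria: dict) -> bool:
    """Roll back when any monitored metric breaches its declared limit.
    A metric missing from the monitoring feed is treated as a breach,
    so a broken monitoring pipeline fails safe toward rollback."""
    return any(
        post_deploy_metrics.get(metric, float("inf")) > limit
        for metric, limit in criteria.items()
    )
```

Encoding the criteria this way keeps the drill and the production decision aligned: the quarterly exercise can run against recorded metrics from past incidents.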
The deployment ledger is the immutable record of every deployment event, providing the authoritative history for AISDP Module 12 change management. Each entry records the deployment timestamp, the model version deployed, the configuration version, the deployer's identity, the approval evidence covering who approved and when, the staging test results, the canary analysis results where applicable, and the deployment outcome as success, rollback, or failure.
The engineering team stores the ledger in append-only storage such as S3 Object Lock or Azure Immutable Blob Storage and retains it for the AISDP evidence period. ArgoCD and Flux both produce audit logs that serve as deployment ledgers. For non-GitOps deployments, the engineering team implements a custom append-only log.
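For the custom append-only log, one common design is to hash-chain entries so that tampering with history is detectable even before the records reach WORM storage. This is a minimal sketch under that assumption, not a substitute for S3 Object Lock or equivalent:

```python
import hashlib
import json

# Minimal hash-chained ledger sketch: each entry embeds the hash of the
# previous entry, so any alteration of past records breaks the chain.

def append_entry(ledger: list[dict], entry: dict) -> list[dict]:
    """Append a deployment event, chaining it to the previous entry."""
    prev_hash = ledger[-1]["entry_hash"] if ledger else "genesis"
    record = dict(entry, prev_hash=prev_hash)
    payload = json.dumps(record, sort_keys=True).encode()
    record["entry_hash"] = hashlib.sha256(payload).hexdigest()
    ledger.append(record)
    return ledger

def verify_chain(ledger: list[dict]) -> bool:
    """Recompute every hash and link; False means the ledger was altered."""
    prev = "genesis"
    for record in ledger:
        if record["prev_hash"] != prev:
            return False
        body = {k: v for k, v in record.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(payload).hexdigest() != record["entry_hash"]:
            return False
        prev = record["entry_hash"]
    return True
```

Entries would carry the fields listed above (timestamp, model version, approval evidence, outcome, and so on); the chain adds tamper-evidence on top of the append-only storage guarantee.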
An OPA policy can enforce compliance prerequisites before deployment proceeds, verifying that the AISDP version matches the model's assessed version, that all four validation gates have passed, that human approval has been recorded within the preceding 48 hours, and that staging tests passed on the exact version being deployed. The policy produces structured deny reasons for debugging failed deployments.
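The same checks can be sketched in Python to show the shape of the logic; the real control would be a Rego policy evaluated by OPA, and the field names here are illustrative assumptions:

```python
# Python sketch of the deployment admission policy described above.
# Input mirrors a hypothetical deployment request document.

def deploy_deny_reasons(req: dict) -> list[str]:
    """Structured deny reasons; an empty list allows the deployment."""
    reasons = []
    if req["aisdp_version"] != req["assessed_aisdp_version"]:
        reasons.append("AISDP version does not match assessed version")
    failed = [g for g, ok in req["validation_gates"].items() if not ok]
    if failed:
        reasons.append("validation gates failed: " + ", ".join(sorted(failed)))
    if req["approval_age_hours"] > 48:
        reasons.append("human approval older than 48 hours")
    if req["staging_tested_version"] != req["model_version"]:
        reasons.append("staging tests ran on a different model version")
    return reasons
```

As in the OPA version, accumulating every deny reason rather than stopping at the first one makes a blocked deployment straightforward to debug.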
Staging validation, human approval, and the deployment ledger can all be implemented procedurally without specialised tooling. Before each production deployment, the team executes a defined test suite in the staging environment and records the results. The AI Governance Lead signs a deployment approval form confirming that staging tests passed, that evaluation metrics meet declared thresholds, and that deployment is authorised; the form records the model version, configuration version, and date.
A deployment ledger in spreadsheet or log file form records every deployment event with timestamp, version deployed, approver, staging test result reference, and outcome. The Technical Owner tests the manual rollback procedure quarterly through simulated exercises.
The procedural approach loses canary deployment with progressive traffic shifting and automated analysis, shadow deployment with parallel execution and comparison, and automated rollback on metric breach. These capabilities require tooling such as Argo Rollouts or Flagger, both of which are open-source. For high-risk systems where the consequences of a faulty deployment are significant, the investment in progressive delivery tooling is strongly recommended.
A substantial modification requires joint approval from the AI Governance Lead and the Legal and Regulatory Advisor, because it may trigger re-assessment under the conformity assessment framework and require a new Declaration of Conformity.
Canary and shadow deployment are not mandatory; they are recommended controls that reduce deployment risk. The mandatory requirements are staging validation, human approval, and deployment logging. Progressive delivery requires tooling such as Argo Rollouts or Flagger.
If no designated approver is available, the pipeline remains paused until an authorised approver acts. The approval authority matrix should define delegates for each role to avoid bottlenecks, and approval latency is monitored as a governance health metric.
A rollback must restore the previous version within the declared RTO, without data loss, and be logged as a deployment event. Rollback capability must be tested quarterly through drills.
Each ledger entry records the timestamp, model version, configuration version, deployer identity, approval evidence, staging results, canary results where applicable, and the deployment outcome, held in append-only storage for the full retention period.