A compromised CI/CD pipeline could inject malicious code or model artefacts into a high-risk AI system, altering its behaviour in ways that affect the safety and fundamental rights of affected persons. The pipeline is not merely a convenience for deployment; it is the mechanism through which every quality gate, validation check, and compliance control is enforced. If the pipeline itself is insecure, then every downstream assurance is undermined.
Article 12 requires that high-risk AI systems technically allow for the automatic recording of events. The word "technically" is significant: logging must be a structural property of the system, embedded in its architecture, not a feature that can be disabled through configuration or circumvented through alternative code paths. Pipeline security directly supports this requirement by ensuring that the infrastructure producing and deploying artefacts is itself trustworthy and auditable.
Signed commits and signed container images establish a chain of trust from developer to deployment. All code commits should be GPG-signed, confirming the identity of the committer, and unsigned commits should be rejected by the pipeline. This ensures that every change entering the system can be attributed to a verified individual.
The engineering team signs container images at build time using Docker Content Trust, Sigstore Cosign, or an equivalent mechanism. The deployment infrastructure verifies signatures before pulling images, and unsigned or unverified images are not deployed. Cosign supports keyless signing via Sigstore's Fulcio CA, using identity from OIDC providers, which simplifies key management while maintaining strong identity assurance. At deploy time, the signature is verified against the expected certificate identity and OIDC issuer before the image is admitted to the cluster.
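As an illustrative sketch of the deploy-time admission check, the keyless verification described above can be reduced to constructing and running a `cosign verify` invocation; the image name, identity, and issuer below are hypothetical, and the flags shown are those of the Cosign 2.x CLI:

```python
import subprocess

def cosign_verify_command(image: str, identity: str, issuer: str) -> list[str]:
    """Build the argv for a keyless Cosign verification of a container image.

    The signature is checked against the expected certificate identity and
    OIDC issuer before the image is admitted to the cluster.
    """
    return [
        "cosign", "verify",
        "--certificate-identity", identity,
        "--certificate-oidc-issuer", issuer,
        image,
    ]

def image_is_admissible(image: str, identity: str, issuer: str) -> bool:
    # cosign exits non-zero when the signature or identity does not match.
    result = subprocess.run(
        cosign_verify_command(image, identity, issuer),
        capture_output=True,
    )
    return result.returncode == 0
```

In a GitHub Actions setup, the certificate identity would be the workflow identity and the issuer `https://token.actions.githubusercontent.com`; other OIDC providers substitute their own issuer URL.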
Pipeline configuration files must be stored in version control and subject to the same review process as application code. CI/CD pipeline configuration, whether Jenkinsfile, GitHub Actions workflows, or GitLab CI configuration, is maintained by the engineering team under the same change management discipline as the application itself. Only designated pipeline administrators should have write access to pipeline configuration files.
Build environment isolation is equally important. Build agents should be ephemeral, destroyed and recreated for each build, to prevent persistent compromise. An agent that persists between builds could carry compromised state from one pipeline run to the next. The engineering team resolves build dependencies from trusted, pinned sources, typically a private artefact repository mirroring approved versions, rather than fetching the latest version from public registries at build time. This eliminates a common supply chain attack vector where a malicious package is substituted for a legitimate dependency.
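One lightweight enforcement point, sketched here for a Python requirements file (the regex and example package names are illustrative), is a pipeline step that rejects any dependency not pinned to an exact version:

```python
import re

# A requirement is acceptable only if pinned to an exact version with `==`.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+==\S+$")

def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return every requirement line that is not pinned to an exact version."""
    offending = []
    for line in requirements_text.splitlines():
        line = line.split("#")[0].strip()     # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            offending.append(line)
    return offending
```

A build whose `unpinned_requirements` result is non-empty would fail before dependency resolution begins, keeping floating version ranges out of the artefact entirely.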
Each build should generate a Software Bill of Materials listing every component, dependency, and library incorporated into the build artefact. The SBOM supports vulnerability tracking, licence compliance, and supply chain transparency. For AI systems specifically, the SBOM should extend beyond software dependencies to include model artefacts and their provenance, creating a complete record of everything that constitutes the deployed system.
The SBOM can be attached to the container image as an attestation using tools such as Cosign, linking the bill of materials directly to the signed artefact it describes. This creates a verifiable chain: the image is signed, the SBOM is attested to the image, and both can be verified at deployment time. CycloneDX and SPDX are the two widely adopted SBOM formats, both of which support the extension points needed for model artefact metadata.
Every pipeline execution must be logged comprehensively. The log must capture the trigger event, whether commit, manual trigger, or scheduled run; the pipeline configuration version used; every step executed with timestamps and outcomes; every external resource accessed including registry pulls, API calls, and data fetches; the identity of any human who intervened in the pipeline; and the final artefacts produced with their hashes.
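A minimal sketch of such a run record, with hypothetical step and artefact names, might assemble the required fields and hash each produced artefact:

```python
import hashlib
from datetime import datetime, timezone

def pipeline_run_record(trigger: dict, config_version: str,
                        steps: list[dict], artefacts: dict[str, bytes]) -> dict:
    """Assemble the audit record for one pipeline execution.

    `steps` entries carry name, timestamps, and outcome; `artefacts` maps
    artefact names to their raw bytes, which are stored as SHA-256 hashes.
    """
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "trigger": trigger,                      # commit, manual, or scheduled
        "pipeline_config_version": config_version,
        "steps": steps,
        "artefact_hashes": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in artefacts.items()
        },
    }
```

Fields for external resource access and human interventions would be appended to the same record as those events occur during the run.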
This logging requirement flows directly from Article 12's mandate for automatic event recording. The pipeline audit log serves as evidence that compliance gates were executed, that they produced valid results, and that the resulting artefacts are the ones that were actually deployed. Without this audit trail, an organisation cannot demonstrate that its quality management system was consistently applied across every deployment. The audit log also supports incident investigation by providing a complete record of what was built, how it was validated, and who approved its release.
The most dangerous pipeline failure is one that produces a result without running all required gates. The engineering team monitors the pipeline's own health across four dimensions. Execution success rate tracks what proportion of pipeline runs complete successfully. Execution duration reveals whether runs are taking longer, which may indicate infrastructure degradation or resource contention.
Gate pass rates, broken down by gate type, surface emerging issues: a rising failure rate at the fairness gate may signal a problem in training data or feature engineering. Queue depth and wait times show whether pipeline runs are queuing due to resource constraints, causing deployment delays. Pipeline monitoring should verify that every expected gate was executed for each run, that gate results are recorded and non-empty, and that the gate execution sequence matches the pipeline definition. A pipeline run that skips a gate (whether through a configuration error, a conditional-logic bug, or a timeout) is flagged by the monitoring system for investigation.
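The gate-completeness check can be sketched as a function that compares a run's gate log against the pipeline definition; the gate names here are hypothetical:

```python
# Hypothetical gate names; the real list comes from the pipeline definition.
EXPECTED_GATES = ["unit_tests", "security_scan", "fairness_eval", "model_validation"]

def audit_gate_execution(run_gates: list[dict]) -> list[str]:
    """Return findings for one run's gate log; an empty list means healthy.

    Checks that every expected gate ran, that each recorded a non-empty
    result, and that execution order matches the pipeline definition.
    """
    findings = []
    executed = [g["name"] for g in run_gates]
    for gate in EXPECTED_GATES:
        if gate not in executed:
            findings.append(f"gate skipped: {gate}")
    for g in run_gates:
        if not g.get("result"):
            findings.append(f"empty result: {g['name']}")
    actual_order = [name for name in executed if name in EXPECTED_GATES]
    expected_order = [name for name in EXPECTED_GATES if name in executed]
    if actual_order != expected_order:
        findings.append("gate order deviates from pipeline definition")
    return findings
```

Any non-empty findings list would raise an alert for investigation rather than silently passing the run.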
The evidence artefacts produced by the pipeline, including test reports, evaluation metrics, and model cards, should be integrity-checked. The pipeline computes a hash of each artefact at generation time and stores the hash alongside the artefact in the evidence pack. Any subsequent modification to the artefact produces a different hash, enabling detection of tampering or corruption.
This integrity mechanism supports the broader quality management system by ensuring that the evidence used for conformity assessment is the same evidence that was produced during the original pipeline run. Combined with pipeline audit logging, hash verification creates an end-to-end chain of custody from build through to regulatory inspection. If the pipeline fails silently, quality gates are not enforced and non-compliant artefacts may reach production, which is why evidence integrity checking is as important as the gates themselves.
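The hash-and-verify workflow for an evidence pack can be sketched with the standard library alone (the manifest filename is an illustrative convention):

```python
import hashlib
import json
from pathlib import Path

def write_evidence_manifest(evidence_dir: str) -> dict:
    """Hash every artefact in the evidence pack and store a manifest beside them."""
    directory = Path(evidence_dir)
    manifest = {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.iterdir())
        if path.is_file() and path.name != "manifest.json"
    }
    (directory / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest

def verify_evidence(evidence_dir: str) -> dict:
    """Re-hash each artefact and report whether it still matches the manifest."""
    directory = Path(evidence_dir)
    manifest = json.loads((directory / "manifest.json").read_text())
    return {
        name: hashlib.sha256((directory / name).read_bytes()).hexdigest() == recorded
        for name, recorded in manifest.items()
    }
```

Running `verify_evidence` before any conformity assessment submission confirms that the pack is byte-for-byte the one produced by the original pipeline run.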
The logging infrastructure must satisfy three design requirements: comprehensiveness, immutability, and retrievability. Comprehensiveness means every material event is captured. Immutability means captured events cannot be modified or deleted. Retrievability means events can be efficiently queried and exported for specific regulatory purposes.
Comprehensiveness is achieved through OpenTelemetry instrumentation. For Article 12 compliance, every inference request records input hash, timestamp, model version, raw output, and confidence score. Every post-processing modification records original output, modified output, and rule applied. Every human oversight action records reviewer identity, timestamp, decision, and override rationale. Every configuration change records parameter changed, old value, new value, and changer identity. Every deployment event records version deployed and approval evidence. Each event carries a correlation ID linking it to the specific inference request and a composite version identifier linking it to the model provenance chain.
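As a sketch of the inference-event shape (the field names and composite version format are illustrative, not a prescribed schema), each event carries the correlation ID and hashes the input rather than storing it raw:

```python
import hashlib
import uuid
from datetime import datetime, timezone

def inference_event(model_version: str, input_payload: bytes,
                    raw_output, confidence: float) -> dict:
    """Build one Article 12 inference event record."""
    return {
        "event_type": "inference",
        "correlation_id": str(uuid.uuid4()),   # links all events for this request
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,        # composite id into the provenance chain
        "input_hash": hashlib.sha256(input_payload).hexdigest(),
        "raw_output": raw_output,
        "confidence": confidence,
    }
```

Post-processing, oversight, and configuration-change events would reuse the same correlation ID so a trace query can reassemble the full processing history of one request.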
Immutability is enforced at the storage layer through WORM (Write Once Read Many) storage. AWS S3 Object Lock in compliance mode prevents any user, including the account root user, from deleting or modifying objects for the defined retention period. The retention period should be set to ten years from the date the system is placed on the market, plus a margin for administrative processing. Azure Immutable Blob Storage and Google Cloud Logging retention locks provide equivalent guarantees.
For the highest assurance, a cryptographic hash chain adds tamper evidence: each log entry includes a hash of the preceding entry, so any modification to an earlier entry invalidates the chain from that point forward. This provides a detectable signal even if the storage layer's immutability is somehow circumvented.
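The chain can be sketched in a few lines (field names illustrative): each appended entry commits to the hash of its predecessor, so verification recomputes every link and detects any retroactive edit:

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel hash preceding the first entry

def append_entry(chain: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the preceding entry."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "entry_hash": digest})

def chain_is_intact(chain: list[dict]) -> bool:
    """Walk the chain, recomputing each hash; an edit breaks every later link."""
    prev_hash = GENESIS
    for entry in chain:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != digest:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Anchoring the latest `entry_hash` in an external system at intervals would additionally detect wholesale replacement of the chain, though that extension is beyond this sketch.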
For high-throughput systems processing 100,000 inferences per day, with each inference generating a structured log entry of approximately 2KB, the system produces 200MB of log data per day, roughly 73GB per year, accumulating to 730GB over the ten-year retention period. The architecture must accommodate this volume. Apache Kafka provides a high-throughput, durable message queue for collecting log events from the inference service, buffering them, and writing them to long-term storage. Kafka's immutable log retention ensures that events are preserved in order and cannot be modified after writing. Infrastructure planning should budget for this storage volume from the outset.
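The storage arithmetic above is easy to reproduce (decimal units, 365-day year):

```python
inferences_per_day = 100_000
entry_bytes = 2_000                                        # ~2 KB per log entry

daily_mb = inferences_per_day * entry_bytes / 1_000_000    # 200 MB per day
yearly_gb = daily_mb * 365 / 1_000                         # 73 GB per year
retention_gb = yearly_gb * 10                              # 730 GB over ten years

print(daily_mb, yearly_gb, retention_gb)                   # 200.0 73.0 730.0
```

Scaling any of the inputs (request volume, entry size, retention period) updates the budget directly; replication factor and index overhead would sit on top of these raw figures.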
The regulatory export capability is the mechanism through which the logging infrastructure serves its compliance purpose. Logs that exist but cannot be efficiently queried, filtered, and exported are of limited value during an inspection or investigation. The export system should support three query patterns.
Time-range queries retrieve all inference events between date A and date B; this is the most common pattern for routine inspections and post-market monitoring reporting. Trace queries return the complete processing trace for a specific inference request, covering every event from input receipt through to operator decision, and are essential for investigating individual complaints or serious incidents. Cohort queries select all inferences where the model produced a particular outcome for a specific subgroup, enabling analysis of the system's behaviour for specific populations and supporting fairness auditing and fundamental rights impact review.
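Against a list of structured events, the three patterns reduce to simple predicates; the field names here are illustrative, and a production system would push these filters down into the query engine rather than scanning in memory:

```python
def time_range_query(events, start, end):
    """Routine inspection: all events whose timestamp falls in [start, end)."""
    return [e for e in events if start <= e["timestamp"] < end]

def trace_query(events, correlation_id):
    """Incident investigation: the full processing trace for one request."""
    return [e for e in events if e["correlation_id"] == correlation_id]

def cohort_query(events, outcome, subgroup):
    """Fairness auditing: all inferences with a given outcome for a subgroup."""
    return [e for e in events
            if e.get("outcome") == outcome and e.get("subgroup") == subgroup]
```

ISO 8601 timestamps sort lexicographically, which is why the time-range predicate can compare strings directly.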
Elasticsearch or OpenSearch provides the query capability for recent logs, covering the current year and perhaps the preceding year. Older logs archived to S3 Glacier or equivalent cold storage should be queryable through SQL-on-storage services such as AWS Athena or Google BigQuery, though with higher latency.
The Technical SME tests pre-built export scripts quarterly using representative queries, measuring the time to produce the export (the target should be under one hour for routine queries covering up to one quarter of data) and verifying completeness by comparing the export row count against the logging infrastructure's event count. Export format should be configurable: CSV for spreadsheet analysis, JSON for programmatic processing, Parquet for large-volume analytical queries. Each export includes a manifest documenting the query parameters, execution timestamp, event count, and a checksum of the export file.
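A CSV export with its manifest can be sketched as follows; the manifest fields mirror those listed above, and the function shape is illustrative:

```python
import csv
import hashlib
import io
from datetime import datetime, timezone

def export_with_manifest(events: list[dict], query_params: dict):
    """Export events as CSV bytes and build the accompanying manifest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(events[0]))
    writer.writeheader()
    writer.writerows(events)
    payload = buffer.getvalue().encode()
    manifest = {
        "query_parameters": query_params,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "event_count": len(events),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return payload, manifest
```

The quarterly completeness check then compares `manifest["event_count"]` against the logging infrastructure's own count for the same query window, and the checksum lets the recipient confirm the file was not altered in transit.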
Logging infrastructure is inherently technical, and there is no manual alternative to automated event capture. However, the specific tools can be simplified for organisations that lack enterprise-grade infrastructure.
The minimum tooling comprises application-level logging using a standard logging module writing structured JSON to files, file-based storage with append-only filesystem permissions, and a simple log rotation script that moves completed log files to archival storage. Cloud WORM storage such as S3 Object Lock is available at standard storage cost. OpenTelemetry is open-source and provides the instrumentation layer without licensing cost. These components provide the essential compliance capabilities without requiring enterprise-grade infrastructure or significant licensing expenditure.
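A minimal sketch of that baseline using only the standard library: a logging handler whose formatter emits one JSON object per line (the logger name and structured fields are illustrative):

```python
import json
import logging

class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single structured JSON line."""
    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        if isinstance(record.args, dict):   # structured fields passed as a dict
            entry.update(record.args)
        return json.dumps(entry, sort_keys=True)

def make_audit_logger(stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)  # in production: an append-only file
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger("audit")
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger
```

In production the stream would be a file on a filesystem with append-only permissions, with the rotation script moving completed files to WORM storage.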
The AI Governance Lead must budget for the retention infrastructure and ensure it is maintained over the full ten-year period. This is not a trivial operational commitment: storage costs, index maintenance, query infrastructure, and access credential management must survive organisational changes, team restructuring, and cloud account migrations. The Technical SME documents retention configuration in the AISDP, tests it annually to confirm that the team can actually retrieve a log entry from three years ago, and includes it in the disaster recovery plan. The annual retrieval test is a critical validation: without it, organisations may discover that their archival storage is inaccessible only when a regulatory request arrives.
Can build dependencies be fetched from public registries at build time? No. Dependencies should be resolved from trusted, pinned sources, typically a private artefact repository mirroring approved versions, to eliminate supply chain attack vectors.

Is there a manual alternative to automated event capture? No. However, the minimum tooling can be simplified to application-level structured JSON logging with append-only filesystem permissions and cloud WORM storage.

How often should the export scripts be tested? Quarterly, using representative queries that measure time to produce the export and verify completeness by comparing export row counts against the logging infrastructure's event counts.

What should the SBOM list? Every component, dependency, and library in the build artefact, extended to include model artefacts and their provenance for complete supply chain transparency.

What must pipeline execution logs capture? Trigger events, pipeline configuration versions, every step with timestamps and outcomes, external resources accessed, human interventions, and final artefact hashes.

How does monitoring detect a silently failing pipeline? By verifying every expected gate was executed, results are non-empty, and the execution sequence matches the pipeline definition, flagging any skipped gates.

Which query patterns should the export system support? Time-range queries for routine inspections, trace queries for specific inference investigations, and cohort queries for fairness auditing across subgroups.
A regulatory access profile should be pre-configured as an IAM role granting read-only access to all compliance-relevant systems: the logging infrastructure, monitoring dashboards, evidence repository, model registry, and AISDP documentation. The Conformity Assessment Coordinator creates and tests this profile in advance rather than assembling it during an inspection. Annual inspection simulation exercises should include activating the regulatory access profile and verifying that it provides the necessary access without exposing sensitive operational data such as proprietary source code or commercial contracts.