Article 15 and Annex IV require comprehensive documentation of the hardware and software environment in which AI systems operate. This cluster page covers cloud infrastructure documentation, containerisation, inference infrastructure, edge deployment, disaster recovery, and multi-region data sovereignty obligations for high-risk AI systems.
Article 15 and Annex IV require documentation of the hardware and software environment in which an AI system operates. This obligation extends beyond a simple list of cloud services to a compliance-critical record demonstrating the system's operational context, resilience characteristics, and security posture. The AISDP must specify the cloud provider, deployment region, specific services used, and resource allocations for inference workloads, including GPU or TPU specifications where applicable. For high-risk systems processing personal data, the deployment region must be within the EU/EEA given GDPR data transfer restrictions and the practical expectations of national competent authorities. Cloud provider data processing agreements must be in place and referenced in the AISDP, confirming that the provider processes data only on documented instructions and that appropriate technical and organisational security measures are applied.
Organisations using managed AI services for model training, serving, or monitoring must document and risk-assess each provider dependency. The AISDP should specify which managed services are used and for which functions, what data flows through each service, the provider's data handling practices (including whether customer data is used for service improvement), availability and latency SLAs, and the fallback strategy if the service becomes unavailable. System Architecture covers the broader architectural context within which these dependencies sit.
For systems using third-party model APIs such as foundation model providers, the AISDP must additionally document the API provider's model versioning policy, data retention and usage policies, latency and throughput characteristics, and contractual commitments regarding model behaviour stability. API dependency documentation should capture, for each external API, what data flows to and from it, the API's versioning and change notification policies, data handling and retention practices, SLA commitments, and fallback behaviour if the API becomes unavailable.
Containerisation provides the reproducible, versioned deployment environment that compliance requires. A container image is immutable once built, capturing the exact operating system, libraries, framework versions, and application code that constitute the runtime environment. This immutability means the container image tested during conformity assessment is exactly the image that runs in production. The AISDP should capture the container image build process, the container registry (a private registry with access controls and image signing), orchestration configuration, and resource limits and scaling policies. Each container image is tagged with the corresponding code and model version and stored in a private registry with access logging.
The supply chain risk in containerisation is the base image. A container built from a tagged base image inherits whatever is in that image at build time; if the base image is updated, subsequent builds produce different containers. The mitigation is to pin the base image to a specific digest (a SHA-256 hash), scan the built image for vulnerabilities, sign it with Docker Content Trust or Sigstore cosign, and store it in a private registry with access controls and image scanning enabled. The CI pipeline should fail if the container scan reveals critical or high-severity vulnerabilities.
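A digest-pinning policy like the one above can be enforced as a CI gate. The sketch below, a hypothetical check written for illustration (the function name and regex are assumptions, not a standard tool), verifies that every `FROM` line in a Dockerfile pins its base image to a SHA-256 digest before a build is allowed to proceed:

```python
import re

# A FROM line pinned to an immutable digest: FROM image@sha256:<64 hex chars>
DIGEST_PIN = re.compile(r"^FROM\s+\S+@sha256:[0-9a-f]{64}\b", re.MULTILINE)
# Any FROM line at all, pinned or not
ANY_FROM = re.compile(r"^FROM\s+", re.MULTILINE)

def base_images_pinned(dockerfile_text: str) -> bool:
    """Return True only if every FROM line pins its base image by digest."""
    total = len(ANY_FROM.findall(dockerfile_text))
    pinned = len(DIGEST_PIN.findall(dockerfile_text))
    return total > 0 and total == pinned
```

In a pipeline, a False result would fail the build alongside the vulnerability-scan gate, so an unpinned base image can never reach the registry.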
GitOps deployment extends version control to the deployment process. The desired state of the production environment is declared in a Git repository. The deployment controller continuously reconciles the live cluster state against the Git state, ensuring that the deployed system matches the declared configuration. Every deployment change is a Git commit, providing an immutable audit trail of what was deployed, when, by whom, and through which approval process. This audit trail feeds directly into AISDP record-keeping and change history modules.
The inference infrastructure directly affects the system's performance characteristics documented in the AISDP. Organisations must specify the inference hardware, model serving framework, batching strategy, expected and measured inference latency at p50, p95, and p99 percentiles, throughput capacity and scaling behaviour, and warm-up and cold-start characteristics. For systems using GPU inference, the GPU type, driver version, and CUDA/cuDNN version must be documented, as these affect model behaviour reproducibility. A model producing different outputs on different GPU architectures due to floating-point precision differences is a traceability concern.
The inference infrastructure must scale to meet demand without degrading the declared performance characteristics. Post-Market Monitoring addresses the ongoing monitoring that validates these claims in production. The AISDP must document the scaling policy (autoscaling metric thresholds and replica counts), the expected scaling latency, and the behaviour during scaling events. For high-risk systems where every inference affects an individual's rights, request dropping is generally unacceptable; the architecture must either queue requests with a bounded wait time or maintain sufficient baseline capacity.
Latency measurement requires load testing under realistic conditions, not just average-case testing. The AISDP should report percentile latencies: p50 (the median request), p95 (the typical worst case), and p99 (the extreme tail), measured under the expected production load rather than an idle system. Load testing should simulate the expected request rate, request size distribution, and concurrent user count. Test results, including the exact test configuration and load profile, are retained as evidence.
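The percentile figures above can be computed with a simple nearest-rank calculation. This is a minimal sketch for illustration, assuming latency samples have already been collected from a load test (the function names are not from any particular load-testing tool):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_report(samples_ms):
    """p50/p95/p99 summary suitable for inclusion as AISDP evidence."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

Note that the spread between p50 and p99 is itself informative: a wide gap under load often indicates queueing or cold-start effects that an average would hide.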
The choice of serving pattern determines how traceability and logging are implemented. Synchronous serving (request-response) produces an immediate output for each input, simplifying logging and traceability with a one-to-one correlation. Asynchronous serving (queue-based processing) decouples the request from the response; the inference may be processed by a different replica, and a request identifier is needed to correlate the response back to the request. Streaming inference, common for generative models, produces partial outputs over time, requiring the logging infrastructure to capture the complete output rather than just the first token.
For each pattern, the AISDP must document the traceability mechanism ensuring every request can be matched to its response and every inference attributed to the model version that produced it. Compliance-relevant features in a serving framework include model version management (serving a specific pinned version and switching through controlled deployment events), request logging for Article 12 compliance, batching configuration, and health and readiness probes enabling the orchestration layer to detect and replace failed instances.
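For the asynchronous pattern, the request-identifier correlation described above can be sketched as a simple join over logged records. This is an illustrative sketch, not a reference implementation; the record shapes and field names (`request_id`, `model_version`) are assumptions:

```python
def correlate(requests, responses):
    """Join async responses back to their originating requests by request_id.
    Returns (matched request/response pairs, unmatched request ids).
    Any unmatched id is a traceability gap that must be investigated."""
    by_id = {r["request_id"]: r for r in responses}
    matched, missing = [], []
    for req in requests:
        rid = req["request_id"]
        if rid in by_id:
            matched.append((req, by_id[rid]))
        else:
            missing.append(rid)
    return matched, missing
```

The same correlation supports model-version attribution: because each response record carries the version that produced it, every inference in the joined log is traceable to a pinned model version.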
Organisations face pressure to optimise inference costs through techniques such as model quantisation, model distillation, and hardware right-sizing. Each technique has potential compliance implications that must be explicitly documented. Quantisation reduces parameter precision and may alter the model's output distribution, particularly for edge cases and minority subgroups. Distillation produces a different model that must be independently evaluated against the AISDP's declared performance and fairness thresholds. Hardware changes may affect numerical precision.
The mitigation for quantisation is to re-run the full evaluation suite on the quantised model, comparing results against declared thresholds. If any threshold is breached, the quantisation configuration is adjusted or the optimisation abandoned. For distillation, the student model is not a compressed version of the teacher but a new model requiring independent evaluation, including the full fairness evaluation suite.

Pruning removes neurons, filters, or attention heads that contribute least to the model's predictions. Structured pruning (removing entire neurons or filters) is simpler and more compatible with standard inference frameworks; unstructured pruning (zeroing individual weights) can achieve higher compression ratios but requires sparse-computation support. Pruning may disproportionately affect performance on underrepresented subgroups because the pruned neurons may capture minority-subgroup patterns.

For all three techniques, the AISDP must document the technique applied, the specific configuration (such as quantisation bit width, distillation architecture, or pruning ratio), the motivation (cost reduction, a latency requirement, or an edge deployment constraint), the evaluation results for the optimised model (distinct from the original), and the comparison against declared thresholds. If any threshold is breached, the optimisation triggers the substantial modification assessment process.
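The threshold comparison for an optimised model can be sketched as a simple gate. The metric names and values below are hypothetical, and the sketch assumes all metrics are "higher is better" (a real suite would also handle lower-is-better metrics such as latency):

```python
def threshold_gate(declared, measured):
    """Compare an optimised model's measured metrics against the AISDP's
    declared floors. Returns the names of breached metrics; a non-empty
    result blocks release and triggers the substantial modification
    assessment process."""
    return [
        name for name, floor in declared.items()
        if measured.get(name, float("-inf")) < floor
    ]
```

Treating a missing metric as a breach (via the `-inf` default) is a deliberate fail-closed choice: an optimised model that was never evaluated on a declared metric must not pass the gate.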
High-risk AI systems must be resilient to infrastructure failures, and the recovery target must be compliance-aware. A standard DR plan restores service to an operational state, but an AI system DR plan must restore it to a known, validated, compliant state: serving a specific model version with specific configuration, using specific data pipelines, and producing outputs consistent with the documented AISDP. The AISDP must capture the recovery point objective (RPO) and recovery time objective (RTO), backup strategy for model artefacts, configuration, and critical data, the failover architecture, the disaster recovery testing schedule and results, and degraded-mode behaviour.
For the inference log, RPO should be zero or near-zero because lost log entries create gaps in the Article 12 audit trail. Backup scope must include model artefacts (the specific version in production), configuration (version-controlled inference parameters and threshold values), data pipeline state, infrastructure configuration from the IaC repository, logging state, and monitoring configuration.
For systems where the AI component is safety-critical, the failsafe behaviour must be explicitly documented: when the AI system fails, what default behaviour takes over? For a recruitment screening system, the failsafe might be routing all applications to human review. For a medical diagnostic system, the failsafe might be displaying a warning that the AI assessment is unavailable.
Failover testing should verify three things: that the system fails over within the declared RTO, that the recovered system serves the correct model version with correct configuration, and that the recovered system's outputs are consistent with the primary system's for reference inputs. This third check is the compliance-specific addition, confirming that the recovery process did not silently alter the system's behaviour. DR tests should be run quarterly at minimum, with each test recording the failure scenario, time to detect and recover, whether RTO and RPO were met, and whether the recovered system passed the compliance consistency check.
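The third check, output consistency between the primary and recovered systems, can be sketched as follows. This is an illustrative harness under stated assumptions: the two deployments are modelled as callables returning numeric outputs, and the tolerance value is hypothetical:

```python
def dr_consistency_check(primary_fn, recovered_fn, reference_inputs, tol=1e-6):
    """Run identical reference inputs through the primary and the recovered
    deployment; return every input whose outputs diverge beyond tol.
    A non-empty result means recovery silently altered system behaviour."""
    divergent = []
    for x in reference_inputs:
        if abs(primary_fn(x) - recovered_fn(x)) > tol:
            divergent.append(x)
    return divergent
```

The reference inputs should be a fixed, version-controlled suite so that quarterly DR tests are comparable over time.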
Systems deployed at the edge, in air-gapped environments, or embedded in physical products face distinct infrastructure documentation requirements. The AISDP should specify the target hardware platform, model optimisation techniques and their impact on accuracy and fairness, the update mechanism, connectivity requirements, and logging and monitoring constraints. Edge devices typically have constrained compute, and the model must be optimised for the target hardware using framework-specific compilation tools. The optimised model must be independently evaluated against declared performance and fairness thresholds for the specific target hardware, because characteristics can differ between cloud and edge versions.
Over-the-air (OTA) update mechanisms must support four capabilities: version verification (confirming integrity and authenticity through cryptographic signatures before applying updates), rollback capability (reverting to the previous version if on-device validation fails), staged rollout (deploying to a subset of devices first with monitoring for anomalies), and update logging (recording which model version each device runs, when updates were applied, and any errors). Monitoring Infrastructure addresses the broader monitoring architecture that aggregates these distributed compliance records. The AISDP must define a maximum acceptable version lag and the technical mechanisms that enforce it.
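The integrity half of version verification can be sketched as a digest check against the update manifest. This is a minimal illustration (real OTA verification would additionally verify a cryptographic signature over the manifest itself, e.g. with a TPM-held key, which is out of scope for this sketch):

```python
import hashlib

def verify_artifact(artifact_bytes: bytes, expected_sha256: str) -> bool:
    """Integrity check before applying an OTA model update: the downloaded
    artefact must match the digest published in the signed update manifest.
    On mismatch the update is rejected and the current version retained."""
    return hashlib.sha256(artifact_bytes).hexdigest() == expected_sha256
```

A failed check should itself be written to the device's update log, since Article 12 traceability covers rejected updates as well as applied ones.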
Edge devices with intermittent or no connectivity cannot stream logs to a central service in real time, yet Article 12's logging requirement applies regardless of deployment location. Local buffering with deferred upload is the standard mitigation. A lightweight log forwarder writes entries to a local buffer on encrypted device storage, then forwards buffered logs to the central logging service when connectivity is available.
The AISDP must document the buffering strategy, the maximum time logs may be held locally before upload, and the behaviour when local storage is exhausted (typically, oldest logs are overwritten, creating a compliance gap that must be acknowledged). The architecture must also address log integrity during buffering, protecting stored logs against tampering or corruption, and batch upload with deduplication and ordering to ensure the central monitoring system receives a complete, chronologically correct record. The maximum acceptable monitoring gap and compensating controls during disconnected operation must be documented.
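The bounded buffer with oldest-first overwrite described above can be sketched in a few lines. The class name and interface are illustrative assumptions, not a real log-forwarder API; a production forwarder would also persist to encrypted storage and protect entries against tampering:

```python
from collections import deque

class EdgeLogBuffer:
    """Bounded local log buffer for a disconnected edge device.
    When full, the oldest entry is overwritten (the acknowledged
    compliance gap); the dropped counter quantifies that gap.
    flush() drains buffered entries in chronological order for
    batch upload once connectivity returns."""

    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)
        self.dropped = 0  # entries lost to overwrite, reported on upload

    def append(self, entry):
        if len(self._buf) == self._buf.maxlen:
            self.dropped += 1  # oldest entry is about to be evicted
        self._buf.append(entry)

    def flush(self):
        drained = list(self._buf)
        self._buf.clear()
        return drained
```

Reporting the `dropped` count with each batch upload lets the central monitoring system record the exact size of any logging gap rather than discovering it later.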
Edge devices deployed in public or semi-public environments are physically accessible to potential attackers, and the cybersecurity threat model must address physical access threats. These threats include model extraction through direct access to device storage, input manipulation through tampering with sensors, and denial of service through device destruction or disconnection. The AISDP must document physical security measures and residual physical security risks.
Secure boot uses the device's TPM to verify the integrity of the boot chain before execution, preventing execution of tampered firmware. Encrypted storage prevents extraction of model artefacts and inference logs from a stolen or compromised device. Hardware-backed key storage, where the TPM stores encryption keys in tamper-resistant hardware, prevents key extraction. These measures are documented in the AISDP's cybersecurity module.
For edge model optimisation, common techniques include quantisation to INT8 or FP16, structured pruning that removes entire neurons or filters, knowledge distillation that trains a compact student model from the full-size teacher, and framework-specific compilation. Each technique alters model behaviour to some degree. The AISDP must document which techniques were applied, the performance and fairness evaluation results for the optimised model (not merely the original), and any subgroups for which the optimisation disproportionately affects accuracy.
Organisations deploying across multiple EU member states must address data sovereignty requirements arising from national legislation, sector-specific regulation, or competent authority guidance. The AISDP must declare where personal data is processed, where model inference occurs, where logs are stored, and where each component has its failover region. The AISDP should map each data category to its residency constraints and document how the infrastructure enforces those constraints. Cybersecurity Documentation addresses the related security documentation requirements.
The data sovereignty analysis must distinguish between training data flows and inference data flows. Training typically occurs in a central location as a one-time architectural decision. Inference occurs wherever the system is deployed, potentially in multiple regions simultaneously, and the data residency must be managed continuously. For systems where inference inputs contain personal data (common for high-risk systems in employment, credit, and healthcare domains), the inference infrastructure must process data within the jurisdiction where the data subject resides, or the organisation must have a valid GDPR data transfer mechanism.
Different member states may impose data residency requirements through national legislation, sector-specific regulation, or competent authority guidance. Health data in some jurisdictions must remain within the member state's borders, and financial data may be subject to sector-specific localisation requirements. Data sovereignty for logging requires particular attention: if the system processes requests from users across multiple member states, the inference logs contain personal data from each state. Logs must be stored in a region satisfying GDPR requirements for all data subjects, and the logging infrastructure must prevent log data from flowing outside the EU, including to third-party logging or analytics services that may process data in non-EU jurisdictions.
Cloud provider organisation policies can prevent creating resources in regions outside a defined allowlist. An EU-only deployment would restrict resources to EU regions, applied at the organisational level so that new services are automatically constrained. This policy should also prevent data replication to non-EU regions, including automatic features like cross-region backup or CDN caching that may silently distribute data globally.
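Real enforcement sits in the cloud provider's organisation policy, but the allowlist logic can also be run as a policy-as-code check over declared infrastructure before deployment. The sketch below is illustrative only; the region names and record shape are assumptions:

```python
# Illustrative allowlist; a real one would come from the organisation policy.
EU_REGIONS = {"eu-west-1", "eu-central-1", "eu-north-1"}

def region_violations(resources, allowed=EU_REGIONS):
    """Flag declared resources whose region falls outside the allowlist.
    Intended as a pre-deployment gate over an IaC plan; a non-empty
    result blocks the deployment."""
    return [r["name"] for r in resources if r["region"] not in allowed]
```

Such a check catches drift in the infrastructure definition; it does not replace the provider-level policy, which also constrains resources created outside the IaC pipeline.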
Cloud services may have internal data flows that cross regional boundaries even when primary resources are region-locked. Managed ML services may use global control planes, logging services may replicate to a central location, and support access may be routed through global teams. Each managed service used by the AI system should be evaluated for its data residency characteristics, and any cross-regional data flow documented in the AISDP and assessed under the GDPR's transfer impact assessment framework. Even when data is stored in an EU region, if the cloud provider is subject to non-EU government access laws, a transfer impact assessment is required, documenting the provider's supplementary measures and the organisation's residual risk assessment.
Systems deployed across multiple regions must produce consistent outputs regardless of which region processes the request. Model artefacts must be identical across regions, ensured by deploying from the same model registry entry (same content hash) to all regions. Consistency can fail subtly if regions use different hardware, different software versions, or different configurations.
A consistency test suite should send reference inputs to each region's inference endpoint and compare outputs. Any divergence triggers investigation, and if divergence exceeds a significance threshold, deployment to the divergent region is blocked until resolved. Cross-border model deployment also raises whether the model itself constitutes a data transfer; if trained on personal data from one jurisdiction and deployed in another, regulators may consider the learned parameters to be derived personal data. Prudent organisations document their position and supporting analysis, and where the model may constitute personal data, appropriate transfer mechanisms should be in place.
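The cross-region comparison can be sketched as follows, assuming numeric model outputs and a hypothetical divergence threshold; region names and the record shapes are illustrative:

```python
def region_consistency(reference_outputs, region_outputs, threshold):
    """Compare each region's outputs on the reference suite against the
    reference region's outputs. Returns the regions whose worst-case
    divergence exceeds the threshold; deployment to those regions is
    blocked until the divergence is resolved."""
    blocked = []
    for region, outputs in region_outputs.items():
        max_divergence = max(abs(a - b) for a, b in zip(reference_outputs, outputs))
        if max_divergence > threshold:
            blocked.append(region)
    return blocked
```

Using the worst-case divergence rather than an average is deliberate: a single inconsistent output on a high-risk system is already a compliance concern even if the mean divergence is negligible.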
Infrastructure-as-code is the strongest approach because the infrastructure definition is simultaneously the documentation. A Terraform configuration defining cloud resources, networking, access controls, and deployment targets is a machine-readable, version-controlled, auditable record. When infrastructure changes, the configuration changes, and the Git history records what changed, when, and by whom. For AISDP purposes, the IaC repository is referenced as evidence; the current main branch represents the current infrastructure, and historical commits represent previous states.
The choice between IaC tools depends on the organisation's cloud provider and existing practices. Terraform is provider-agnostic and the most widely adopted. Pulumi offers the same capability using general-purpose programming languages in place of Terraform's domain-specific syntax. Cloud-native tools integrate more tightly with provider-specific features but lock the organisation to a single cloud. For multi-cloud deployments, provider-agnostic tools are preferable.
For organisations without IaC capability, manual documentation of the infrastructure is acceptable but error-prone. Without containers, the organisation must document the exact operating system version, library versions, system configuration, and environment variables, then manually reproduce them on each deployment target. The DR plan can be a document and recovery procedures can be manual, but backup creation and restoration require tooling; the procedural elements are the DR plan, the testing schedule, and the decision-making authority. For edge deployment, manual version management is feasible for small fleets (under 20 devices) but impractical at scale: it requires a version-tracking spreadsheet, manual update checklists covering artefact download, hash verification, deployment, version confirmation, and update logging, and periodic manual log retrieval with chain-of-custody documentation. The data sovereignty assessment and transfer impact assessment are procedural activities in any case, while enforcement of region-lock policies requires cloud IAM configuration.
Managed AI services may be used, but the AISDP must document which managed services are used, what data flows through them, the provider's data handling practices, SLA commitments, and fallback strategies if the service becomes unavailable.
If quantisation, distillation, or pruning causes any threshold breach, the optimisation configuration must be adjusted or the optimisation abandoned. A breach may also trigger the substantial modification assessment process.
Local buffering with deferred upload is the standard approach. The AISDP must document the maximum local hold time, behaviour when storage is exhausted, and log integrity protections during disconnected operation.
Manual documentation is acceptable but error-prone. IaC is the strongest approach because the infrastructure definition is simultaneously the documentation, with Git history providing an immutable audit trail.
DR plans must restore the system to a known, validated, compliant state, not just operational state, with compliance-aware RPO and RTO targets.
Edge deployment requires documenting target hardware, model optimisation impact, OTA update mechanisms, disconnected logging strategies, and physical security measures.
Organisations must declare where data is processed, enforce region-lock policies, distinguish training from inference data flows, and conduct transfer impact assessments.
Synchronous, asynchronous, and streaming patterns each require different traceability mechanisms to ensure every request is matched to its response and model version.