Article 15 of the EU AI Act requires providers of high-risk AI systems to address cybersecurity risks, including threats specific to AI that traditional frameworks miss. This page maps the OWASP LLM Top 10 and MITRE ATLAS taxonomy to practical engineering controls and AISDP documentation requirements.
AI systems are vulnerable to a category of attacks that conventional cybersecurity frameworks do not address: threats that exploit learning and inference processes rather than software vulnerabilities alone. A model can be poisoned through its training data, evaded through adversarial inputs, extracted through query access, and manipulated through prompt injection. The OWASP LLM Top 10 provides a structured taxonomy for identifying these AI-specific threats, while MITRE ATLAS catalogues real-world adversarial techniques against machine learning systems in a matrix analogous to MITRE ATT&CK for traditional cyber threats.
The OWASP taxonomy covers prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft. Beyond these ten categories, AI systems also face adversarial example attacks where imperceptibly perturbed inputs cause high-confidence incorrect outputs, model inversion attacks that reconstruct training data from outputs, and federated learning risks including poisoned gradients and aggregation manipulation.
The EU AI Act requires providers of high-risk AI systems to address cybersecurity risks proportionate to the system's risk level. Cybersecurity and Resilience establishes the broader cybersecurity framework within which these AI-specific threats must be managed. Each threat category maps to specific AISDP modules, with Module 9 capturing the primary cybersecurity controls and Modules 1, 3, 4, 5, and 7 receiving cross-referenced documentation for their respective domains.
Prompt injection occurs when attackers craft inputs that cause a language model to deviate from its intended behaviour, ignore instructions, or execute actions beyond its authorised scope. Direct injection supplies malicious input to the model interface, while indirect injection plants malicious content in data sources the model consults during processing.
Input sanitisation should filter or escape known injection patterns before they reach the model. Output validation verifies that responses fall within the expected output space. Privilege separation ensures the model component cannot access resources or execute actions beyond its documented scope.
Instruction anchoring uses system prompts resistant to override, and input-output monitoring flags anomalous patterns that may indicate injection attempts. These controls should be layered: no single defence is sufficient against a determined attacker, but the combination of sanitisation, validation, separation, and monitoring creates defence in depth.
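The layered pattern can be sketched in a few lines. This is a minimal illustration, not a production filter: the `KNOWN_INJECTION_PATTERNS` deny-list and the `EXPECTED_LABELS` output space are hypothetical names, and a real deployment would maintain the pattern list from red-team findings.

```python
import re

# Hypothetical deny-list of known injection phrasings; a real deployment
# would maintain and extend this list from red-team findings.
KNOWN_INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"reveal (the )?system prompt",
]

# Hypothetical documented output space for a triage model.
EXPECTED_LABELS = {"approve", "reject", "escalate"}

def sanitise_input(text: str) -> str:
    """Layer 1: reject inputs matching known injection patterns before inference."""
    for pattern in KNOWN_INJECTION_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            raise ValueError(f"input rejected: matched injection pattern {pattern!r}")
    return text

def validate_output(label: str) -> str:
    """Layer 2: verify the response falls within the expected output space."""
    if label not in EXPECTED_LABELS:
        raise ValueError(f"output {label!r} outside documented output space")
    return label
```

Neither layer is sufficient alone: sanitisation misses novel phrasings, and output validation only catches attacks that push the model outside its output space, which is why both sit alongside privilege separation and monitoring.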
For agentic AI systems where the model can invoke tools or take actions, prompt injection is closely related to the insecure plugin design threat. An attacker who successfully injects a prompt may cause the model to invoke tools beyond its authorised scope, making privilege separation especially critical. AISDP Module 9 captures the prompt injection threat, the controls in place, the adversarial prompt injection testing performed, and the residual risk assessment.
Model outputs passed to downstream systems without validation can trigger secondary vulnerabilities including cross-site scripting when rendered in web interfaces, SQL injection when incorporated into database queries, and command injection when passed to system shells. The post-processing layer must treat all model outputs as untrusted input from the perspective of downstream components.
Output encoding, parameterised queries, and sandboxed execution environments prevent output-based injection chains. For high-risk systems, a dedicated output validation layer verifies every output against the expected schema before downstream delivery. Output Validation and Monitoring provides the detailed engineering patterns for this validation layer.
Guardrails AI enables structured output validation where the expected format is defined as a schema and the model's response is parsed, validated, and re-prompted if non-conforming. For web-rendered outputs, OWASP ESAPI provides encoding that neutralises injection payloads. The critical implementation principle is that output validation is enforced at the infrastructure level as a dedicated middleware on the inference output path, not within the model's own code, and cannot be bypassed.
If validation fails, the output is not delivered to the downstream consumer; a safe default response is returned and the failure is logged for investigation. For outputs that will be used in database queries, parameterised query construction prevents the model's output from being interpreted as executable SQL.
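The fail-closed pattern above can be sketched as stdlib-only middleware. The hand-rolled schema check is a simplified stand-in for a library such as Guardrails AI, and the `results` table, field names, and safe default are hypothetical:

```python
import logging
import sqlite3

logger = logging.getLogger("output_validation")

# Hypothetical safe default returned when validation fails.
SAFE_DEFAULT = {"status": "unavailable", "reason": "output failed validation"}

def validate_output(output: object) -> dict:
    """Verify a model output against the expected schema; on failure, log
    for investigation and return the safe default instead of the raw output."""
    if (isinstance(output, dict)
            and output.get("status") in {"approved", "rejected"}
            and isinstance(output.get("score"), float)
            and 0.0 <= output["score"] <= 1.0):
        return output
    logger.warning("output failed schema validation: %r", output)
    return SAFE_DEFAULT

def store_result(conn: sqlite3.Connection, case_id: str, status: str) -> None:
    """Parameterised query: model-derived text is bound as data and can
    never be interpreted as executable SQL."""
    conn.execute("INSERT INTO results (case_id, status) VALUES (?, ?)",
                 (case_id, status))
```

Because the middleware sits on the infrastructure path, a non-conforming output never reaches the consumer, and the parameterised insert holds even if the model emits SQL syntax.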
An attacker who manipulates training data can introduce systematic biases, backdoors, or performance degradation into the trained model. Data poisoning can be targeted, affecting behaviour for specific inputs while leaving general performance intact, or untargeted, degrading overall model quality. The defences rest on two pillars: access control and anomaly detection.
Data provenance tracking through the data lineage infrastructure enables detection of unauthorised modifications. Statistical anomaly detection using isolation forests and distributional tests on training data identifies suspicious records before training begins. Model behaviour testing after each retraining cycle checks for unexpected output changes on sentinel inputs.
Access controls on training data repositories prevent unauthorised modification, with every access event logged in an immutable audit trail. DVC and Delta Lake provide version-controlled data storage where every modification is attributed to a specific user and timestamp. Anomaly detection tools such as Great Expectations and Evidently AI provide a second defence layer, halting the pipeline if distributional changes exceed thresholds.
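A minimal sketch of the pipeline-halting idea: a distributional gate that compares an incoming batch against a reference distribution. This is a deliberately simplified stand-in for the isolation-forest and Great Expectations checks named above; the three-sigma threshold is illustrative.

```python
import statistics

def distribution_gate(reference: list[float], incoming: list[float],
                      max_shift_sigma: float = 3.0) -> bool:
    """Return False (halt the pipeline) if the incoming batch's mean drifts
    more than max_shift_sigma standard errors from the reference distribution.
    A simplified stand-in for isolation-forest / Great Expectations checks."""
    ref_mean = statistics.mean(reference)
    ref_sd = statistics.stdev(reference)
    stderr = ref_sd / (len(incoming) ** 0.5)
    shift = abs(statistics.mean(incoming) - ref_mean) / stderr
    return shift <= max_shift_sigma
```

A real gate would test many features and higher moments, not just one mean; the point is that the check runs before training and blocks the run rather than merely warning.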
Sophisticated poisoning attacks may modify only a small fraction of records while keeping overall distributions within normal bounds. For high-risk systems, periodic manual review of a random sample of training records provides human verification that automated checks cannot fully replace. The training data integrity controls belong jointly in AISDP Module 4 and Module 9.
AI inference endpoints are particularly vulnerable to denial of service because individual requests can be computationally expensive, with large transformer models requiring seconds of GPU time per request. An attacker can submit inputs designed to trigger worst-case inference times or flood the serving infrastructure with high-volume requests, degrading service for legitimate users or incurring excessive cloud costs.
Rate limiting is the first defence: enforce a maximum request rate per client identified by API key, IP address, or authenticated identity. Limits should accommodate legitimate peak usage with margin, rejecting excess requests with HTTP 429 responses. Kong, NGINX, and cloud API gateways all support configurable rate limiting.
Inference timeout enforcement is the second defence, setting a maximum execution time per request and terminating any that exceeds it. The timeout should sit above the p99 latency for legitimate requests and below the threshold where a single request materially impacts other users. For neural networks, input complexity analysis can detect and reject inputs designed to trigger pathological computation paths.
Autoscaling with cost caps provides the third layer: the system scales up to handle increased load but will not exceed a defined cost ceiling, preventing sustained attacks from generating unbounded cloud bills. Module 9 records the rate limiting configuration, timeout thresholds, and auto-scaling boundaries, while Module 5 states the expected throughput under normal and adversarial load conditions. The load testing methodology that validates these controls is documented separately.
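The first two layers can be sketched with the standard library: a token-bucket limiter per client and a timeout wrapper around inference. The rates, burst size, and timeout here are illustrative, and the thread pool can only abandon an overrunning request; a process pool would permit hard termination.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as InferenceTimeout

class TokenBucket:
    """Per-client rate limiter: callers above the configured rate are
    rejected, which an API gateway would map to an HTTP 429 response."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate, self.capacity = rate_per_sec, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

_executor = ThreadPoolExecutor(max_workers=4)

def run_inference(model_fn, request, timeout_s: float = 2.0):
    """Timeout enforcement: abandon any request exceeding the budget. The
    budget should sit above the p99 latency for legitimate traffic."""
    future = _executor.submit(model_fn, request)
    try:
        return future.result(timeout=timeout_s)
    except InferenceTimeout:
        future.cancel()
        raise
```

In production these controls live at the gateway (Kong, NGINX) and the serving layer respectively; the sketch only shows the logic each enforces.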
The ML supply chain has three distinct layers requiring separate controls: software dependencies, model components, and infrastructure. Compromised dependencies, pre-trained models, or third-party services can introduce vulnerabilities that traditional software composition analysis may not detect.
The software dependency layer encompasses ML frameworks such as TensorFlow, PyTorch, and scikit-learn alongside data processing libraries and serving frameworks. Every dependency is pinned to an exact version in a lock file, and dependencies are fetched from a private repository such as JFrog Artifactory or Sonatype Nexus rather than directly from public registries. This prevents typosquatting and dependency confusion attacks while ensuring package availability.
The model component layer covers pre-trained models, tokenisers, embedding models, and other ML artefacts sourced externally. These execute computations and can contain vulnerabilities such as model backdoors or poisoned weights. Sigstore and cosign provide cryptographic signing for model artefacts, and for Hugging Face models, the revision parameter pins to a specific Git commit SHA with independently verifiable content hashes.
The infrastructure layer includes container base images, operating system packages, and cloud service configurations. Container images should be built from pinned, signed base images, and infrastructure configurations scanned for security misconfigurations using IaC scanning tools such as Checkov, tfsec, and KICS before deployment. All cached packages should be scanned for known vulnerabilities with Snyk or Trivy, and packages with critical vulnerabilities rejected. The SBOM generation and review process, vendor security assessment results, and supply chain risk assessment feed into AISDP Module 9.
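The content-hash pinning described for model artefacts reduces to a simple integrity gate before loading. A sketch, assuming the expected digest comes from a signed manifest produced at release time (e.g. alongside a Sigstore signature):

```python
import hashlib
from pathlib import Path

def verify_artefact(path: Path, expected_sha256: str) -> None:
    """Refuse to load a model artefact whose content hash does not match
    the digest pinned in the signed manifest."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"artefact {path} failed integrity check: got {digest}")
```

The serving process calls this before deserialising weights, so a swapped or tampered artefact fails closed rather than loading silently.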
Models can leak sensitive information from their training data through three distinct mechanisms: memorisation, where the model reproduces verbatim passages including personal information; membership inference, where an attacker determines whether specific data was in the training set; and property inference, where aggregate properties of the training data are deduced. This risk has direct GDPR implications for systems processing personal data.
Differential privacy provides a mathematically rigorous defence by adding calibrated noise to the training process so that model parameters are provably insensitive to any individual training record. TensorFlow Privacy and Opacus implement differentially private stochastic gradient descent, clipping per-example gradients and adding Gaussian noise. The practical challenge is calibrating epsilon: values below 1 provide strong privacy but may significantly degrade accuracy, while values above 10 provide minimal privacy benefit.
Output filtering using Microsoft Presidio or spaCy NER provides a runtime defence for generative models, scanning generated text for personal names, addresses, phone numbers, and other identifiable information before delivery. Presidio uses a mix of pattern matching, NER models, and custom recognisers supporting multiple languages and entity types.
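The runtime filtering step looks roughly like the following. This pattern-only sketch is a heavily simplified stand-in for Presidio's combination of pattern matching, NER models, and custom recognisers; the two patterns are illustrative and far from exhaustive.

```python
import re

# Illustrative patterns only; Presidio ships far broader recognisers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> str:
    """Scan generated text and replace detected PII before delivery."""
    for entity, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{entity}>", text)
    return text
```

Like the output validation layer, this runs on the inference output path so that nothing reaches the consumer unfiltered.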
Membership inference testing using ML Privacy Meter evaluates the model's susceptibility by training an attack model on a shadow dataset and measuring its ability to distinguish training members from non-members. If the attack achieves significantly better than random accuracy, the model is leaking membership information. A common acceptable threshold is an attack AUC-ROC below 0.55, meaning only marginally better than chance.
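The AUC-ROC threshold above is just the probability that a member outscores a non-member under the attack. A stdlib sketch of that statistic, given attack scores produced by a tool such as ML Privacy Meter:

```python
def attack_auc(member_scores: list[float], nonmember_scores: list[float]) -> float:
    """AUC-ROC of a membership inference attack: the probability that a
    randomly chosen training member receives a higher attack score than a
    randomly chosen non-member, with ties counting half. Values near 0.5
    mean the attack does no better than chance."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))
```

Against the 0.55 acceptance threshold, a measured AUC of, say, 0.75 would indicate meaningful membership leakage and trigger remediation such as stronger differential privacy.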
For systems where the AI model interfaces with external tools, APIs, or plugins, insufficient validation of tool usage can lead to unauthorised actions. The system must validate tool calls against an allowlist of permitted actions with permitted parameters, require human approval for high-impact actions, and maintain comprehensive logging of all tool invocations. AISDP Module 9 records the tool and plugin inventory and the permission model for each integration, while Module 7 captures which actions require human approval.
Excessive agency occurs when the system is granted more autonomy, permissions, or capabilities than its intended purpose requires, creating unnecessary risk surfaces. The principle of least privilege must be applied to access rights, API permissions, and action capabilities. The system's authorised scope is documented in the AISDP and enforced through technical controls, not merely policy.
Regular access reviews confirm that permissions remain proportionate to the documented purpose. Any gap between technical capabilities and the documented intended purpose represents an excessive agency risk that the AISDP must acknowledge. The system's complete permission inventory and justification for each access right belong in Module 9, while Module 1 defines the intended scope of autonomy.
Overreliance, where users treat AI outputs as authoritative without verification, is addressed primarily through human oversight measures and automation bias countermeasures. The cybersecurity dimension is the technical enforcement of human review: the system must not be configurable to operate without human oversight for high-risk decisions. The controls preventing bypass of human oversight feed into Module 9, including enforcement of mandatory review workflows that prevent operators from bulk-approving recommendations without individual assessment.
Traditional threat modelling identifies threats to software systems such as injection attacks, authentication bypass, and privilege escalation. AI systems face all of these plus threats exploiting learning and inference processes that traditional modelling does not address. The threat model must cover both the traditional software attack surface and the AI-specific attack surface using a combined approach.
MITRE ATLAS provides the taxonomic foundation, cataloguing adversarial techniques across reconnaissance, resource development, initial access, execution, persistence, evasion, and impact categories, each with real-world case studies and documented mitigations. The threat modelling exercise should proceed in four stages: scope the system's attack surface by identifying every point where external input enters or output affects decisions; enumerate threats at each point using the combined STRIDE plus ATLAS taxonomy; assess each threat using likelihood multiplied by impact with impact scored across health and safety, fundamental rights, operational integrity, and reputational dimensions; and define mitigations for each threat above the risk acceptance threshold.
AI-specific threats often score differently from traditional ones. A training data poisoning attack may have low likelihood because it requires training pipeline access, yet catastrophic impact because it can systematically bias every prediction. The Technical SME documents the threat model as a structured artefact using tools such as IriusRisk or OWASP Threat Dragon, version-controlled and reviewed annually at minimum.
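The scoring stage can be made concrete as a small data structure. The 1-to-5 scales and the choice to take the worst impact across the four dimensions are illustrative assumptions, not prescribed by the source methodology:

```python
from dataclasses import dataclass

IMPACT_DIMENSIONS = ("health_safety", "fundamental_rights",
                     "operational_integrity", "reputational")

@dataclass
class Threat:
    name: str
    likelihood: int          # 1 (rare) .. 5 (frequent) -- illustrative scale
    impacts: dict            # per-dimension impact score, 1 .. 5

    def risk_score(self) -> int:
        """Likelihood multiplied by the worst impact across the dimensions."""
        return self.likelihood * max(self.impacts[d] for d in IMPACT_DIMENSIONS)

def above_threshold(threats: list, acceptance: int) -> list:
    """Threats above the risk acceptance threshold require defined mitigations."""
    return sorted((t for t in threats if t.risk_score() > acceptance),
                  key=Threat.risk_score, reverse=True)
```

The structured artefact in IriusRisk or Threat Dragon carries the same information; the sketch shows only the arithmetic that ranks threats against the acceptance threshold.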
The threat model is updated whenever the system's architecture, data sources, or deployment context change materially. It feeds directly into the cybersecurity testing programme: penetration testing and adversarial ML testing should exercise the threats identified. Cybersecurity and Resilience provides the overarching framework within which this threat model operates.
Model extraction attacks involve an adversary repeatedly querying the model, collecting input-output pairs, and training a surrogate that approximates the original's functionality. For high-risk AI systems, the consequences extend beyond intellectual property loss: the adversary obtains a model without compliance controls, monitoring, or governance. Rate limiting that caps total query volume per client over daily and weekly windows makes extraction prohibitively slow, with limits enforced per authenticated identity rather than merely per IP address.
Network segmentation restricts which systems can access the model serving endpoint. The inference service should be accessible only through the application layer, not directly from the internet. Kubernetes NetworkPolicies define pod-level access rules, and a service mesh such as Istio or Linkerd adds mutual TLS ensuring every connection is authenticated and encrypted.
Encrypted model storage protects artefacts at rest using a key management service, with key access restricted to the serving infrastructure and named administrators. Model watermarking embeds a detectable signal in the model's behaviour that survives extraction, enabling ownership verification. Backdoor-based watermarking, training the model to produce specific outputs on rare trigger inputs, is the most practical current approach.
For API-accessed systems where the model is served to external deployers, the extraction risk is higher because the deployer has legitimate query access. Contractual controls prohibiting systematic querying for extraction purposes complement the technical controls. The inference logging infrastructure should retain sufficient query detail to detect and evidence extraction attempts if a contractual dispute arises.
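The volume caps described above reduce to a per-identity sliding-window budget. A sketch with illustrative limits; real deployments enforce this at the gateway and feed rejections into the inference logging infrastructure:

```python
import time
from collections import defaultdict, deque

class QueryBudget:
    """Caps total query volume per authenticated identity over a sliding
    window, making surrogate-training extraction prohibitively slow.
    Window length and limit here are illustrative."""
    def __init__(self, max_queries: int, window_s: float):
        self.max_queries, self.window_s = max_queries, window_s
        self._history = defaultdict(deque)

    def allow(self, identity: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._history[identity]
        while q and now - q[0] > self.window_s:
            q.popleft()          # drop queries outside the window
        if len(q) >= self.max_queries:
            return False         # budget exhausted: reject and flag for review
        q.append(now)
        return True
```

Keying the budget on authenticated identity rather than IP address is what stops an attacker from resetting the counter by rotating source addresses.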
Where full technical controls are not feasible, procedural alternatives can partially address AI-specific threats, though with documented limitations. For training data integrity, an access policy should document who has read, write, and delete access to training data repositories, with access lists reviewed quarterly. Every modification is logged with the modifier's identity, timestamp, description, and rationale, with the log reviewed before each training run.
A data engineer reviews a random sample of 100 to 500 records before each training run, checking for anomalies, unexpected values, or suspicious patterns. This approach loses automated anomaly detection on every modification, meaning sophisticated poisoning attacks modifying a small number of records may evade manual sampling.
For output review in systems with low output volume, a human reviews every output before delivery for PII, confidential information, or inappropriate content. For higher-volume systems, a random sample is reviewed daily. Before each training run, the DPO Liaison reviews training data for PII that should have been removed. This approach loses mathematical privacy guarantees, automated PII detection at scale, and quantitative membership inference risk assessment.
For model theft prevention, the engineering team issues unique API keys per client with usage volume tracked manually. The Technical SME reviews API access logs weekly for unusual query patterns. Deployer agreements contractually prohibit systematic querying for extraction purposes. The minimum tooling required includes NGINX rate limiting, cloud encryption at rest on standard subscriptions, and Kubernetes NetworkPolicies at no additional cost. Cloud audit logging services such as CloudTrail and Azure Monitor are included in standard cloud subscriptions and should be enabled as a baseline.
The OWASP LLM Top 10 provides a useful starting point, but the threat categories extend to AI systems generally. Evaluate each threat against your specific system architecture and document which threats apply, which do not, and the rationale for exclusion in your AISDP.
NGINX rate limiting, Kubernetes NetworkPolicies, and cloud encryption at rest are available at no additional cost. Cloud audit logging services like CloudTrail and Azure Monitor are included in standard subscriptions. Open-source options include OpenDP, Opacus, and Microsoft Presidio for privacy controls.
Procedural alternatives such as manual log review, output sampling, and quarterly access reviews provide partial mitigation but lose automated detection, mathematical privacy guarantees, and real-time response. Document the specific capabilities lost and the residual risk accepted in AISDP Module 9.
The threat model should be reviewed at minimum annually, and whenever the system's architecture, data sources, or deployment context change materially. It should be version-controlled and directly feed into the cybersecurity testing programme.
Defences against training data poisoning include data provenance tracking, statistical anomaly detection, sentinel input testing after retraining, access controls on data repositories, and version-controlled storage with immutable audit trails.
Threat modelling should use the combined STRIDE plus ATLAS taxonomy across four stages: scope the attack surface, enumerate threats at each point, assess likelihood times impact, and define mitigations for threats above the acceptance threshold.
Model theft defences include rate limiting per authenticated identity, network segmentation with service mesh mutual TLS, encrypted model storage with key management services, and model watermarking for ownership verification.
Differential privacy adds calibrated noise during training so model parameters are provably insensitive to individual records, with epsilon calibration balancing privacy strength against model accuracy.
This principle is counterintuitive to many teams who trust their own model's outputs. The principle exists because any model can produce unexpected outputs under adversarial conditions, distribution shift, or simple edge cases: an LLM may generate SQL injection syntax, a classifier may return confidence scores outside valid ranges due to numerical overflow, and a generative model may include personally identifiable information memorised from training data. AISDP Module 3 should describe the output validation layer as a distinct architectural component, and Module 9 records the validation controls and testing confirming no downstream injection path exists.
Model inversion attacks use the model's outputs, including confidence scores and probability distributions, to reconstruct information about the training data. In classification models, inversion can recover representative examples of each class. Restricting output granularity is the most effective countermeasure: returning only the top prediction or a coarsened confidence band rather than full probability distributions. Monitoring output patterns for signs of systematic probing, where a consumer submits inputs designed to explore the model's decision boundary, supports early detection.
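Restricting output granularity is mechanically simple. A sketch of the coarsening step, with illustrative band edges:

```python
def coarsen_output(probabilities: dict) -> dict:
    """Return only the top prediction and a coarse confidence band instead
    of the full probability distribution, reducing the signal available to
    model inversion and boundary-probing attacks. Band edges are illustrative."""
    label, p = max(probabilities.items(), key=lambda kv: kv[1])
    if p >= 0.9:
        band = "high"
    elif p >= 0.6:
        band = "medium"
    else:
        band = "low"
    return {"label": label, "confidence": band}
```

Each query now leaks at most two coarse facts instead of a full probability vector, which sharply increases the number of queries an inversion attack needs.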
The chosen epsilon, resulting accuracy trade-off, and rationale are documented in AISDP Module 6, with the disclosure controls recorded jointly in Module 4 and Module 9.
Organisations using federated learning or distributed training face additional threat dimensions. Malicious participants can submit poisoned gradient updates that corrupt the global model, infer information about other participants' data from gradient exchanges, or exploit the aggregation protocol. Secure aggregation protocols prevent the central coordinator from seeing individual gradient updates, differential privacy on gradients limits information leakage, and Byzantine-robust aggregation detects anomalous updates. These architectures require documentation of the security controls and trust model in both Module 5 and Module 9 of the AISDP.
Adversarial examples are inputs crafted with imperceptible perturbations that cause incorrect outputs with high confidence, affecting image classification, speech recognition, tabular data models, and other AI system types. Defences include adversarial training that includes adversarial examples in the training data to improve robustness, input validation detecting out-of-distribution inputs, and ensemble methods that are more robust to adversarial perturbations than single models. Regular adversarial testing should be integrated into the CI pipeline.
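The attack being defended against can be illustrated with a minimal FGSM-style perturbation against a linear scorer, where the gradient is available in closed form. Real attacks apply the same sign-of-gradient idea through a neural network via backpropagation.

```python
def score(x: list[float], w: list[float]) -> float:
    """A toy linear scorer f(x) = w . x standing in for a trained model."""
    return sum(xi * wi for xi, wi in zip(x, w))

def fgsm_perturb(x: list[float], w: list[float], epsilon: float) -> list[float]:
    """FGSM-style perturbation: shift every feature by epsilon in the
    direction that increases the score, i.e. along sign(w), since the
    gradient of w . x with respect to x is w."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + epsilon * sign(wi) for xi, wi in zip(x, w)]
```

With a small epsilon the perturbed input is nearly indistinguishable from the original yet moves the score in the attacker's chosen direction; adversarial training generates such inputs during training so the model learns to resist them.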
The adversarial robustness evaluation methodology, attack types tested, measured robustness percentage, and residual risk for attack types where full robustness cannot be achieved all feed into AISDP Module 9, with Module 5 including adversarial robustness metrics alongside standard accuracy metrics.