Article 15 of the EU AI Act requires providers of high-risk AI systems to address cybersecurity risks, including threats specific to AI that traditional frameworks miss. This page maps the OWASP LLM Top 10 and MITRE ATLAS taxonomy to practical engineering controls and AISDP documentation requirements.
AI systems are vulnerable to a category of attacks that conventional cybersecurity frameworks do not address: threats that exploit learning and inference processes rather than software vulnerabilities alone. A model can be poisoned through its training data, evaded through adversarial inputs, extracted through query access, and manipulated through prompt injection. The OWASP LLM Top 10 provides a structured taxonomy for identifying these AI-specific threats, while MITRE ATLAS catalogues real-world adversarial techniques against machine learning systems in a matrix analogous to MITRE ATT&CK for traditional cyber threats.
The OWASP taxonomy covers prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft. Beyond these ten categories, AI systems also face adversarial example attacks where imperceptibly perturbed inputs cause high-confidence incorrect outputs, model inversion attacks that reconstruct training data from outputs, and federated learning risks including poisoned gradients and aggregation manipulation.
The EU AI Act requires providers of high-risk AI systems to address cybersecurity risks proportionate to the system's risk level. Cybersecurity and Resilience establishes the broader cybersecurity framework within which these AI-specific threats must be managed. Each threat category maps to specific AISDP modules, with Module 9 capturing the primary cybersecurity controls and Modules 1, 3, 4, 5, and 7 receiving cross-referenced documentation for their respective domains.
Prompt injection occurs when attackers craft inputs that cause a language model to deviate from its intended behaviour, ignore instructions, or execute actions beyond its authorised scope. Direct injection supplies malicious input to the model interface, while indirect injection plants malicious content in data sources the model consults during processing.
Input sanitisation should filter or escape known injection patterns before they reach the model. Output validation verifies that responses fall within the expected output space. Privilege separation ensures the model component cannot access resources or execute actions beyond its documented scope.
Instruction anchoring uses system prompts resistant to override, and input-output monitoring flags anomalous patterns that may indicate injection attempts. These controls should be layered: no single defence is sufficient against a determined attacker, but the combination of sanitisation, validation, separation, and monitoring creates defence in depth.
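The layered pattern can be sketched in a few lines. This is a minimal illustration, not a production filter: the `KNOWN_INJECTION_PATTERNS` deny-list and the `EXPECTED_LABELS` output space are hypothetical names, and a real deployment would maintain the pattern list from red-team findings.

```python
import re

# Hypothetical deny-list of known injection phrasings; a real deployment
# would maintain and extend this list from red-team findings.
KNOWN_INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"reveal (the )?system prompt",
]

# Hypothetical documented output space for a triage model.
EXPECTED_LABELS = {"approve", "reject", "escalate"}

def sanitise_input(text: str) -> str:
    """Layer 1: reject inputs matching known injection patterns before inference."""
    for pattern in KNOWN_INJECTION_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            raise ValueError(f"input rejected: matched injection pattern {pattern!r}")
    return text

def validate_output(label: str) -> str:
    """Layer 2: verify the response falls within the expected output space."""
    if label not in EXPECTED_LABELS:
        raise ValueError(f"output {label!r} outside documented output space")
    return label
```

Neither layer is sufficient alone: sanitisation misses novel phrasings, and output validation only catches attacks that push the model outside its output space, which is why both sit alongside privilege separation and monitoring.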
For agentic AI systems where the model can invoke tools or take actions, prompt injection is closely related to the insecure plugin design threat. An attacker who successfully injects a prompt may cause the model to invoke tools beyond its authorised scope, making privilege separation especially critical. AISDP Module 9 captures the prompt injection threat, the controls in place, the adversarial prompt injection testing performed, and the residual risk assessment.
Model outputs passed to downstream systems without validation can trigger secondary vulnerabilities including cross-site scripting when rendered in web interfaces, SQL injection when incorporated into database queries, and command injection when passed to system shells. The post-processing layer must treat all model outputs as untrusted input from the perspective of downstream components.
Output encoding, parameterised queries, and sandboxed execution environments prevent output-based injection chains. For high-risk systems, a dedicated output validation layer verifies every output against the expected schema before downstream delivery. Output Validation and Monitoring provides the detailed engineering patterns for this validation layer.
Guardrails AI enables structured output validation where the expected format is defined as a schema and the model's response is parsed, validated, and re-prompted if non-conforming. For web-rendered outputs, OWASP ESAPI provides encoding that neutralises injection payloads. The critical implementation principle is that output validation is enforced at the infrastructure level as a dedicated middleware on the inference output path, not within the model's own code, and cannot be bypassed.
If validation fails, the output is not delivered to the downstream consumer; a safe default response is returned and the failure is logged for investigation. For outputs that will be used in database queries, parameterised query construction prevents the model's output from being interpreted as executable SQL.
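The fail-closed pattern above can be sketched as stdlib-only middleware. The hand-rolled schema check is a simplified stand-in for a library such as Guardrails AI, and the `results` table, field names, and safe default are hypothetical:

```python
import logging
import sqlite3

logger = logging.getLogger("output_validation")

# Hypothetical safe default returned when validation fails.
SAFE_DEFAULT = {"status": "unavailable", "reason": "output failed validation"}

def validate_output(output: object) -> dict:
    """Verify a model output against the expected schema; on failure, log
    for investigation and return the safe default instead of the raw output."""
    if (isinstance(output, dict)
            and output.get("status") in {"approved", "rejected"}
            and isinstance(output.get("score"), float)
            and 0.0 <= output["score"] <= 1.0):
        return output
    logger.warning("output failed schema validation: %r", output)
    return SAFE_DEFAULT

def store_result(conn: sqlite3.Connection, case_id: str, status: str) -> None:
    """Parameterised query: model-derived text is bound as data and can
    never be interpreted as executable SQL."""
    conn.execute("INSERT INTO results (case_id, status) VALUES (?, ?)",
                 (case_id, status))
```

Because the middleware sits on the infrastructure path, a non-conforming output never reaches the consumer, and the parameterised insert holds even if the model emits SQL syntax.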
An attacker who manipulates training data can introduce systematic biases, backdoors, or performance degradation into the trained model. Data poisoning can be targeted, affecting behaviour for specific inputs while leaving general performance intact, or untargeted, degrading overall model quality. The defences rest on two pillars: access control and anomaly detection.
Data provenance tracking through the data lineage infrastructure enables detection of unauthorised modifications. Statistical anomaly detection using isolation forests and distributional tests on training data identifies suspicious records before training begins. Model behaviour testing after each retraining cycle checks for unexpected output changes on sentinel inputs.
Access controls on training data repositories prevent unauthorised modification, with every access event logged in an immutable audit trail. DVC and Delta Lake provide version-controlled data storage where every modification is attributed to a specific user and timestamp. Anomaly detection tools such as Great Expectations and Evidently AI provide a second defence layer, halting the pipeline if distributional changes exceed thresholds.
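A minimal sketch of the pipeline-halting idea: a distributional gate that compares an incoming batch against a reference distribution. This is a deliberately simplified stand-in for the isolation-forest and Great Expectations checks named above; the three-sigma threshold is illustrative.

```python
import statistics

def distribution_gate(reference: list[float], incoming: list[float],
                      max_shift_sigma: float = 3.0) -> bool:
    """Return False (halt the pipeline) if the incoming batch's mean drifts
    more than max_shift_sigma standard errors from the reference distribution.
    A simplified stand-in for isolation-forest / Great Expectations checks."""
    ref_mean = statistics.mean(reference)
    ref_sd = statistics.stdev(reference)
    stderr = ref_sd / (len(incoming) ** 0.5)
    shift = abs(statistics.mean(incoming) - ref_mean) / stderr
    return shift <= max_shift_sigma
```

A real gate would test many features and higher moments, not just one mean; the point is that the check runs before training and blocks the run rather than merely warning.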
Sophisticated poisoning attacks may modify only a small fraction of records while keeping overall distributions within normal bounds. For high-risk systems, periodic manual review of a random sample of training records provides human verification that automated checks cannot fully replace. The training data integrity controls belong jointly in AISDP Module 4 and Module 9.
AI inference endpoints are particularly vulnerable to denial of service because individual requests can be computationally expensive, with large transformer models requiring seconds of GPU time per request. An attacker can submit inputs designed to trigger worst-case inference times or flood the serving infrastructure with high-volume requests, degrading service for legitimate users or incurring excessive cloud costs.
Rate limiting is the first defence: enforce a maximum request rate per client identified by API key, IP address, or authenticated identity. Limits should accommodate legitimate peak usage with margin, rejecting excess requests with HTTP 429 responses. Kong, NGINX, and cloud API gateways all support configurable rate limiting.
Inference timeout enforcement is the second defence, setting a maximum execution time per request and terminating any that exceeds it. The timeout should sit above the p99 latency for legitimate requests and below the threshold where a single request materially impacts other users. For neural networks, input complexity analysis can detect and reject inputs designed to trigger pathological computation paths.
Autoscaling with cost caps provides the third layer: the system scales up to handle increased load but will not exceed a defined cost ceiling, preventing sustained attacks from generating unbounded cloud bills. Module 9 records the rate limiting configuration, timeout thresholds, and auto-scaling boundaries, while Module 5 states the expected throughput under normal and adversarial load conditions. The load testing methodology that validates these controls is documented separately.
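The first two layers can be sketched with the standard library: a token-bucket limiter per client and a timeout wrapper around inference. The rates, burst size, and timeout here are illustrative, and the thread pool can only abandon an overrunning request; a process pool would permit hard termination.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as InferenceTimeout

class TokenBucket:
    """Per-client rate limiter: callers above the configured rate are
    rejected, which an API gateway would map to an HTTP 429 response."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate, self.capacity = rate_per_sec, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

_executor = ThreadPoolExecutor(max_workers=4)

def run_inference(model_fn, request, timeout_s: float = 2.0):
    """Timeout enforcement: abandon any request exceeding the budget. The
    budget should sit above the p99 latency for legitimate traffic."""
    future = _executor.submit(model_fn, request)
    try:
        return future.result(timeout=timeout_s)
    except InferenceTimeout:
        future.cancel()
        raise
```

In production these controls live at the gateway (Kong, NGINX) and the serving layer respectively; the sketch only shows the logic each enforces.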
The ML supply chain has three distinct layers requiring separate controls: software dependencies, model components, and infrastructure. Compromised dependencies, pre-trained models, or third-party services can introduce vulnerabilities that traditional software composition analysis may not detect.
The software dependency layer encompasses ML frameworks such as TensorFlow, PyTorch, and scikit-learn alongside data processing libraries and serving frameworks. Every dependency is pinned to an exact version in a lock file, and dependencies are fetched from a private repository such as JFrog Artifactory or Sonatype Nexus rather than directly from public registries. This prevents typosquatting and dependency confusion attacks while ensuring package availability.
The model component layer covers pre-trained models, tokenisers, embedding models, and other ML artefacts sourced externally. These execute computations and can contain vulnerabilities such as model backdoors or poisoned weights. Sigstore and cosign provide cryptographic signing for model artefacts, and for Hugging Face models, the revision parameter pins to a specific Git commit SHA with independently verifiable content hashes.
The infrastructure layer includes container base images, operating system packages, and cloud service configurations. Container images should be built from pinned, signed base images, and infrastructure configurations scanned for security misconfigurations using IaC scanning tools such as Checkov, tfsec, and KICS before deployment. All cached packages should be scanned for known vulnerabilities with Snyk or Trivy, and packages with critical vulnerabilities rejected. The SBOM generation and review process, vendor security assessment results, and supply chain risk assessment feed into AISDP Module 9.
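The content-hash pinning described for model artefacts reduces to a simple integrity gate before loading. A sketch, assuming the expected digest comes from a signed manifest produced at release time (e.g. alongside a Sigstore signature):

```python
import hashlib
from pathlib import Path

def verify_artefact(path: Path, expected_sha256: str) -> None:
    """Refuse to load a model artefact whose content hash does not match
    the digest pinned in the signed manifest."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"artefact {path} failed integrity check: got {digest}")
```

The serving process calls this before deserialising weights, so a swapped or tampered artefact fails closed rather than loading silently.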
Models can leak sensitive information from their training data through three distinct mechanisms: memorisation, where the model reproduces verbatim passages including personal information; membership inference, where an attacker determines whether specific data was in the training set; and property inference, where aggregate properties of the training data are deduced. This risk has direct GDPR implications for systems processing personal data.
Differential privacy provides a mathematically rigorous defence by adding calibrated noise to the training process so that model parameters are provably insensitive to any individual training record. TensorFlow Privacy and Opacus implement differentially private stochastic gradient descent, clipping per-example gradients and adding Gaussian noise. The practical challenge is calibrating epsilon: values below 1 provide strong privacy but may significantly degrade accuracy, while values above 10 provide minimal privacy benefit.
Output filtering using Microsoft Presidio or spaCy NER provides a runtime defence for generative models, scanning generated text for personal names, addresses, phone numbers, and other identifiable information before delivery. Presidio uses a mix of pattern matching, NER models, and custom recognisers supporting multiple languages and entity types.
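The runtime filtering step looks roughly like the following. This pattern-only sketch is a heavily simplified stand-in for Presidio's combination of pattern matching, NER models, and custom recognisers; the two patterns are illustrative and far from exhaustive.

```python
import re

# Illustrative patterns only; Presidio ships far broader recognisers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> str:
    """Scan generated text and replace detected PII before delivery."""
    for entity, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{entity}>", text)
    return text
```

Like the output validation layer, this runs on the inference output path so that nothing reaches the consumer unfiltered.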
Membership inference testing using ML Privacy Meter evaluates the model's susceptibility by training an attack model on a shadow dataset and measuring its ability to distinguish training members from non-members. If the attack achieves significantly better than random accuracy, the model is leaking membership information. A common acceptable threshold is an attack AUC-ROC below 0.55, meaning only marginally better than chance.
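The AUC-ROC threshold above is just the probability that a member outscores a non-member under the attack. A stdlib sketch of that statistic, given attack scores produced by a tool such as ML Privacy Meter:

```python
def attack_auc(member_scores: list[float], nonmember_scores: list[float]) -> float:
    """AUC-ROC of a membership inference attack: the probability that a
    randomly chosen training member receives a higher attack score than a
    randomly chosen non-member, with ties counting half. Values near 0.5
    mean the attack does no better than chance."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))
```

Against the 0.55 acceptance threshold, a measured AUC of, say, 0.75 would indicate meaningful membership leakage and trigger remediation such as stronger differential privacy.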
For systems where the AI model interfaces with external tools, APIs, or plugins, insufficient validation of tool usage can lead to unauthorised actions. The system must validate tool calls against an allowlist of permitted actions with permitted parameters, require human approval for high-impact actions, and maintain comprehensive logging of all tool invocations. AISDP Module 9 records the tool and plugin inventory and the permission model for each integration, while Module 7 captures which actions require human approval.
Excessive agency occurs when the system is granted more autonomy, permissions, or capabilities than its intended purpose requires, creating unnecessary risk surfaces. The principle of least privilege must be applied to access rights, API permissions, and action capabilities. The system's authorised scope is documented in the AISDP and enforced through technical controls, not merely policy.
Regular access reviews confirm that permissions remain proportionate to the documented purpose. Any gap between technical capabilities and the documented intended purpose represents an excessive agency risk that the AISDP must acknowledge. The system's complete permission inventory and justification for each access right belong in Module 9, while Module 1 defines the intended scope of autonomy.
Overreliance, where users treat AI outputs as authoritative without verification, is addressed primarily through human oversight measures and automation bias countermeasures. The cybersecurity dimension is the technical enforcement of human review: the system must not be configurable to operate without human oversight for high-risk decisions. The controls preventing bypass of human oversight feed into Module 9, including enforcement of mandatory review workflows that prevent operators from bulk-approving recommendations without individual assessment.
Traditional threat modelling identifies threats to software systems such as injection attacks, authentication bypass, and privilege escalation. AI systems face all of these plus threats exploiting learning and inference processes that traditional modelling does not address. The threat model must cover both the traditional software attack surface and the AI-specific attack surface using a combined approach.
MITRE ATLAS provides the taxonomic foundation, cataloguing adversarial techniques across reconnaissance, resource development, initial access, execution, persistence, evasion, and impact categories, each with real-world case studies and documented mitigations. The threat modelling exercise should proceed in four stages: scope the system's attack surface by identifying every point where external input enters or output affects decisions; enumerate threats at each point using the combined STRIDE plus ATLAS taxonomy; assess each threat using likelihood multiplied by impact with impact scored across health and safety, fundamental rights, operational integrity, and reputational dimensions; and define mitigations for each threat above the risk acceptance threshold.
AI-specific threats often score differently from traditional ones. A training data poisoning attack may have low likelihood because it requires training pipeline access, yet catastrophic impact because it can systematically bias every prediction. The Technical SME documents the threat model as a structured artefact using tools such as IriusRisk or OWASP Threat Dragon, version-controlled and reviewed annually at minimum.
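The scoring stage can be made concrete as a small data structure. The 1-to-5 scales and the choice to take the worst impact across the four dimensions are illustrative assumptions, not prescribed by the source methodology:

```python
from dataclasses import dataclass

IMPACT_DIMENSIONS = ("health_safety", "fundamental_rights",
                     "operational_integrity", "reputational")

@dataclass
class Threat:
    name: str
    likelihood: int          # 1 (rare) .. 5 (frequent) -- illustrative scale
    impacts: dict            # per-dimension impact score, 1 .. 5

    def risk_score(self) -> int:
        """Likelihood multiplied by the worst impact across the dimensions."""
        return self.likelihood * max(self.impacts[d] for d in IMPACT_DIMENSIONS)

def above_threshold(threats: list, acceptance: int) -> list:
    """Threats above the risk acceptance threshold require defined mitigations."""
    return sorted((t for t in threats if t.risk_score() > acceptance),
                  key=Threat.risk_score, reverse=True)
```

The structured artefact in IriusRisk or Threat Dragon carries the same information; the sketch shows only the arithmetic that ranks threats against the acceptance threshold.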
The threat model is updated whenever the system's architecture, data sources, or deployment context change materially. It feeds directly into the cybersecurity testing programme: penetration testing and adversarial ML testing should exercise the threats identified. Cybersecurity and Resilience provides the overarching framework within which this threat model operates.
Model extraction attacks involve an adversary repeatedly querying the model, collecting input-output pairs, and training a surrogate that approximates the original's functionality. For high-risk AI systems, the consequences extend beyond intellectual property loss: the adversary obtains a model without compliance controls, monitoring, or governance. Rate limiting that caps total query volume per client over daily and weekly windows makes extraction prohibitively slow, with limits enforced per authenticated identity rather than merely per IP address.
Network segmentation restricts which systems can access the model serving endpoint. The inference service should be accessible only through the application layer, not directly from the internet. Kubernetes NetworkPolicies define pod-level access rules, and a service mesh such as Istio or Linkerd adds mutual TLS ensuring every connection is authenticated and encrypted.
Encrypted model storage protects artefacts at rest using a key management service, with key access restricted to the serving infrastructure and named administrators. Model watermarking embeds a detectable signal in the model's behaviour that survives extraction, enabling ownership verification. Backdoor-based watermarking, training the model to produce specific outputs on rare trigger inputs, is the most practical current approach.
For API-accessed systems where the model is served to external deployers, the extraction risk is higher because the deployer has legitimate query access. Contractual controls prohibiting systematic querying for extraction purposes complement the technical controls. The inference logging infrastructure should retain sufficient query detail to detect and evidence extraction attempts if a contractual dispute arises.
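The volume caps described above reduce to a per-identity sliding-window budget. A sketch with illustrative limits; real deployments enforce this at the gateway and feed rejections into the inference logging infrastructure:

```python
import time
from collections import defaultdict, deque

class QueryBudget:
    """Caps total query volume per authenticated identity over a sliding
    window, making surrogate-training extraction prohibitively slow.
    Window length and limit here are illustrative."""
    def __init__(self, max_queries: int, window_s: float):
        self.max_queries, self.window_s = max_queries, window_s
        self._history = defaultdict(deque)

    def allow(self, identity: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._history[identity]
        while q and now - q[0] > self.window_s:
            q.popleft()          # drop queries outside the window
        if len(q) >= self.max_queries:
            return False         # budget exhausted: reject and flag for review
        q.append(now)
        return True
```

Keying the budget on authenticated identity rather than IP address is what stops an attacker from resetting the counter by rotating source addresses.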
Where full technical controls are not feasible, procedural alternatives can partially address AI-specific threats, though with documented limitations. For training data integrity, an access policy should document who has read, write, and delete access to training data repositories, with access lists reviewed quarterly. Every modification is logged with the modifier's identity, timestamp, description, and rationale, with the log reviewed before each training run.
A data engineer reviews a random sample of 100 to 500 records before each training run, checking for anomalies, unexpected values, or suspicious patterns. This approach loses automated anomaly detection on every modification, meaning sophisticated poisoning attacks modifying a small number of records may evade manual sampling.
For output review in systems with low output volume, a human reviews every output before delivery for PII, confidential information, or inappropriate content. For higher-volume systems, a random sample is reviewed daily. Before each training run, the DPO Liaison reviews training data for PII that should have been removed. This approach loses mathematical privacy guarantees, automated PII detection at scale, and quantitative membership inference risk assessment.
For model theft prevention, the engineering team issues unique API keys per client with usage volume tracked manually. The Technical SME reviews API access logs weekly for unusual query patterns. Deployer agreements contractually prohibit systematic querying for extraction purposes. The minimum tooling required includes NGINX rate limiting, cloud encryption at rest on standard subscriptions, and Kubernetes NetworkPolicies at no additional cost. Cloud audit logging services such as CloudTrail and Azure Monitor are included in standard cloud subscriptions and should be enabled as a baseline.
The OWASP LLM Top 10 provides a useful starting point, but the threat categories extend to AI systems generally. Evaluate each threat against your specific system architecture and document which threats apply, which do not, and the rationale for exclusion in your AISDP.
NGINX rate limiting, Kubernetes NetworkPolicies, and cloud encryption at rest are available at no additional cost. Cloud audit logging services like CloudTrail and Azure Monitor are included in standard subscriptions. Open-source options include OpenDP, Opacus, and Microsoft Presidio for privacy controls.
Procedural alternatives such as manual log review, output sampling, and quarterly access reviews provide partial mitigation but lose automated detection, mathematical privacy guarantees, and real-time response. Document the specific capabilities lost and the residual risk accepted in AISDP Module 9.
The threat model should be reviewed at minimum annually, and whenever the system's architecture, data sources, or deployment context change materially. It should be version-controlled and directly feed into the cybersecurity testing programme.
Defences against training data poisoning include data provenance tracking, statistical anomaly detection, sentinel input testing after retraining, access controls on data repositories, and version-controlled storage with immutable audit trails.
Threat modelling should use the combined STRIDE plus ATLAS taxonomy across four stages: scope the attack surface, enumerate threats at each point, assess likelihood times impact, and define mitigations for threats above the acceptance threshold.
Model theft defences include rate limiting per authenticated identity, network segmentation with service mesh mutual TLS, encrypted model storage with key management services, and model watermarking for ownership verification.
Differential privacy adds calibrated noise during training so model parameters are provably insensitive to individual records, with epsilon calibration balancing privacy strength against model accuracy.
This principle is counterintuitive to many teams who trust their own model's outputs. The principle exists because any model can produce unexpected outputs under adversarial conditions, distribution shift, or simple edge cases: an LLM may generate SQL injection syntax, a classifier may return confidence scores outside valid ranges due to numerical overflow, and a generative model may include personally identifiable information memorised from training data. AISDP Module 3 should describe the output validation layer as a distinct architectural component, and Module 9 records the validation controls and testing confirming no downstream injection path exists.
Model inversion attacks use the model's outputs, including confidence scores and probability distributions, to reconstruct information about the training data. In classification models, inversion can recover representative examples of each class. Restricting output granularity is the most effective countermeasure: returning only the top prediction or a coarsened confidence band rather than full probability distributions. Monitoring output patterns for signs of systematic probing, where a consumer submits inputs designed to explore the model's decision boundary, supports early detection.
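Restricting output granularity is mechanically simple. A sketch of the coarsening step, with illustrative band edges:

```python
def coarsen_output(probabilities: dict) -> dict:
    """Return only the top prediction and a coarse confidence band instead
    of the full probability distribution, reducing the signal available to
    model inversion and boundary-probing attacks. Band edges are illustrative."""
    label, p = max(probabilities.items(), key=lambda kv: kv[1])
    if p >= 0.9:
        band = "high"
    elif p >= 0.6:
        band = "medium"
    else:
        band = "low"
    return {"label": label, "confidence": band}
```

Each query now leaks at most two coarse facts instead of a full probability vector, which sharply increases the number of queries an inversion attack needs.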
The chosen epsilon, resulting accuracy trade-off, and rationale are documented in AISDP Module 6, with the disclosure controls recorded jointly in Module 4 and Module 9.
Organisations using federated learning or distributed training face additional threat dimensions. Malicious participants can submit poisoned gradient updates that corrupt the global model, infer information about other participants' data from gradient exchanges, or exploit the aggregation protocol. Secure aggregation protocols prevent the central coordinator from seeing individual gradient updates, differential privacy on gradients limits information leakage, and Byzantine-robust aggregation detects anomalous updates. These architectures require documentation of the security controls and trust model in both Module 5 and Module 9 of the AISDP.
Adversarial examples are inputs crafted with imperceptible perturbations that cause incorrect outputs with high confidence, affecting image classification, speech recognition, tabular data models, and other AI system types. Defences include adversarial training that includes adversarial examples in the training data to improve robustness, input validation detecting out-of-distribution inputs, and ensemble methods that are more robust to adversarial perturbations than single models. Regular adversarial testing should be integrated into the CI pipeline.
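The attack being defended against can be illustrated with a minimal FGSM-style perturbation against a linear scorer, where the gradient is available in closed form. Real attacks apply the same sign-of-gradient idea through a neural network via backpropagation.

```python
def score(x: list[float], w: list[float]) -> float:
    """A toy linear scorer f(x) = w . x standing in for a trained model."""
    return sum(xi * wi for xi, wi in zip(x, w))

def fgsm_perturb(x: list[float], w: list[float], epsilon: float) -> list[float]:
    """FGSM-style perturbation: shift every feature by epsilon in the
    direction that increases the score, i.e. along sign(w), since the
    gradient of w . x with respect to x is w."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + epsilon * sign(wi) for xi, wi in zip(x, w)]
```

With a small epsilon the perturbed input is nearly indistinguishable from the original yet moves the score in the attacker's chosen direction; adversarial training generates such inputs during training so the model learns to resist them.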
The adversarial robustness evaluation methodology, attack types tested, measured robustness percentage, and residual risk for attack types where full robustness cannot be achieved all feed into AISDP Module 9, with Module 5 including adversarial robustness metrics alongside standard accuracy metrics.