We use cookies to improve your experience and analyse site traffic.
The EU AI Act requires high-risk AI systems to implement a cybersecurity testing programme covering traditional software security and AI-specific attack vectors. This programme operates as a continuous cycle of testing, remediation, and re-testing throughout the system's lifetime.
The AISDP must document a cybersecurity testing programme proportionate to the system's risk profile, covering both traditional software security and AI-specific attack vectors.
The aisdp must document a cybersecurity testing programme proportionate to the system's risk profile, covering both traditional software security and AI-specific attack vectors. The programme operates as a continuous cycle of testing, remediation, and re-testing throughout the system's lifetime. It is not a single annual event; rather, it combines scheduled assessments with continuous automated scanning to maintain ongoing assurance.
The programme has three core components: traditional penetration testing, automated vulnerability scanning, and AI-specific adversarial testing. All three are necessary, and none alone is sufficient. Cybersecurity for High-Risk AI sets out the overall cybersecurity framework within which this testing programme sits.
Traditional penetration testing exercises the system's defences from an attacker's perspective, covering both the standard web application and infrastructure attack surface and the AI-specific attack surface identified in the threat model. Vulnerability scanning provides continuous automated coverage between penetration tests through four scanning layers: application dependencies, container images, infrastructure configurations, and operating systems. Adversarial ML testing evaluates the model's resilience to attacks that have no analogue in traditional cybersecurity, including evasion attacks, data poisoning, prompt injection, and model extraction.
Penetration testing should be conducted annually by an independent firm with expertise in both traditional application security and AI-specific threats.
Penetration testing should be conducted annually by an independent firm with expertise in both traditional application security and AI-specific threats. The scope must cover all attack surfaces: internet-facing APIs, operator interfaces, administrative endpoints, inter-service communication, and model serving infrastructure.
For AI systems, the penetration test scope extends beyond standard targets. It should include model API endpoints, testing for model extraction through repeated querying and information leakage through output analysis. Data pipeline endpoints require testing for data injection or poisoning through input manipulation. The human oversight interface requires testing for privilege escalation, session hijacking, or interface manipulation that could cause operators to approve harmful outputs.
AI-specific penetration testing requires specialist expertise that most traditional firms lack. Firms with documented AI security capabilities (examples in the current market include Trail of Bits, NCC Group, and Bishop Fox, though the landscape is evolving) should be engaged by the AI Governance Lead for the AI-specific components. The engagement brief should reference the threat model and the OWASP Top 10 for LLM Applications, specifying which threats the test should exercise. Testing frequency should be annual at minimum and after every substantial modification to the system.
The penetration testing firm provides a structured report mapping each finding to a severity rating (CVSS score), the affected AISDP module, and a recommended remediation. Critical and high-severity findings should have documented remediation timelines: typically 30 days for critical findings and 90 days for high-severity findings. The Technical SME verifies remediation through re-testing.
Beyond the standard web application and infrastructure targets, penetration testing should explicitly attempt four AI-specific attack categories. Input injection through the inference API uses crafted inputs designed to trigger unexpected behaviour. Human oversight bypass tests whether any path exists from inference to consequential action without human review. Data pipeline compromise tests whether an attacker with limited access can modify training data or pipeline configuration. Model serving manipulation tests whether an attacker can force the serving infrastructure to load an unauthorised model version.
Automated vulnerability scanning should run continuously across all system components, providing coverage between penetration tests.
Automated vulnerability scanning should run continuously across all system components, providing coverage between penetration tests. The scanning programme has four layers that together cover the full technology stack.
Application dependency scanning, using tools such as Snyk, Dependabot, or OWASP Dependency-Check, runs on every code commit via the CI pipeline and alerts on newly disclosed vulnerabilities in the project's dependency tree. Container image scanning, using Trivy, Grype, or Snyk Container, runs on every container build and periodically on deployed images, catching vulnerabilities in base images and OS packages disclosed after the image was built. CI/CD Pipeline Security covers the integration of these scans into the deployment pipeline.
Infrastructure-as-code scanning, using Checkov, tfsec, or KICS, runs on every IaC change, catching security misconfigurations such as open security groups, unencrypted storage, and overly permissive IAM policies before deployment. Operating system scanning, using Qualys, Nessus, or OpenVAS, runs periodically on deployed infrastructure to catch OS-level vulnerabilities.
The engineering team integrates scans into the CI pipeline so that newly introduced vulnerabilities are caught before deployment. Production environments should be scanned on a scheduled basis, daily or weekly, to catch vulnerabilities in existing deployments that may have been disclosed after the original build.
Scan results feed into a vulnerability management register that tracks each vulnerability's severity, affected component, remediation status, and remediation deadline. Each scanning layer should have a defined remediation SLA: the engineering team patches critical vulnerabilities within 24 to 72 hours, high-severity vulnerabilities within one to two weeks, and medium-severity vulnerabilities within the next planned maintenance window. The SLA, the current vulnerability count by severity, and the remediation status are reported by the Technical SME to the governance team as a Module 6 compliance metric. The register is also a Module 9 evidence artefact.
AI-specific security testing requires specialised tools and methodologies with no analogue in traditional cybersecurity.
AI-specific security testing requires specialised tools and methodologies with no analogue in traditional cybersecurity. The testing programme evaluates the model's resilience to adversarial inputs, data poisoning, prompt injection, and model extraction.
For evasion attacks using adversarial examples, IBM's Adversarial Robustness Toolbox (ART) provides the most comprehensive library. ART implements white-box attacks (FGSM, PGD, C&W, DeepFool) that use knowledge of the model's gradients to craft minimal perturbations changing the model's prediction, and black-box attacks that work without gradient access. For tabular models, the testing protocol perturbs input features at realistic noise levels and records the prediction change rate. For image models, the protocol generates adversarial images at varying perturbation magnitudes and records the success rate. For text models, TextAttack provides character-level, word-level, and sentence-level perturbation attacks. Test results should report the attack success rate at each perturbation magnitude and compare against the robustness thresholds declared in the AISDP.
Data poisoning simulation tests the model's resilience to corrupted training data. ART's poisoning modules provide the simulation capabilities. The test inserts known poisoned records into a copy of the training dataset, retrains the model, and evaluates whether the poisoned model's behaviour deviates from the clean model on both the poisoned trigger inputs and legitimate inputs. This simulation quantifies the system's vulnerability to poisoning and validates the effectiveness of data integrity controls. The test should determine the minimum poisoning rate that produces a detectable effect, which informs data integrity monitoring thresholds.
Annual red team exercises simulate realistic threat scenarios that combine technical attacks with social engineering.
Annual red team exercises simulate realistic threat scenarios that combine technical attacks with social engineering. For an AI system, a red team scenario might involve attempting to manipulate outputs by corrupting a data source the system depends upon, or attempting to cause the human oversight layer to approve harmful outputs by exploiting automation bias.
Other scenarios include attempting to extract sensitive information from the model through carefully crafted queries, or attempting to trigger a denial of service by submitting adversarial inputs designed to consume excessive computational resources.
The Technical SME conducts red team exercises with personnel who were not involved in the system's development and who have realistic threat actor capabilities. Independence is essential: the exercise should test defences as an external attacker would encounter them, without insider knowledge of implementation shortcuts or configuration details. The exercise produces a detailed report with findings, exploited vulnerabilities, and recommended mitigations. Red team reports feed into the threat model and inform subsequent penetration testing scope.
Manual security code review complements automated SAST/DAST scanning integrated into the CI pipeline.
Manual security code review complements automated SAST/DAST scanning integrated into the CI pipeline. The Technical SME conducts manual review for security-critical components: the authentication and authorisation logic, the model serving and API gateway code, the data validation and sanitisation logic, the logging and audit trail implementation, and any custom cryptographic implementations.
Manual review catches logic flaws and design-level vulnerabilities that automated tools miss. This is particularly important for AI systems where the interaction between model serving code, input validation, and output filtering may introduce vulnerabilities that span multiple components and cannot be detected by single-file static analysis. Comprehensive logging should capture every tool invocation with its parameters and outcome to support post-incident analysis.
Four additional threat categories beyond model-level attacks require dedicated test coverage within the testing programme.
Four additional threat categories beyond model-level attacks require dedicated test coverage within the testing programme.
Output validation testing addresses systems where model outputs are consumed by downstream components including web interfaces, databases, APIs, and workflow engines. The testing programme verifies that no model output can trigger a secondary vulnerability. Test cases should include generating outputs containing SQL injection payloads, cross-site scripting vectors, command injection strings, and malformed data structures, then verifying that the output validation layer neutralises each payload before it reaches the downstream component. The Technical SME updates the test suite whenever a new downstream integration is added.
Denial of service testing, addressing load-based and adversarial resource exhaustion, verifies the system's resilience to resource exhaustion attacks. Test coverage should include sustained high-volume request testing at rates exceeding expected peak load by at least a factor of three, adversarial input testing with inputs designed to maximise inference time (for neural networks, this may include inputs with unusual dimensions, extreme values, or pathological structures), and combined testing where high-volume legitimate requests coincide with adversarial inputs. The pass criteria are that the system maintains documented latency and throughput targets under load, that rate limiting activates correctly, that timeouts terminate long-running inferences, and that the system recovers automatically after the attack ceases.
Plugin and tool security testing applies to systems where the AI component invokes external tools or plugins. Testing verifies that the tool allowlist is enforced and the system cannot invoke unlisted tools. Parameter validation must prevent the system from passing malicious or out-of-scope parameters to authorised tools. Human approval gates must function correctly for high-impact actions. Comprehensive logging must capture every tool invocation with its parameters and outcome to support post-incident analysis. Test cases include attempting to invoke disallowed tools, passing boundary and malformed parameters to allowed tools, and verifying that the human approval workflow cannot be bypassed through rapid sequential requests.
All cybersecurity testing produces evidence for AISDP Module 9 through a structured testing summary table.
All cybersecurity testing produces evidence for AISDP Module 9 through a structured testing summary table. The table maps each test type to its most recent execution date, the scope covered, the number of findings by severity, the remediation status, and the next scheduled execution date. This summary provides governance teams with a single view of the organisation's cybersecurity testing posture across all test categories.
Detailed test reports and remediation records are maintained by the Technical SME in the evidence pack with immutable timestamps. For penetration testing, the report should map each finding to a CVSS severity rating, the affected AISDP module, and the recommended remediation along with its timeline. For adversarial ML testing, the report should document the attack methodologies used, success rates at each perturbation magnitude, and the compensating controls in place. For vulnerability scanning, the register tracks each vulnerability's severity, affected component, and remediation status against the defined SLA.
The security team tracks findings above the risk acceptance threshold to remediation and reports the current vulnerability count, remediation status, and SLA compliance to the governance team as compliance metrics.
Penetration testing can be conducted by an external firm using their own tooling; the organisation does not need to licence specialist tools such as Burp Suite or Metasploit itself.
Penetration testing can be conducted by an external firm using their own tooling; the organisation does not need to licence specialist tools such as Burp Suite or Metasploit itself. If the external firm is competent, nothing is lost in capability. An annual engagement of a penetration testing firm with documented AI security expertise is actually the recommended approach for most organisations. The engagement brief should reference the threat model and the OWASP Top 10 for LLM Applications. A structured report with findings, severity ratings, and remediation recommendations feeds into remediation tracking in the non-conformity register. OWASP ZAP is open-source for organisations wanting to supplement external testing with internal capability.
For vulnerability management, manual checking against public databases is possible though significantly slower and less comprehensive than automated scanning. This requires monthly manual review of all dependencies against the NVD, quarterly manual review of container base image CVEs, and infrastructure configuration review against CIS benchmarks using a manual checklist. The critical weakness of the manual approach is that it discovers vulnerabilities only at the next scheduled review rather than at disclosure, losing real-time alerting on newly disclosed vulnerabilities. Given that Trivy, Grype, pip-audit, and Checkov are all open-source and free, automated scanning is strongly recommended even for resource-constrained organisations.
Adversarial attack generation is a computational process: adversarial examples cannot be manually crafted at the scale and sophistication needed for meaningful evaluation. The minimum tooling comprises IBM Adversarial Robustness Toolbox, TextAttack, and Garak, all of which are open-source and free. There is no viable manual alternative for this component of the testing programme.
Yes, this is actually the recommended approach for most organisations. The external firm uses their own tooling, and the organisation does not need to licence specialist tools. The engagement brief should reference the threat model and OWASP Top 10 for LLM Applications.
Manual checking against public databases is possible but significantly slower. It requires monthly dependency review against the NVD, quarterly container image CVE review, and infrastructure review against CIS benchmarks. However, open-source tools like Trivy, Grype, and Checkov are free and strongly recommended.
At least biannually, and additionally after any significant model change. The testing evaluates resilience to adversarial inputs, data poisoning, prompt injection, and model extraction using specialised frameworks.
Permission boundary testing, privilege escalation testing, and scope creep testing. This verifies the system cannot access unauthorised resources, cannot increase its own permissions, and declines tasks outside its documented purpose. It is particularly important for agentic systems.
IBM's Adversarial Robustness Toolbox for evasion attacks, TextAttack for text model perturbations, and Garak for prompt injection testing, all open-source.
All testing produces a summary table mapping each test type to execution date, scope, findings by severity, remediation status, and next scheduled date, stored as Module 9 evidence.
Four layers: application dependency scanning, container image scanning, infrastructure-as-code scanning, and operating system scanning, each with defined remediation SLAs.
External penetration testing firms, open-source scanning tools, and manual vulnerability checking against public databases, though adversarial ML testing requires computational tools with no manual alternative.
For systems incorporating LLMs, prompt injection testing should use both known injection patterns and novel adversarial prompts. Garak, an open-source tool from NVIDIA, provides automated scanning by sending a battery of prompt injection payloads: direct injection, indirect injection via document content, jailbreak prompts, and system prompt extraction attempts. The test should also include custom injection payloads derived from the system's specific context. If the LLM processes user-uploaded documents, the test should embed injection prompts within documents and verify that the system's guardrails detect and reject them. Multi-turn injection attacks should also be tested.
Model extraction testing evaluates whether an attacker can reconstruct the model's decision boundaries through systematic querying. The test protocol allocates a query budget (for example, 10,000 queries), submits systematic inputs through automated query generation against the inference API, collects the model's outputs, trains a surrogate model on the collected data, and evaluates the surrogate's fidelity to the original. The test also assesses the effectiveness of rate limiting and anomaly detection controls. The test reports the fidelity achieved at the allocated query budget, quantifying the extraction risk and informing rate limiting configuration. AI Risk Assessment Methodology provides the broader risk assessment context for interpreting these results.
The Technical SME conducts adversarial ML testing at least biannually, and additionally after any significant model change. Test results, including the attack methodologies used, the success rates, and the compensating controls in place, are documented in a structured report stored as Module 6 evidence. Results are fed back into the threat model and risk register. The security team tracks findings above the risk acceptance threshold to remediation.
Excessive agency testing verifies that the system's actual capabilities do not exceed its documented intended scope. The programme includes permission boundary testing, attempting to access resources, APIs, or data stores that the system should not be able to reach. Privilege escalation testing attempts to increase the system's permissions through its own actions. Scope creep testing presents the system with tasks that fall outside its documented intended purpose and verifies that it declines or escalates rather than attempting to fulfil them. For agentic systems, this testing is particularly important and the Technical SME conducts it after every change to the system's tool integrations or permission model.