What does a cybersecurity testing programme for AI systems include?

The programme covers penetration testing, automated vulnerability scanning, adversarial ML testing, red team exercises, security code review, and dedicated testing for output validation, denial of service, plugin security, and excessive agency.

How should penetration testing be adapted for AI systems?

The scope extends beyond standard targets to include model API endpoints, data pipeline endpoints, and human oversight interfaces, with specialist firms engaged for AI-specific components.

What tools are used for adversarial machine learning testing?

IBM's Adversarial Robustness Toolbox for evasion attacks, TextAttack for text model perturbations, and Garak for prompt injection testing, all open-source.

How are cybersecurity test results documented?

All testing produces a summary table mapping each test type to execution date, scope, findings by severity, remediation status, and next scheduled date, stored as Module 9 evidence.

What vulnerability scanning layers are required?

Four layers: application dependency scanning, container image scanning, infrastructure-as-code scanning, and operating system scanning, each with defined remediation SLAs.

What compensating controls exist for testing without specialist tooling?

External penetration testing firms, open-source scanning tools, and manual vulnerability checking against public databases, though adversarial ML testing requires computational tools with no manual alternative.

Can we use an external firm for penetration testing instead of building internal capability?

Yes, this is actually the recommended approach for most organisations. The external firm uses their own tooling, and the organisation does not need to licence specialist tools. The engagement brief should reference the threat model and OWASP Top 10 for LLM Applications.

Is manual vulnerability management acceptable as a compensating control?

Manual checking against public databases is possible but significantly slower. It requires monthly dependency review against the NVD, quarterly container image CVE review, and infrastructure review against CIS benchmarks. However, open-source tools like Trivy, Grype, and Checkov are free and strongly recommended.

How often should adversarial ML testing be conducted?

At least biannually, and additionally after any significant model change. The testing evaluates resilience to adversarial inputs, data poisoning, prompt injection, and model extraction using specialised frameworks.

What should excessive agency testing cover for AI systems?

Permission boundary testing, privilege escalation testing, and scope creep testing. This verifies the system cannot access unauthorised resources, cannot increase its own permissions, and declines tasks outside its documented purpose. It is particularly important for agentic systems.

Can we use an external firm for penetration testing instead of building internal capability?

Yes, this is actually the recommended approach for most organisations. The external firm uses their own tooling, and the organisation does not need to licence specialist tools. The engagement brief should reference the threat model and OWASP Top 10 for LLM Applications.

Is manual vulnerability management acceptable as a compensating control?

Manual checking against public databases is possible but significantly slower. It requires monthly dependency review against the NVD, quarterly container image CVE review, and infrastructure review against CIS benchmarks. However, open-source tools like Trivy, Grype, and Checkov are free and strongly recommended.

How often should adversarial ML testing be conducted?

At least biannually, and additionally after any significant model change. The testing evaluates resilience to adversarial inputs, data poisoning, prompt injection, and model extraction using specialised frameworks.

What should excessive agency testing cover for AI systems?

Permission boundary testing, privilege escalation testing, and scope creep testing. This verifies the system cannot access unauthorised resources, cannot increase its own permissions, and declines tasks outside its documented purpose. It is particularly important for agentic systems.

Cybersecurity Testing Programme for AI Systems

Written by

Michael Clark

Chief Executive Officer, Standard Intelligence

Founder and CEO of Standard Intelligence. Author of the Practitioners Implementation Guide series for EU AI Act compliance.

Martin Dean

Chief Technology Officer, Standard Intelligence

CTO of Standard Intelligence. Leads platform engineering and contributes to the PIG series technical content.

The EU AI Act requires high-risk AI systems to implement a cybersecurity testing programme covering traditional software security and AI-specific attack vectors. This programme operates as a continuous cycle of testing, remediation, and re-testing throughout the system's lifetime.

Abstract

Read abstract

High-risk AI systems require a structured cybersecurity testing programme that goes beyond traditional software security to address AI-specific attack vectors including adversarial examples, data poisoning, prompt injection, and model extraction. The programme comprises three core components: penetration testing conducted annually by independent firms with AI security expertise, continuous automated vulnerability scanning across application dependencies, container images, infrastructure configurations, and operating systems, and specialised adversarial machine learning testing using frameworks such as IBM's Adversarial Robustness Toolbox, TextAttack, and Garak. Red team exercises simulate realistic threat scenarios combining technical attacks with social engineering. Additional test coverage addresses output validation to prevent secondary vulnerabilities in downstream components, denial of service resilience, plugin and tool security for AI systems that invoke external tools, and excessive agency testing to verify capabilities do not exceed documented scope. All testing produces evidence for AISDP Module 9 through structured reports with findings mapped to severity ratings, remediation timelines, and next scheduled execution dates. Compensating controls allow organisations to engage external penetration testing firms and use open-source scanning tools where commercial tooling is not available.

What does a cybersecurity testing programme cover?

Regulatory Requirement

The AISDP must document a cybersecurity testing programme proportionate to the system's risk profile, covering both traditional software security and AI-specific attack vectors.

The aisdp must document a cybersecurity testing programme proportionate to the system's risk profile, covering both traditional software security and AI-specific attack vectors. The programme operates as a continuous cycle of testing, remediation, and re-testing throughout the system's lifetime. It is not a single annual event; rather, it combines scheduled assessments with continuous automated scanning to maintain ongoing assurance.

The programme has three core components: traditional penetration testing, automated vulnerability scanning, and AI-specific adversarial testing. All three are necessary, and none alone is sufficient. Cybersecurity for High-Risk AI sets out the overall cybersecurity framework within which this testing programme sits.

Traditional penetration testing exercises the system's defences from an attacker's perspective, covering both the standard web application and infrastructure attack surface and the AI-specific attack surface identified in the threat model. Vulnerability scanning provides continuous automated coverage between penetration tests through four scanning layers: application dependencies, container images, infrastructure configurations, and operating systems. Adversarial ML testing evaluates the model's resilience to attacks that have no analogue in traditional cybersecurity, including evasion attacks, data poisoning, prompt injection, and model extraction.

How should penetration testing address AI-specific threats?

Engineering Approach

Penetration testing should be conducted annually by an independent firm with expertise in both traditional application security and AI-specific threats.

Penetration testing should be conducted annually by an independent firm with expertise in both traditional application security and AI-specific threats. The scope must cover all attack surfaces: internet-facing APIs, operator interfaces, administrative endpoints, inter-service communication, and model serving infrastructure.

For AI systems, the penetration test scope extends beyond standard targets. It should include model API endpoints, testing for model extraction through repeated querying and information leakage through output analysis. Data pipeline endpoints require testing for data injection or poisoning through input manipulation. The human oversight interface requires testing for privilege escalation, session hijacking, or interface manipulation that could cause operators to approve harmful outputs.

AI-specific penetration testing requires specialist expertise that most traditional firms lack. Firms with documented AI security capabilities (examples in the current market include Trail of Bits, NCC Group, and Bishop Fox, though the landscape is evolving) should be engaged by the AI Governance Lead for the AI-specific components. The engagement brief should reference the threat model and the OWASP Top 10 for LLM Applications, specifying which threats the test should exercise. Testing frequency should be annual at minimum and after every substantial modification to the system.

The penetration testing firm provides a structured report mapping each finding to a severity rating (CVSS score), the affected AISDP module, and a recommended remediation. Critical and high-severity findings should have documented remediation timelines: typically 30 days for critical findings and 90 days for high-severity findings. The Technical SME verifies remediation through re-testing.

Beyond the standard web application and infrastructure targets, penetration testing should explicitly attempt four AI-specific attack categories. Input injection through the inference API uses crafted inputs designed to trigger unexpected behaviour. Human oversight bypass tests whether any path exists from inference to consequential action without human review. Data pipeline compromise tests whether an attacker with limited access can modify training data or pipeline configuration. Model serving manipulation tests whether an attacker can force the serving infrastructure to load an unauthorised model version.

What does continuous vulnerability scanning involve?

Engineering Approach

Automated vulnerability scanning should run continuously across all system components, providing coverage between penetration tests.

Automated vulnerability scanning should run continuously across all system components, providing coverage between penetration tests. The scanning programme has four layers that together cover the full technology stack.

Application dependency scanning, using tools such as Snyk, Dependabot, or OWASP Dependency-Check, runs on every code commit via the CI pipeline and alerts on newly disclosed vulnerabilities in the project's dependency tree. Container image scanning, using Trivy, Grype, or Snyk Container, runs on every container build and periodically on deployed images, catching vulnerabilities in base images and OS packages disclosed after the image was built. CI/CD Pipeline Security covers the integration of these scans into the deployment pipeline.

Infrastructure-as-code scanning, using Checkov, tfsec, or KICS, runs on every IaC change, catching security misconfigurations such as open security groups, unencrypted storage, and overly permissive IAM policies before deployment. Operating system scanning, using Qualys, Nessus, or OpenVAS, runs periodically on deployed infrastructure to catch OS-level vulnerabilities.

The engineering team integrates scans into the CI pipeline so that newly introduced vulnerabilities are caught before deployment. Production environments should be scanned on a scheduled basis, daily or weekly, to catch vulnerabilities in existing deployments that may have been disclosed after the original build.

Scan results feed into a vulnerability management register that tracks each vulnerability's severity, affected component, remediation status, and remediation deadline. Each scanning layer should have a defined remediation SLA: the engineering team patches critical vulnerabilities within 24 to 72 hours, high-severity vulnerabilities within one to two weeks, and medium-severity vulnerabilities within the next planned maintenance window. The SLA, the current vulnerability count by severity, and the remediation status are reported by the Technical SME to the governance team as a Module 6 compliance metric. The register is also a Module 9 evidence artefact.

How is adversarial machine learning testing conducted?

Engineering Approach

AI-specific security testing requires specialised tools and methodologies with no analogue in traditional cybersecurity.

AI-specific security testing requires specialised tools and methodologies with no analogue in traditional cybersecurity. The testing programme evaluates the model's resilience to adversarial inputs, data poisoning, prompt injection, and model extraction.

For evasion attacks using adversarial examples, IBM's Adversarial Robustness Toolbox (ART) provides the most comprehensive library. ART implements white-box attacks (FGSM, PGD, C&W, DeepFool) that use knowledge of the model's gradients to craft minimal perturbations changing the model's prediction, and black-box attacks that work without gradient access. For tabular models, the testing protocol perturbs input features at realistic noise levels and records the prediction change rate. For image models, the protocol generates adversarial images at varying perturbation magnitudes and records the success rate. For text models, TextAttack provides character-level, word-level, and sentence-level perturbation attacks. Test results should report the attack success rate at each perturbation magnitude and compare against the robustness thresholds declared in the AISDP.

Data poisoning simulation tests the model's resilience to corrupted training data. ART's poisoning modules provide the simulation capabilities. The test inserts known poisoned records into a copy of the training dataset, retrains the model, and evaluates whether the poisoned model's behaviour deviates from the clean model on both the poisoned trigger inputs and legitimate inputs. This simulation quantifies the system's vulnerability to poisoning and validates the effectiveness of data integrity controls. The test should determine the minimum poisoning rate that produces a detectable effect, which informs data integrity monitoring thresholds.

What do red team exercises simulate?

Engineering Approach

Annual red team exercises simulate realistic threat scenarios that combine technical attacks with social engineering.

Annual red team exercises simulate realistic threat scenarios that combine technical attacks with social engineering. For an AI system, a red team scenario might involve attempting to manipulate outputs by corrupting a data source the system depends upon, or attempting to cause the human oversight layer to approve harmful outputs by exploiting automation bias.

Other scenarios include attempting to extract sensitive information from the model through carefully crafted queries, or attempting to trigger a denial of service by submitting adversarial inputs designed to consume excessive computational resources.

The Technical SME conducts red team exercises with personnel who were not involved in the system's development and who have realistic threat actor capabilities. Independence is essential: the exercise should test defences as an external attacker would encounter them, without insider knowledge of implementation shortcuts or configuration details. The exercise produces a detailed report with findings, exploited vulnerabilities, and recommended mitigations. Red team reports feed into the threat model and inform subsequent penetration testing scope.

Security code review for AI components

Engineering Approach

Manual security code review complements automated SAST/DAST scanning integrated into the CI pipeline.

Manual security code review complements automated SAST/DAST scanning integrated into the CI pipeline. The Technical SME conducts manual review for security-critical components: the authentication and authorisation logic, the model serving and API gateway code, the data validation and sanitisation logic, the logging and audit trail implementation, and any custom cryptographic implementations.

Manual review catches logic flaws and design-level vulnerabilities that automated tools miss. This is particularly important for AI systems where the interaction between model serving code, input validation, and output filtering may introduce vulnerabilities that span multiple components and cannot be detected by single-file static analysis. Comprehensive logging should capture every tool invocation with its parameters and outcome to support post-incident analysis.

Testing output validation, denial of service, and agent capabilities

Engineering Approach

Four additional threat categories beyond model-level attacks require dedicated test coverage within the testing programme.

Output validation testing addresses systems where model outputs are consumed by downstream components including web interfaces, databases, APIs, and workflow engines. The testing programme verifies that no model output can trigger a secondary vulnerability. Test cases should include generating outputs containing SQL injection payloads, cross-site scripting vectors, command injection strings, and malformed data structures, then verifying that the output validation layer neutralises each payload before it reaches the downstream component. The Technical SME updates the test suite whenever a new downstream integration is added.

Denial of service testing, addressing load-based and adversarial resource exhaustion, verifies the system's resilience to resource exhaustion attacks. Test coverage should include sustained high-volume request testing at rates exceeding expected peak load by at least a factor of three, adversarial input testing with inputs designed to maximise inference time (for neural networks, this may include inputs with unusual dimensions, extreme values, or pathological structures), and combined testing where high-volume legitimate requests coincide with adversarial inputs. The pass criteria are that the system maintains documented latency and throughput targets under load, that rate limiting activates correctly, that timeouts terminate long-running inferences, and that the system recovers automatically after the attack ceases.

Plugin and tool security testing applies to systems where the AI component invokes external tools or plugins. Testing verifies that the tool allowlist is enforced and the system cannot invoke unlisted tools. Parameter validation must prevent the system from passing malicious or out-of-scope parameters to authorised tools. Human approval gates must function correctly for high-impact actions. Comprehensive logging must capture every tool invocation with its parameters and outcome to support post-incident analysis. Test cases include attempting to invoke disallowed tools, passing boundary and malformed parameters to allowed tools, and verifying that the human approval workflow cannot be bypassed through rapid sequential requests.

How are test results mapped to AISDP Module 9?

Regulatory Requirement

All cybersecurity testing produces evidence for AISDP Module 9 through a structured testing summary table.

All cybersecurity testing produces evidence for AISDP Module 9 through a structured testing summary table. The table maps each test type to its most recent execution date, the scope covered, the number of findings by severity, the remediation status, and the next scheduled execution date. This summary provides governance teams with a single view of the organisation's cybersecurity testing posture across all test categories.

Detailed test reports and remediation records are maintained by the Technical SME in the evidence pack with immutable timestamps. For penetration testing, the report should map each finding to a CVSS severity rating, the affected AISDP module, and the recommended remediation along with its timeline. For adversarial ML testing, the report should document the attack methodologies used, success rates at each perturbation magnitude, and the compensating controls in place. For vulnerability scanning, the register tracks each vulnerability's severity, affected component, and remediation status against the defined SLA.

The security team tracks findings above the risk acceptance threshold to remediation and reports the current vulnerability count, remediation status, and SLA compliance to the governance team as compliance metrics.

What are the compensating controls for organisations without specialist tooling?

Compensating Controls

Penetration testing can be conducted by an external firm using their own tooling; the organisation does not need to licence specialist tools such as Burp Suite or Metasploit itself.

Penetration testing can be conducted by an external firm using their own tooling; the organisation does not need to licence specialist tools such as Burp Suite or Metasploit itself. If the external firm is competent, nothing is lost in capability. An annual engagement of a penetration testing firm with documented AI security expertise is actually the recommended approach for most organisations. The engagement brief should reference the threat model and the OWASP Top 10 for LLM Applications. A structured report with findings, severity ratings, and remediation recommendations feeds into remediation tracking in the non-conformity register. OWASP ZAP is open-source for organisations wanting to supplement external testing with internal capability.

For vulnerability management, manual checking against public databases is possible though significantly slower and less comprehensive than automated scanning. This requires monthly manual review of all dependencies against the NVD, quarterly manual review of container base image CVEs, and infrastructure configuration review against CIS benchmarks using a manual checklist. The critical weakness of the manual approach is that it discovers vulnerabilities only at the next scheduled review rather than at disclosure, losing real-time alerting on newly disclosed vulnerabilities. Given that Trivy, Grype, pip-audit, and Checkov are all open-source and free, automated scanning is strongly recommended even for resource-constrained organisations.

Adversarial attack generation is a computational process: adversarial examples cannot be manually crafted at the scale and sophistication needed for meaningful evaluation. The minimum tooling comprises IBM Adversarial Robustness Toolbox, TextAttack, and Garak, all of which are open-source and free. There is no viable manual alternative for this component of the testing programme.

Cybersecurity Testing Programme for AI Systems

Written by

What does a cybersecurity testing programme cover?

How should penetration testing address AI-specific threats?

What does continuous vulnerability scanning involve?

How is adversarial machine learning testing conducted?

What do red team exercises simulate?

Security code review for AI components

Testing output validation, denial of service, and agent capabilities

How are test results mapped to AISDP Module 9?

What are the compensating controls for organisations without specialist tooling?

Frequently Asked Questions

Related Pages

In This Section

Build compliance into your pipeline