Static analysis for AI systems extends well beyond conventional linting. Five categories form a comprehensive toolchain: code quality analysis, AI-specific custom rules, dependency vulnerability scanning, licence compliance scanning, and secret detection. These catch systematic coding practices that could create compliance risk before those practices enter the codebase.
Code quality analysis uses standard linting tools such as Ruff, pylint, or ESLint, alongside type checking with mypy or pyright and complexity analysis covering cyclomatic and cognitive complexity. Code exceeding defined complexity thresholds is flagged for refactoring before it introduces maintenance and auditability risks that could weaken the AISDP evidence.
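One way to wire a complexity threshold into Ruff is via its mccabe plugin. The following pyproject.toml fragment is an illustrative sketch; the threshold value is a policy choice, not a recommendation:

```toml
[tool.ruff.lint]
select = ["E", "F", "C90"]   # pycodestyle errors, pyflakes, mccabe complexity

[tool.ruff.lint.mccabe]
max-complexity = 10          # flag functions above this cyclomatic complexity
```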
AI-specific static analysis rules address coding patterns that are permissible in general software but problematic in high-risk AI systems. Custom rules should flag the use of demographic features such as gender, age, ethnicity, and disability status without explicit justification recorded in the feature registry. They should flag hardcoded threshold values that should be externally configurable, missing logging instrumentation in inference paths, direct file system access to model artefacts that bypasses the model registry's version control, raw data access that bypasses the data governance layer, and imports of deprecated or known-vulnerable ML framework versions.
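As an illustrative sketch of how such a rule works, a demographic-feature check can be prototyped with Python's `ast` module before being promoted to a production rule. The `PROTECTED` set and function name are assumptions, not the document's actual tooling:

```python
import ast

# Hypothetical protected-characteristic column names; a real deployment
# would source this list from the feature registry.
PROTECTED = {"gender", "age", "ethnicity", "disability_status"}

def flag_demographic_columns(source: str) -> list[tuple[int, str]]:
    """Return (line, column_name) pairs for subscripts like df["gender"]."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.slice, ast.Constant)
            and node.slice.value in PROTECTED
        ):
            findings.append((node.lineno, node.slice.value))
    return findings
```

A finding is a prompt for documented justification, not an automatic rejection, matching the review semantics described below for custom rules.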
Dependency scanning examines every third-party dependency, including Python packages, npm modules, and system libraries, against known vulnerability databases such as CVE and OSV at every build. The scan should fail the pipeline if any dependency has a known critical or high-severity vulnerability without an approved exception documented by the security team.
Licence compliance scanning automates the verification of dependency licence compatibility with the system's distribution model and the organisation's intellectual property policy. This is particularly relevant for AI systems incorporating open-source model components, where licence terms such as AGPL may impose obligations on downstream use that conflict with commercial deployment.
Secret detection scans code repositories for accidentally committed credentials, API keys, database connection strings, or personal data. Tools such as git-secrets, TruffleHog, or detect-secrets should run as pre-commit hooks catching secrets before they enter the repository, and as pipeline gates catching any that bypass the hook.
Four categories of custom static analysis rules are particularly valuable for compliance enforcement.
Four categories of custom static analysis rules are particularly valuable for compliance enforcement. The demographic feature use rule flags any direct reference to protected characteristic columns in feature engineering or model training code. The flag does not mean the code is wrong; it means the Technical SME must justify and document the use. The justification review is triggered automatically through the CODEOWNERS mechanism described in version control governance.
The hardcoded thresholds rule flags magic numbers in decision logic. The Technical SME must define thresholds in version-controlled configuration files rather than embedded in code, so that threshold changes are tracked and reviewed through the standard change management process. This prevents invisible changes to the system's decision boundaries.
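A minimal sketch of externalised thresholds, assuming a version-controlled JSON configuration file; the file name and key are illustrative:

```python
import json
from pathlib import Path

def load_thresholds(path: str) -> dict:
    # The file lives in version control, so every threshold change is
    # reviewed through the standard change management process.
    return json.loads(Path(path).read_text())

def approve(score: float, thresholds: dict) -> bool:
    # No magic number in the decision logic: the decision boundary
    # comes from the reviewed configuration file.
    return score >= thresholds["approval_threshold"]
```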
The missing logging rule flags inference code paths that do not emit a log event. Since Article 12 requires automatic recording of events, any inference path that can execute without logging is a compliance gap. The rule ensures that no prediction can be produced without a corresponding audit trail entry.
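One hedged way to make that guarantee structural, rather than relying on the rule alone, is a decorator applied to every inference entry point. The logger name and event fields here are illustrative:

```python
import functools
import logging

audit_logger = logging.getLogger("inference_audit")

def audited(fn):
    """Emit an audit event for every inference call, including failures."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            result = fn(*args, **kwargs)
            audit_logger.info("inference fn=%s status=ok", fn.__name__)
            return result
        except Exception:
            # The audit trail records the failed attempt before re-raising.
            audit_logger.info("inference fn=%s status=error", fn.__name__)
            raise
    return wrapper
```

The static rule then only needs to verify that the decorator is present on inference entry points.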
The model registry bypass rule flags direct model file loading that circumvents the model registry. Direct loading breaks the traceability chain because the loaded model version is not recorded, the model's provenance cannot be verified, and the version control discipline is undermined. Models must be loaded through the registry so that every inference can be traced back to the specific model version, its training data, and its evaluation results.
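A minimal registry sketch, under the assumption that the index is a JSON file mapping model name and version to an artefact path and SHA-256 digest (all names are illustrative):

```python
import hashlib
import json
from pathlib import Path

class ModelRegistry:
    def __init__(self, index_path: str):
        self.index = json.loads(Path(index_path).read_text())
        self.load_log = []  # audit trail of every resolved load

    def load(self, name: str, version: str) -> bytes:
        entry = self.index[name][version]
        artefact = Path(entry["path"]).read_bytes()
        digest = hashlib.sha256(artefact).hexdigest()
        if digest != entry["sha256"]:
            # Provenance check: the stored hash must match the artefact.
            raise ValueError(f"{name}:{version} artefact hash mismatch")
        self.load_log.append(
            {"model": name, "version": version, "sha256": digest}
        )
        return artefact
```

Because every load passes through `load()`, the served version and its verified hash are always recorded.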
Semgrep provides the implementation framework for these rules, with custom YAML rule definitions that match code patterns and produce compliance-relevant warning or error messages. Each rule specifies the pattern to match, the severity level (WARNING for rules requiring justification, ERROR for rules requiring code change), the affected programming languages, and a message linking the finding to the relevant AISDP section. The rules are stored in the repository alongside the application code and version-controlled with the same discipline, ensuring that compliance checks evolve alongside the codebase.
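As an illustrative sketch (the rule id, message wording, and pattern targets are assumptions), a Semgrep rule for demographic feature use might look like:

```yaml
rules:
  - id: demographic-feature-use
    languages: [python]
    severity: WARNING          # justification required, not a code change
    message: >-
      Protected-characteristic column referenced directly; record the
      justification in the feature registry (see the relevant AISDP section).
    pattern-either:
      - pattern: $DF["gender"]
      - pattern: $DF["ethnicity"]
      - pattern: $DF["disability_status"]
```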
Dependency scanning is essential for supply chain security. Snyk, Dependabot, and pip-audit scan the project's dependency tree against known vulnerability databases and alert on vulnerable versions. The scanner runs on every commit and blocks merges if critical vulnerabilities are found. OWASP Dependency-Check provides an open-source alternative with NIST NVD integration. The dependency scan results are retained as cybersecurity evidence for AISDP Module 9.
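A sketch of the merge gate, assuming pip-audit is installed in the CI image; the injected `run` callable exists only to make the sketch testable and is not part of any real tool's API:

```python
import subprocess
from typing import Callable, Sequence

def dependency_gate(
    requirements: str = "requirements.txt",
    run: Callable[[Sequence[str]], int] = (
        lambda cmd: subprocess.run(cmd).returncode
    ),
) -> bool:
    # pip-audit exits non-zero when a known vulnerability is found,
    # so the gate maps the exit code straight to pass/fail.
    return run(["pip-audit", "--requirement", requirements]) == 0
```

In CI, a False result would fail the pipeline step and block the merge.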
Licence compliance scanning prevents the organisation from inadvertently using libraries with licence terms that conflict with the system's deployment model. An ML system that uses an AGPL-licensed library may be required to open-source its own code. A system that uses a library with a non-commercial licence cannot be deployed commercially. FOSSA and Black Duck provide automated licence analysis and conflict detection. The pip-licenses tool provides a lightweight open-source way to enumerate all Python dependency licences for review. The Technical SME documents and retains the licence audit as Module 3 evidence.
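A lightweight review pass can also be sketched directly over installed distribution metadata; the `DISALLOWED` terms are illustrative, and a real audit would work from pip-licenses or FOSSA output:

```python
from importlib import metadata

# Illustrative licence terms that conflict with commercial deployment.
DISALLOWED = ("AGPL", "Non-Commercial", "CC-BY-NC")

def licence_findings() -> list[tuple[str, str]]:
    findings = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "unknown"
        licence = dist.metadata.get("License") or ""
        classifiers = " ".join(dist.metadata.get_all("Classifier") or [])
        haystack = f"{licence} {classifiers}".lower()
        if any(term.lower() in haystack for term in DISALLOWED):
            findings.append((name, licence))
    return findings
```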
Secret detection using tools such as git-secrets, TruffleHog, GitLeaks, or detect-secrets prevents credentials from entering the repository. The scanner runs as a pre-commit hook, catching secrets before they are committed, and as a CI pipeline step, catching secrets that bypassed the hook. The security team treats any committed secret as compromised and rotates it immediately, regardless of whether the commit was subsequently removed. Git history retains the secret even after deletion, making post-commit removal insufficient as a security control.
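The detection itself is largely regex-driven; a hedged sketch with a few illustrative patterns follows (real tools ship far larger pattern sets plus entropy-based checks):

```python
import re

SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"
    ),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9]{16,}['\"]"
    ),
}

def scan_text(text: str) -> list[str]:
    """Return the names of all secret patterns that match the text."""
    return [
        name for name, pattern in SECRET_PATTERNS.items()
        if pattern.search(text)
    ]
```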
Manual code review can partially substitute for automated static analysis, though some checks are impractical to perform by hand. A structured code review checklist should cover AI-specific concerns including demographic feature use, hardcoded thresholds, missing logging, and model registry bypass, as well as general code quality standards.
Before adding any new dependency, the developer checks the dependency's licence terms, vulnerability status via the public CVE database, and maintenance status. The reviewer explicitly checks for credentials, API keys, and other secrets in every pull request. Manual review catches issues in changed code but does not re-scan existing code for newly discovered vulnerabilities. Dependency vulnerability detection is reactive: the developer checks when adding a dependency, not when a new CVE is published against an existing one.
The tools required for automated static analysis are free and open-source: Ruff, pylint, mypy, pip-audit, detect-secrets, and Semgrep all have open-source editions, with configuration time the only cost. Organisations should implement automated static analysis from the outset, as the cost of configuration is negligible compared to the compliance risk of relying solely on manual review. Unlike manual review, automated scanning re-evaluates the entire codebase on every run, catching newly discovered vulnerabilities and newly added compliance rules in existing code and providing continuous assurance that manual processes cannot match.
Semgrep is the primary tool for custom AI compliance rules. It supports pattern matching for demographic feature use, hardcoded thresholds, and model registry bypass in Python code. Rules are defined in YAML and integrated into pre-commit hooks and CI pipelines.
Can all five analysis categories be covered with free tooling? Yes. Ruff and mypy handle code quality, Semgrep handles AI-specific rules, pip-audit covers vulnerability scanning, pip-licenses checks licence compliance, and detect-secrets catches credentials. All are free and open-source.
Can manual code review replace automated static analysis? Partially. A structured review checklist covers AI-specific concerns in changed code, but manual review does not re-scan existing code and makes dependency vulnerability detection reactive rather than continuous.
AGPL-licensed libraries may force open-sourcing, and non-commercial licences may prohibit commercial deployment. These conflicts can emerge from transitive dependencies.
Run scanners as pre-commit hooks and CI pipeline steps. Treat any committed secret as compromised and rotate immediately, since Git history retains secrets even after deletion.
Static analysis serves a unique compliance function that distinguishes it from other pipeline stages: it catches systematic coding practices that could create compliance risk before those practices enter the codebase. The standard code quality tools enforce general programming discipline, but the compliance-specific value comes from the custom rules and specialised scanners that address AI-specific risk patterns. A pre-commit hook configuration covering all five categories ensures that non-compliant code patterns are caught at the earliest possible point, before the code enters the shared repository.
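A sketch of such a .pre-commit-config.yaml follows. The hook sources shown are the commonly used mirrors but should be verified before use, and the rev tags are placeholders to be pinned to current releases; licence checks typically run as a separate CI step instead:

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.0.0          # placeholder: pin to a current release
    hooks:
      - id: ruff
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v0.0.0          # placeholder
    hooks:
      - id: mypy
  - repo: https://github.com/semgrep/pre-commit
    rev: v0.0.0          # placeholder
    hooks:
      - id: semgrep
        args: ["--config", "compliance-rules/", "--error"]
  - repo: https://github.com/pypa/pip-audit
    rev: v0.0.0          # placeholder
    hooks:
      - id: pip-audit
  - repo: https://github.com/Yelp/detect-secrets
    rev: v0.0.0          # placeholder
    hooks:
      - id: detect-secrets
```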