TalentLens Pro is a high-risk AI recruitment screening system that ranks job candidates using an XGBoost ensemble model, triggering full EU AI Act compliance obligations under Annex III, Area 4(a). The system is provided by Meridian AI Solutions GmbH, a Berlin-based company, and processes approximately 2.3 million screenings per year across 18 EU Member States. It produces suitability scores from 0 to 100 with SHAP-based explanatory profiles for each candidate.
The system is classified as high-risk because it falls within Annex III, Area 4(a): AI systems intended for recruitment or selection of natural persons. The Article 6(3) exception does not apply because TalentLens Pro does not perform a narrow procedural task. It evaluates and ranks candidates using 47 features, producing a substantive suitability assessment that structures the recruiter's review. The AISDP reference is AISDP-2025-0042-v3.1, registered in the EU database as EU-AI-HR-2025-00784.
This worked example follows the complete seven-phase AISDP preparation process from discovery through post-market operations. It demonstrates how the engineering practices described throughout the Practitioners Implementation Guide translate into concrete artefacts, governance gates, and compliance evidence for a real-world system. The domain-to-module mapping table below shows how each guide section was applied.
| Domain Section | Application to TalentLens Pro | AISDP Module(s) |
|---|---|---|
| Risk Assessment | Five-method risk identification; 10-risk register; four-dimension scoring | 6, 11 |
| Model Selection | XGBoost selection over deep learning and LLM alternatives; SHAP rationale | 2, 3 |
| Data Governance | 326k-record dataset; three-source acquisition; 14-subgroup bias assessment | 4 |
| Development Architectures | Three-microservice architecture; eight-layer reference; C4 diagrams | 3, 7 |
| Version Control | Git monorepo with DVC; MLflow registry; composite versioning scheme | 10, 12 |
| CI/CD Pipelines | 11-stage pipeline with fairness, robustness, and modification gates | 2, 5, 9, 10 |
| Cybersecurity | ISO 27001; 13 AI threat categories; adversarial testing; CRA scope | 9 |
| Conformity Assessment | Annex VI internal control with voluntary TUV SUD review | Cross-cutting |
| Post-Market Monitoring | Nine monitoring activities with quantified thresholds; feedback loop | 12 |
| Operational Oversight | Six-level oversight pyramid; break-glass; anti-automation-bias features | 7 |
Classification required confirming the system meets the AI system definition under Article 3(1), clearing all Article 5 prohibitions, and positively identifying the applicable Annex III area. The AI System Assessor assembled an evidence pack of 47 documents including the product specification, existing risk assessments, deployer DPIAs, and interview transcripts.
The Article 3(1) definition was confirmed: TalentLens Pro is machine-based (XGBoost on AWS SageMaker), operates with autonomy (produces scores without per-prediction human intervention), and generates outputs influencing decisions affecting natural persons. All eight Article 5 prohibitions were cleared. The system does not engage in subliminal manipulation, exploitation of vulnerable groups, social scoring, or biometric identification.
Pathway A (Annex I safety component) did not apply. Pathway B identified Annex III, Area 4(a) as the applicable classification: AI systems intended for recruitment or selection of natural persons. The Legal and Regulatory Advisor assessed the Article 6(3) exception and determined it did not apply on three grounds. First, the system does not perform a narrow procedural task. Second, it does not merely improve a previously completed human activity. Third, it does not simply detect decision-making patterns without replacing human assessment.
The Classification Decision Record was reviewed against five dimensions (completeness, accuracy, reasoning, legal correctness, proportionality) and approved without amendment. The seven-member assessment team completed conflict of interest declarations. One assessor's prior consulting relationship with a deployer was managed by restricting access to that deployer's configuration data.
Five complementary methods produced 10 consolidated risks, each scored across four impact dimensions using a five-by-five likelihood-severity matrix. The methods were FMEA, stakeholder consultation, regulatory gap analysis, adversarial red-teaming, and horizon scanning.
The FMEA identified 34 failure modes across six components. Notable high-RPN entries included: feature engineering computing values outside the training distribution for novel job categories (RPN 36), SHAP generating misleading attributions when feature interactions dominate (RPN 36), and adversarial CV crafting producing inflated scores (RPN 30). All failure modes with RPN above 24 were escalated to the risk register.
Stakeholder consultation engaged nine participants including deployer HR directors, a labour rights advocate, an accessibility specialist, and a former job applicant. The labour rights advocate raised concerns about disadvantaging candidates with non-linear career paths. The accessibility specialist identified risks for candidates using assistive technology. Both concerns entered the risk register with assigned mitigations.
Adversarial red-teaming against the MITRE ATLAS matrix demonstrated an 8-point score inflation through adversarial CV crafting, with a 94% detection rate. Model extraction was impractical given rate limiting, and membership inference was not feasible for the XGBoost architecture. Horizon scanning reviewed OECD, Stanford HAI, and AI Incident Database sources, identifying three relevant developments including a study on gender bias amplification in resume screening.
XGBoost was selected because it achieved the best balance of accuracy, explainability, and compliance suitability among five evaluated architectures. The Model Selection Record documented the full evaluation rationale, including an Article 25 provider status analysis of why an LLM-based approach was rejected.
Five architectures were evaluated. Logistic regression achieved 0.791 AUC-ROC but the accuracy gap would have incorrectly ranked thousands of candidates monthly. Random forest achieved 0.832 but SHAP computation was too expensive for per-prediction explanations at inference time. XGBoost v1.7.6 achieved 0.847 with exact Shapley values via TreeExplainer within the latency budget, plus deterministic inference supporting Article 12 logging and Article 15 accuracy requirements.
A deep neural network achieved 0.851, a marginal improvement of 0.004 that did not justify the explainability degradation for a high-risk system requiring meaningful per-decision explanations under Article 14. A fine-tuned GPT-4 CV analyser was rejected on multiple grounds. The Legal and Regulatory Advisor determined that fine-tuning engaged Article 25(1)(b), making Meridian a provider with full obligations. Stochastic output variation created challenges for accuracy and logging requirements. Copyright and training data provenance risks were assessed as high given ongoing litigation.
The architecture maps to an eight-layer reference model implemented across three microservices. The Data Ingestion Module handles schema validation, prohibited feature blocking, and data minimisation. The Scoring Engine handles model inference, SHAP computation, and confidence thresholding. The Employer Reporting Interface handles human oversight, delayed score reveal, and calibration case injection.
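The Scoring Engine's confidence thresholding can be illustrated with a minimal sketch. The 0.60 floor, the names, and the routing labels are all assumptions for illustration; the source describes the control, not its implementation.

```python
from dataclasses import dataclass

# Hypothetical minimum confidence for surfacing a score automatically;
# the case study states the control exists but not its threshold.
CONFIDENCE_FLOOR = 0.60

@dataclass
class ScoringResult:
    suitability_score: int  # 0-100 scale, as in the case study
    confidence: float       # model confidence for this prediction
    route: str              # "automatic" or "manual_review"

def threshold_result(score: float, confidence: float) -> ScoringResult:
    """Route low-confidence predictions to manual review rather than
    surfacing a score the recruiter might over-trust."""
    route = "automatic" if confidence >= CONFIDENCE_FLOOR else "manual_review"
    return ScoringResult(round(score), confidence, route)
```

Keeping the routing decision in the Scoring Engine, rather than in the reporting interface, means the low-confidence path is enforced before a score ever reaches a recruiter.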
The Employer Reporting Interface implements five anti-automation-bias measures that go well beyond a simple override button, addressing the risk that recruiters accept AI scores without exercising independent judgement. These controls were redesigned during Phase 3 after the risk assessment for R-002 revealed that a basic score display would be insufficient.
Delayed score reveal withholds the suitability score for 30 seconds. During this period, the recruiter sees the candidate profile and must form an independent impression before the AI assessment appears. This is a technical control enforced at the interface level, not a policy recommendation.
Calibration cases are injected at a rate of 1 in 20: cases where the system's recommendation is known to be incorrect, testing whether the recruiter identifies the error independently. Review time monitoring alerts the deployer administrator when the median review time per recruiter falls below 45 seconds. Override capability allows any score to be overridden with a mandatory free-text justification.
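Two of these controls reduce to a few lines of logic. The sketch below uses the 1-in-20 injection ratio and 45-second median floor stated in the text; everything else (function names, the use of a seeded RNG) is an illustrative assumption.

```python
import random
import statistics

CALIBRATION_RATE = 20       # 1-in-20 injection ratio from the case study
REVIEW_FLOOR_SECONDS = 45   # median review-time alert threshold

def should_inject_calibration(rng: random.Random) -> bool:
    """True when the next case shown to a recruiter should be a
    known-answer calibration case (1 in 20 on average)."""
    return rng.randrange(CALIBRATION_RATE) == 0

def review_time_alert(review_times_s: list[float]) -> bool:
    """Alert the deployer administrator when a recruiter's median
    review time falls below the 45-second floor."""
    return statistics.median(review_times_s) < REVIEW_FLOOR_SECONDS
```

The median, rather than the mean, keeps the alert robust to a single long review session inflating an otherwise rushed pattern.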
The break-glass design provides two independent halt mechanisms. An in-application stop button propagates through the feature flag system within 200ms. A separate Lambda function in a different AWS account scales the inference endpoint to zero. Annual exercises verify both mechanisms. The first exercise found one deficiency: the deployer notification email lacked an expected restart timeline, which was remediated within 48 hours.
The six-level pyramid ranges from Level 1 (two SREs with emergency rollback authority) through Level 6 (external oversight, including the AESIA sandbox engagement). The aggregate override rate in Q2 2025 was 11.3%, with no deployer falling below the 2% threshold that would trigger an automation bias investigation.
The training dataset was assembled from three sources with full provenance documentation, bias assessment across 14 subgroups, and Article 10(5) safeguards for special category data processing. Each source was documented using the Gebru et al. datasheet framework.
Anonymised historical recruitment data from 14 enterprise deployers contributed 248,000 records (January 2019 to June 2024). A synthetic augmentation dataset generated using CTGAN addressed under-representation in specific subgroups, contributing 52,000 records. A validated benchmark dataset from the Technical University of Munich contributed 26,000 records. The data spanned 18 EU Member States and 12 languages.
The bias assessment used Fairlearn's MetricFrame to evaluate 14 subgroups across seven dimensions (gender, age band, ethnicity, disability status, nationality grouping, language, highest qualification origin). Post-mitigation selection rate ratios ranged from 0.89 to 0.96. Intersectional analysis produced 148 cells, with 23 falling below the 50-candidate minimum cell size. The female-over-50-non-EU-qualification intersection at 0.87 was identified for targeted improvement. FairML Consulting GmbH conducted an independent audit confirming the findings.
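The selection rate ratio metric underlying these figures is simple to state. The case study uses Fairlearn's MetricFrame; the from-scratch sketch below just makes the arithmetic explicit and is not the production code.

```python
from collections import defaultdict

def selection_rate_ratios(selected: list[bool], groups: list[str]) -> dict[str, float]:
    """Selection rate ratio per subgroup: each group's selection rate
    divided by the highest group's rate, so the best-treated group
    scores 1.0 and lower values indicate relative disadvantage."""
    counts = defaultdict(lambda: [0, 0])  # group -> [selected, total]
    for sel, grp in zip(selected, groups):
        counts[grp][0] += int(sel)
        counts[grp][1] += 1
    rates = {g: s / n for g, (s, n) in counts.items()}
    best = max(rates.values())
    return {g: r / best for g, r in rates.items()}
```

A post-mitigation range of 0.89 to 0.96 therefore means every subgroup's selection rate was within 11% of the best-treated subgroup's rate.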
Article 10(5) special category processing used pseudonymisation at collection, purpose limitation through technical controls (logically separated database with access restricted to three named individuals), and automatic deletion within 72 hours of aggregate metric computation. Data version control used DVC with S3 storage, with every training run's DVC reference recorded in MLflow to establish complete provenance from model version to raw data.
The pipeline enforces compliance as a technical gate, not a documentation exercise, by blocking deployment unless every quality, fairness, robustness, and modification threshold is met. No model reaches production without passing all 11 stages.
Stages 1 through 4 cover code quality: checkout with SBOM generation via CycloneDX, static analysis (Ruff, Bandit, pip-audit), 847 unit tests at 85% coverage, and integration tests using a 500-record fixture spanning all 14 subgroups. Stage 5 validates training data against 23 Great Expectations rules covering types, ranges, null rates, and distribution bounds.
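Stage 5's rule types can be sketched without the Great Expectations library. The three toy rules below mirror the stated categories (types, ranges, null rates); the field names and limits are illustrative assumptions, not the production rule set of 23.

```python
def validate_batch(rows: list[dict]) -> list[str]:
    """Minimal expectation-style checks: return a list of rule
    failures for a batch of training records (empty list = pass)."""
    failures = []
    # Type rule: the experience field must be numeric in every record.
    if any(not isinstance(r.get("years_experience"), (int, float)) for r in rows):
        failures.append("years_experience: wrong or missing type")
    # Range rule: numeric values must fall inside a plausible bound.
    if any(isinstance(r.get("years_experience"), (int, float))
           and not 0 <= r["years_experience"] <= 60 for r in rows):
        failures.append("years_experience: value out of [0, 60]")
    # Null-rate rule: tolerate sparse nulls, fail on systematic ones.
    null_rate = sum(r.get("highest_qualification") is None for r in rows) / len(rows)
    if null_rate > 0.05:  # illustrative 5% null ceiling
        failures.append(f"highest_qualification: null rate {null_rate:.1%} > 5%")
    return failures
```

In the real pipeline a non-empty failure list would fail the stage and block everything downstream, which is the "compliance as a technical gate" point the text makes.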
Stage 6 trains the model. Stage 7 enforces the performance gate: AUC-ROC must reach 0.82, precision 0.75, and recall 0.70. Stage 8 enforces the fairness gate: every subgroup selection rate ratio must reach 0.80 (hard floor), with 0.90 generating a warning. Stage 9 enforces the robustness gate: maximum adversarial score inflation must not exceed 10 points, and out-of-distribution detection must flag 90% of synthetic OOD inputs.
Stage 10 detects substantial modifications through two comparisons. Version-to-version comparison checks against the current production model (thresholds include AUC-ROC shift exceeding 0.03 and any subgroup SRR below 0.80). Version-to-baseline comparison checks against the conformity assessment baseline (thresholds include cumulative AUC-ROC drift exceeding 0.05). Stage 11 generates documentation automatically, updating AISDP Modules 3, 4, 5, and 9 with current values.
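The two Stage 10 comparisons reduce to a pair of checks against different reference points. The sketch below shows only the AUC-ROC and SRR checks named in the text; the full stage would cover more thresholds.

```python
def modification_flags(candidate_auc: float, production_auc: float,
                       baseline_auc: float, subgroup_srr: dict[str, float]) -> list[str]:
    """Version-to-version comparison against the current production model
    and version-to-baseline comparison against the conformity assessment
    baseline, using the thresholds stated in the text."""
    flags = []
    if abs(candidate_auc - production_auc) > 0.03:
        flags.append("v2v: AUC-ROC shift exceeds 0.03")
    if any(srr < 0.80 for srr in subgroup_srr.values()):
        flags.append("v2v: a subgroup SRR fell below 0.80")
    if abs(candidate_auc - baseline_auc) > 0.05:
        flags.append("v2b: cumulative AUC-ROC drift exceeds 0.05")
    return flags
```

This structure shows why the baseline comparison matters: a series of small version-to-version shifts can each pass the 0.03 check while cumulative drift quietly approaches 0.05.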
Human override of a failed gate requires the AI Governance Lead's written approval with a logged justification. The security posture includes ISO 27001 certification, assessment of all 13 AI threat categories, and annual penetration testing by NCC Group. The CRA scope determination treated TalentLens Pro as within scope, given the Commission's evolving interpretation of the SaaS boundary.
Conformity assessment followed the Annex VI internal control procedure, supplemented by a voluntary TUV SUD review, with a five-step review workflow that identified and remediated three non-conformities before the Declaration was signed. Four harmonised standards were applied: ISO/IEC 42001, 23894, 25012, and 27001.
The four review steps took 12 working days, with final approval as the fifth step. The Technical Review (4 days) found one minor non-conformity: an architecture diagram showed Redis connected to the wrong microservice. The Legal Review (3 days) found one major non-conformity (the candidate notification template was missing in 3 of 12 languages) and one minor (a cross-reference error). The Data Protection Review (2 days) found no issues. The Holistic Review (3 days) confirmed all 26 Annex IV completeness items.
The major non-conformity was remediated within 21 days through a certified translation agency with AI domain expertise. The Declaration of Conformity was prepared per Annex V in machine-readable (JSON-LD) and signed PDF formats, translated into all 24 official EU languages, and signed on 1 April 2025. EU database registration was confirmed as EU-AI-HR-2025-00784 before the system was available to new deployers.
The 30-minute inspection readiness drill was conducted with two internal mock inspectors unfamiliar with the AISDP. All four document requests were fulfilled within the benchmark: design specifications in 8 minutes, fairness testing evidence in 4 minutes, the risk register in 2 minutes, and a live decision reconstruction in 18 minutes. CE marking was affixed in three locations: the user interface, API response headers, and the Instructions for Use.
Nine monitoring activities with quantified thresholds and escalation procedures form the backbone of continuous compliance, supported by a six-level oversight pyramid and a feedback loop with tiered decision authority. Monitoring is not a reporting exercise but an active compliance control.
| Activity | Metric | Threshold | Frequency |
|---|---|---|---|
| Performance | AUC-ROC on labelled production data | Below 0.80 | Monthly |
| Fairness | Selection rate ratio per subgroup | Below 0.85 (warning); below 0.80 (critical) | Monthly |
| Data drift | Population Stability Index | Above 0.20 | Weekly |
| Human oversight | Override rate per deployer | Below 2% or above 40% | Monthly |
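The data drift metric in the table has a standard closed form. A minimal sketch over pre-binned proportions (the epsilon guard is a common convention, not something the source specifies):

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index over per-bin proportions:
    PSI = sum_i (a_i - e_i) * ln(a_i / e_i).
    A weekly reading above 0.20 breaches the drift threshold."""
    eps = 1e-6  # guard against empty bins before taking the log
    return sum(
        (max(a, eps) - max(e, eps)) * math.log(max(a, eps) / max(e, eps))
        for e, a in zip(expected, actual)
    )
```

Each term is non-negative (the difference and the log always share a sign), so PSI only grows as the production distribution moves away from the training one.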
Total initial preparation cost was approximately EUR 325,000 over 22 weeks, with ongoing annual costs of approximately EUR 95,000, about 18% of annual development cost. The largest internal effort allocation was the AI System Assessor at 0.6 FTE for the full duration.
Direct costs included internal effort (EUR 210,000), the FairML bias audit (EUR 25,000), the TUV SUD voluntary review (EUR 35,000), NCC Group penetration testing (EUR 18,000), and translation services (EUR 22,000). Shared infrastructure costs across Meridian's three high-risk systems were approximately EUR 40,000 per year. The 18% ongoing cost ratio falls within the 15% to 25% range typical for high-risk AI compliance programmes.
The decommissioning plan was documented during Phase 3, covering a six-month planned retirement with technical shutdown, data lifecycle closure, and downstream decision monitoring. The timeline runs from T minus 6 months (deployer announcement) through full shutdown at T-0, with credential revocation via HashiCorp Vault's parent lease mechanism and infrastructure teardown via Terraform. Special category data is deleted within 72 hours, while aggregated monitoring data is retained for ten years. A 12-month post-decommission plan tracks aggregate hiring outcomes for the final cohort, disaggregated by subgroup, acknowledging that historical AI scores continue to affect individuals.
Five key lessons emerged. First, retrospective documentation is less credible than contemporaneous evidence. Meridian chose to start compliant documentation from v4.0.0 rather than reconstructing v3.x history, clearly marking the boundary. Second, human oversight requires operational design, not a checkbox. The initial score-and-override design was replaced with delayed reveal, calibration injection, and review time monitoring after the R-002 risk assessment.
The Article 6(3) exception is unlikely to apply to substantive screening systems. TalentLens Pro could not claim it because the system evaluates and ranks candidates using 47 features, producing a substantive suitability assessment. Systems that merely sort applications by date or check for keyword presence may qualify, but any system that generates rankings or scores for human review is performing more than a narrow procedural task.
Meridian chose not to reconstruct retroactive documentation for the pre-Act version. Instead, they treated the first substantial modification (v4.0.0 model retraining) as the starting point for compliant documentation, clearly marking what was reconstructed versus contemporaneous. This transparency was more credible than retroactive documentation claiming to be original.
TalentLens Pro detected a deployer using the system for internal transfers through Level 3 (product management) oversight. The deployer was notified, the misuse was documented, and the Instructions for Use were clarified. Technical monitoring alone cannot detect intent drift; business-level oversight is essential.
Implement both version-to-version comparison (against the current production model) and version-to-baseline comparison (against the conformity assessment baseline). TalentLens Pro's cumulative tracking detected AUC-ROC drift of 0.041 approaching the 0.05 threshold, which would have been invisible with only version-to-version checks.
The decommissioning plan comprises a phased six-month timeline from deployer announcement to full shutdown, credential revocation and infrastructure teardown, data lifecycle closure with special category deletion within 72 hours, downstream decision monitoring for 12 months tracking how historical outputs continue to affect individuals, and documentation archival with retrieval procedures.
XGBoost provides exact Shapley values via TreeExplainer within latency budgets, deterministic inference for logging and accuracy requirements, and competitive AUC-ROC without the explainability trade-offs of neural networks or the compliance complications of LLM-based approaches.
The anti-automation-bias measures comprise delayed score reveal (30 seconds), calibration case injection (1:20 ratio), review time monitoring, mandatory override justification, and aggregate override rate tracking with thresholds for investigation.
The conformity route was Annex VI internal control with a five-step review workflow (technical, legal, data protection, holistic, final approval), non-conformity tracking and remediation, an optional voluntary notified body review, and a 30-minute inspection readiness drill.
Costs came to approximately EUR 325,000 initially over 22 weeks, with EUR 95,000 in annual ongoing costs (about 18% of development cost), covering internal effort, external audits, the voluntary review, penetration testing, and translations.
| ID | Risk | Residual | Key Mitigation |
|---|---|---|---|
| R-001 | Discriminatory scoring against protected subgroups | Medium | 14-subgroup testing; SRR threshold 0.80; external audit |
| R-002 | Automation bias: recruiters accept scores without review | Medium | Delayed score reveal; calibration cases (1:20); review time monitoring |
| R-003 | Model drift degrading accuracy or fairness | Low | PSI monitoring; quarterly revalidation; cumulative baseline tracking |
| R-004 | Adversarial CV crafting inflating scores | Medium | Input validation; 94% detection rate; annual red-teaming |
| R-005 | Adverse impact on non-standard CVs | Medium | Feature audit for proxy variables; synthetic augmentation |
The Fundamental Rights Impact Assessment covered five Charter rights. Non-discrimination (Article 21) was the primary concern, addressed through Fairlearn MetricFrame analysis across 14 subgroups. Freedom to choose an occupation (Article 15) was mitigated by the prohibition on autonomous rejection, technically enforced by withholding scores until the review period completes. The reputational risk assessment rated R-001 (discriminatory scoring) as the highest exposure, with customer and market dimensions both scoring five out of five.
| Activity | Metric | Threshold | Frequency |
|---|---|---|---|
| Adversarial detection | Detection rate on test cases | Below 90% | Quarterly |
| Serious incidents | Any Article 73 event | Any occurrence | Continuous |
One complete feedback loop cycle occurred in Q2 2025. Weekly PSI monitoring detected a 0.18 reading on "years of experience" (approaching the 0.20 threshold). Investigation identified the cause: a technology sector deployer had begun processing graduate applications. The Technical SME authorised a threshold adjustment for that deployer's partition, documented in the PMM review minutes. This did not trigger a substantial modification assessment because the model itself was unchanged.
Level 3 oversight detected an intent drift signal: a deployer using TalentLens Pro for internal transfers, outside the documented intended purpose. The deployer was notified, the misuse documented, and the Instructions for Use clarified. A tabletop exercise in September 2025 rehearsed the serious incident reporting procedure, simulating a scenario where the system systematically underscored candidates with disabilities. The exercise verified the dual-reporting decision tree, evidence preservation procedures, and notification chain.
Third, cumulative change tracking prevents undetected drift. Version-to-baseline comparison detected that cumulative AUC-ROC drift had reached 0.041, approaching the 0.05 threshold, which would have been invisible without cumulative tracking. Fourth, scope creep requires business-level oversight. Technical monitoring alone cannot detect intent drift such as deployers using the system for internal transfers. Fifth, non-retaliation culture enables early detection. A recruiter's concern about university-based scoring led to a supplementary analysis confirming no adverse impact, reinforcing the value of escalation.
Meridian's compliance maturity was assessed at Level 4 (Operational) across all domains except end-of-life (Level 3, planned but not yet executed), progressing toward Level 5 (Embedded) through deeper integration of compliance evidence generation into the engineering workflow.