EU AI Act Article 10: Data and Data Governance for High-Risk AI Systems
EU AI Act Article 10 requires providers of high-risk AI systems to satisfy specific data governance obligations before placing their systems on the EU market. The requirements include: training data representativeness relative to the deployment population; bias examination before deployment, documenting both pre-mitigation findings and residual bias; documentation of labeling methodology; and data governance practices covering collection, preprocessing, versioning, and access controls. Where a system incorporates a foundation model, the foundation model must be documented as a data component of the system. Behavioral baselines established at deployment satisfy the post-deployment monitoring arm of Article 10(4).
Article 10(2): Training Data Requirements
Article 10(2) requires training, validation, and test datasets to be relevant and representative relative to the intended purpose. Representativeness requires demographic completeness: the data distribution must reflect the actual population and situations the system will encounter in deployment. Article 10(2)(c) explicitly requires appropriate statistical properties, including proportionate representation of persons or groups. This is the legislative basis for requiring demographic disaggregation in data documentation: not just performance metrics, but the composition of the data itself. Providers must also document labeling methodology (labeler instructions, quality controls, inter-rater reliability scores) and collection methodology.
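A minimal sketch of how demographic disaggregation and labeling quality control can be evidenced in practice, in Python. The record layout, the group field, the reference population shares, and the 5% tolerance are illustrative assumptions, not values prescribed by Article 10(2):

```python
# Disaggregate a dataset's composition against a reference deployment
# population, and compute inter-rater reliability for labeling QC.
# Field names and the tolerance are illustrative assumptions.
from collections import Counter

def composition_report(records, group_field, reference_shares, tolerance=0.05):
    """Compare each group's share of the dataset to its expected share
    in the deployment population; flag deviations beyond `tolerance`."""
    counts = Counter(r[group_field] for r in records)
    total = sum(counts.values())
    report = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        report[group] = {
            "observed_share": round(observed, 4),
            "expected_share": expected,
            "flag": abs(observed - expected) > tolerance,
        }
    return report

def cohen_kappa(labels_a, labels_b):
    """Inter-rater agreement between two labelers (Cohen's kappa)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)
```

Reporting the flagged groups alongside the reference shares yields exactly the disaggregated composition evidence described above.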
Article 10(3): Bias Examination Before Market Placement
Article 10(3) requires providers to examine datasets for biases that are likely to affect health and safety or to lead to discrimination prohibited under Union law before placing the system on the EU market. The examination must cover the complete data pipeline: collection methodology, labeling procedures, preprocessing transformations, and final dataset composition. Bias examination documentation must record which bias detection methods were applied, what results they produced, what mitigation measures were taken, and the residual bias present after mitigation. Post-mitigation results alone are insufficient; the process itself must be evidenced. Standard quantitative methods include demographic parity analysis (selection rates by protected attribute), equalized odds testing (error rates by protected attribute), and counterfactual fairness testing where applicable.
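The two quantitative checks named above are straightforward to sketch. The record layout (binary y_true/y_pred labels plus a group attribute) and the 0.8 parity-ratio threshold, borrowed from the US four-fifths rule, are illustrative assumptions:

```python
# Demographic parity (selection rate by group) and equalized odds
# (false positive / false negative rates by group) for binary decisions.
from collections import defaultdict

def rates_by_group(records):
    """records: dicts with 'group', 'y_true', 'y_pred' (0/1 labels)."""
    stats = defaultdict(lambda: {"n": 0, "selected": 0,
                                 "fp": 0, "fn": 0, "pos": 0, "neg": 0})
    for r in records:
        s = stats[r["group"]]
        s["n"] += 1
        s["selected"] += r["y_pred"]
        if r["y_true"] == 1:
            s["pos"] += 1
            s["fn"] += (r["y_pred"] == 0)
        else:
            s["neg"] += 1
            s["fp"] += (r["y_pred"] == 1)
    return {
        g: {
            "selection_rate": s["selected"] / s["n"],         # demographic parity
            "fpr": s["fp"] / s["neg"] if s["neg"] else None,  # equalized odds
            "fnr": s["fn"] / s["pos"] if s["pos"] else None,
        }
        for g, s in stats.items()
    }

def parity_ratio(rates):
    """Min/max selection-rate ratio; below ~0.8 is a common red flag."""
    sel = [v["selection_rate"] for v in rates.values()]
    return min(sel) / max(sel)
```

Running these checks before and after mitigation, and archiving both outputs, produces the pre-mitigation and residual-bias evidence the examination record must contain.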
Article 10(4): Data Governance Practices
Article 10(4) requires written data governance practices covering the entire data pipeline: collection, preprocessing, versioning, access controls, and quality control. Dataset versioning is required: every dataset used for training, validation, or testing must be identifiable by version. Access audit trails must record who accessed datasets, when, and what operations were performed. Provenance documentation must trace each data source to its origin and the legal basis for its collection. For systems built on third-party datasets or foundation model components, Article 10(4) requires documentation of the third party's data governance practices, not merely a reference to their terms of service.
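One way to implement the record-keeping side, sketched in Python: content-hashed dataset versions carrying provenance and legal-basis fields, plus an append-only access log. File names and field choices are illustrative, not mandated by the Act:

```python
# Content-hashed dataset registry and append-only access audit trail.
# Registry/log file names and field choices are illustrative assumptions.
import datetime
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass
class DatasetVersion:
    name: str
    version: str
    sha256: str        # content hash pins the exact bytes used
    source: str        # origin of the data
    legal_basis: str   # e.g. consent, contract, legitimate interest
    collected: str     # collection date or range

def register_dataset(path, name, version, source, legal_basis, collected):
    """Record an identifiable, content-addressed dataset version."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    record = DatasetVersion(name, version, digest, source, legal_basis, collected)
    with open("dataset_registry.jsonl", "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return record

def log_access(user, dataset, version, operation):
    """Append who touched which dataset version, when, and how."""
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "version": version,
        "operation": operation,  # e.g. read / write / export
    }
    with open("access_audit.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")
```

The content hash ties every training run to the exact bytes it consumed, which is what makes "identifiable by version" auditable rather than a naming convention.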
Article 10 for Foundation Model-Based Systems
Most enterprise AI systems use foundation models (GPT-4, Claude, Gemini, Llama) as components. Article 10 obligations apply to the complete system, including its foundation model components. Because providers cannot fully document training data they did not collect, the framework requires four measures: documenting what the foundation model provider discloses (model cards, data governance statements); conducting black-box bias testing on model outputs even when the training data is inaccessible; fully documenting any fine-tuning data; and establishing behavioral baselines at deployment to detect behavioral drift after model provider updates. A foundation model version update that changes output behavior triggers the Article 10(3) re-examination obligation.
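A minimal sketch of such a behavioral baseline, assuming a fixed probe-prompt set and caller-supplied query_model and classify_output functions (both hypothetical placeholders for the actual model integration). The population stability index and the ~0.2 drift threshold are common monitoring conventions, not Article 10 requirements:

```python
# Behavioral baseline for a black-box foundation model component:
# record the category distribution of responses to a fixed probe set
# at deployment, then re-run after provider updates and compare.
from collections import Counter
from math import log

def output_distribution(prompts, query_model, classify_output):
    """Map each probe response to a coarse category (e.g. refusal /
    answer / off-topic) and return per-category shares."""
    counts = Counter(classify_output(query_model(p)) for p in prompts)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def population_stability_index(baseline, current, eps=1e-6):
    """PSI between baseline and current shares; values above ~0.2
    are commonly read as significant drift."""
    cats = set(baseline) | set(current)
    return sum(
        (current.get(c, eps) - baseline.get(c, eps))
        * log(current.get(c, eps) / baseline.get(c, eps))
        for c in cats
    )
```

A PSI crossing the drift threshold after a provider-side version change is the signal that the Article 10(3) re-examination described above should be repeated.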