EU AI Act Annex IV: Technical Documentation Requirements for High-Risk AI Systems
Article 11 of the EU AI Act requires providers of high-risk AI systems to maintain Annex IV technical documentation before placing the system on the EU market. Annex IV specifies eight categories of required content. This guide explains each section and the evidence that auditors, notified bodies, and market surveillance authorities actually check. Annex IV documentation must remain current through all system updates: a substantial modification requires a documentation update and potentially a new conformity assessment.
Annex IV § 1 and § 2: System Description and Development
Section 1 requires a general description: the intended purpose and deployment context (be specific: "score loan applications for credit risk", not "assist with lending"), the software version with version history, hardware specifications, and all external systems and APIs the AI system interacts with. The most common § 1 gap is third-party model documentation: when a high-risk AI system uses an LLM API, the provider must document the foundation model as a component, including its version, provider, and capabilities.

Section 2 requires a description of the AI system architecture (components, reasoning approach, decision logic), the training methodology, dataset specifications with demographic distribution, the labeling methodology, and documentation of all pre-trained or third-party models used. For GPAI models, document whether the model provider has fulfilled its Article 53 obligations and what technical documentation it supplied.
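Annex IV prescribes content, not format, but keeping § 1 and § 2 material in a machine-readable structure makes version tracking and audits easier. A minimal sketch; every field name here is illustrative, not mandated by the Act:

```python
from dataclasses import dataclass, field

@dataclass
class ThirdPartyModel:
    """One pre-trained or foundation-model component of the system (field names illustrative)."""
    provider: str
    name: str
    version: str
    capabilities: str
    article_53_docs_received: bool  # did the GPAI provider supply its Article 53 documentation?

@dataclass
class SystemDescription:
    """Skeleton for an Annex IV section 1 / section 2 record (field names illustrative)."""
    intended_purpose: str  # be specific, e.g. "score loan applications for credit risk"
    software_version: str
    version_history: list = field(default_factory=list)
    hardware_specs: str = ""
    external_systems: list = field(default_factory=list)    # APIs and integrations
    third_party_models: list = field(default_factory=list)  # ThirdPartyModel entries
```

A structure like this can be rendered to the human-readable documentation file while staying diffable across system versions.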
Annex IV § 3: Monitoring, Human Oversight, and Logging
Section 3 documents how the AI system is monitored after deployment and how Article 14 human oversight is operationalized. Required: a description of oversight interfaces and tools, documentation that designated persons have authority to stop or override the AI system, escalation procedures, training requirements for oversight staff, and instructions to deployers. The logging sub-section requires: a description of what events are logged, the log format and what each field contains, the retention period (a minimum of six months under Article 19, though sector-specific requirements are often longer), tamper-evidence controls ensuring logs cannot be modified retroactively, and the process for making logs available to market surveillance authorities. The most common § 3 gap: application-level error logs only, with no per-decision records linking inputs, reasoning, and outcomes for individual affected persons.
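One way to satisfy both the per-decision-record and tamper-evidence requirements at once is a hash-chained decision log: each entry embeds the hash of the previous entry, so any retroactive modification breaks verification. A minimal sketch; the record fields shown are illustrative, not an Annex IV schema:

```python
import hashlib
import json

def append_decision(log, record):
    """Append one per-decision record, chained to the previous entry's hash.

    Each entry links inputs, reasoning, and outcome for one affected person.
    Embedding the previous entry's hash makes retroactive edits detectable.
    """
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {"record": record, "prev_hash": prev_hash}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log):
    """Recompute every hash in order; return False if any entry was altered."""
    prev_hash = "0" * 64
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        payload = json.dumps(
            {"record": entry["record"], "prev_hash": entry["prev_hash"]},
            sort_keys=True,
        ).encode()
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

In production this chain would be anchored externally (e.g. periodic hash publication to write-once storage), since an attacker who can rewrite the whole log can rebuild the chain.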
Annex IV § 4 and § 5: Performance Metrics and Testing
Section 4 requires justification for the chosen performance metrics given the system's specific task and risk profile — not just "95% accuracy" but why 95% is acceptable given who the errors affect and what happens to them. Demographic disaggregation is required: precision, recall, and error rates broken down by sex, race/ethnicity, and other protected attributes relevant to the use case. Section 5 requires test dataset specification, validation methodology, bias testing results (disparate impact analysis with selection rates by protected category), adversarial robustness testing, and complete test logs with individual results tied to specific system versions. The most common § 5 gap: no demographic disaggregation — overall accuracy metrics without any analysis of how performance varies across protected groups.
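The disaggregation and disparate-impact analyses above are mechanical once per-decision records carry the protected attribute. A minimal sketch; the record keys, the choice of reference group, and the four-fifths threshold are illustrative conventions, not requirements set by the Act:

```python
from collections import defaultdict

def disaggregated_metrics(records):
    """Compute selection rate, precision, and recall per protected group.

    `records` is a list of dicts with keys `group` (protected attribute
    value), `predicted` (1 = selected/approved), `actual` (1 = true label).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "n": 0, "selected": 0})
    for r in records:
        c = counts[r["group"]]
        c["n"] += 1
        c["selected"] += r["predicted"]
        if r["predicted"] and r["actual"]:
            c["tp"] += 1
        elif r["predicted"] and not r["actual"]:
            c["fp"] += 1
        elif not r["predicted"] and r["actual"]:
            c["fn"] += 1
    return {
        group: {
            "selection_rate": c["selected"] / c["n"],
            "precision": c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else None,
            "recall": c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else None,
        }
        for group, c in counts.items()
    }

def disparate_impact_ratios(metrics, reference_group):
    """Ratio of each group's selection rate to the reference group's.

    A ratio below 0.8 (the 'four-fifths rule') is a common adverse-impact
    flag, though the AI Act does not mandate this specific threshold.
    """
    ref = metrics[reference_group]["selection_rate"]
    return {g: m["selection_rate"] / ref for g, m in metrics.items()}
```

The resulting per-group table, tied to a specific system version, is exactly the artifact the § 4 and § 5 evidence review looks for.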
Annex IV § 6–8: Standards, Conformity, and Post-Market Monitoring
Section 6 lists harmonized EU standards applied (once published) or other frameworks such as ISO 42001 or NIST AI RMF. Section 7 is the EU Declaration of Conformity under Article 47 — a signed provider declaration affirming compliance, identifying the Annex III category, referencing the technical documentation, and identifying the notified body if third-party assessment was required. Section 8 is the post-market monitoring plan under Article 72: KPIs tracked after deployment, thresholds that trigger investigation, incident reporting triggers for Article 73 serious incident notifications, and the process for updating Annex IV when behavioral monitoring identifies material changes. The most common § 8 gap: no defined behavioral baselines — without documented expected behavior at deployment, drift cannot be detected and the post-market monitoring requirement cannot be satisfied.
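A documented baseline only satisfies § 8 if something actually compares live KPIs against it. A minimal sketch of such a check, where the KPI names, baseline values, and tolerances are all illustrative and would come from the provider's own monitoring plan:

```python
def check_drift(baseline, observed, tolerances):
    """Compare observed post-market KPIs to the documented deployment baseline.

    Returns the KPIs whose absolute deviation exceeds the documented
    tolerance; each breach is a trigger for investigation and, if the
    change is material, an Annex IV documentation update.
    """
    breaches = []
    for kpi, base_value in baseline.items():
        deviation = abs(observed[kpi] - base_value)
        if deviation > tolerances[kpi]:
            breaches.append({
                "kpi": kpi,
                "baseline": base_value,
                "observed": observed[kpi],
                "deviation": deviation,
            })
    return breaches
```

Running a check like this on a schedule, and recording its outputs, produces the evidence trail that the behavioral-monitoring portion of the § 8 review expects.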