How to Capture Human Overrides of AI Agent Decisions for Fine-Tuning
Human override records are the highest-signal training data available for AI agents. When a loan officer rejects an AI approval, a clinical reviewer modifies a prior auth recommendation, or a compliance analyst reverses an automated flag, that override encodes production context, correct behavior, and edge case handling that RLHF and synthetic datasets cannot replicate. This guide shows how to capture these overrides with tenet.record_override(), structure them as DPO preference pairs, and use override frequency patterns as a semantic drift detection signal — while satisfying EU AI Act Article 14 documentation requirements.
Why Human Overrides Are Your Best Training Data
Standard training pipelines use preference data from human labelers rating synthetic completions. Override records are different: they capture a subject-matter expert making a real production decision, with real consequences, on a case the AI got wrong. Override data has four properties synthetic data lacks: production context (the real input state that triggered the wrong decision), expert demonstration (the correct output from someone accountable for the outcome), failure mode diversity (overrides cluster around the AI's actual blind spots), and regulatory anchoring (in regulated industries, overrides often reflect policy constraints the AI misapplied).
EU AI Act Article 14: Capture Is Required, Not Optional
EU AI Act Article 14 requires high-risk AI systems to be designed so that natural persons can exercise effective oversight. In particular, Article 14(4)(d) requires that overseers be able to decide not to use the AI system's output, or to disregard, override, or reverse it, and the record-keeping provisions of Article 12 require automatic logging of events relevant to that oversight. This means: for in-scope systems, override capture is a compliance obligation — not a training optimization. A record must include the actor ID, timestamp, original AI decision, override decision, and reason for the change.
Override Record Schema
A complete override record contains nine fields: session_id (links the override to the originating AI decision), actor_id (pseudonymized identifier of the human reviewer), timestamp (ISO 8601 with timezone), original_decision (the AI output being overridden), override_decision (the human's replacement decision), reason_category (enum: POLICY_EXCEPTION, FACTUAL_ERROR, EDGE_CASE, REVIEWER_ERROR, NEW_INFORMATION), reason_text (free-text justification, optional), confidence (reviewer's stated certainty 0.0-1.0), and outcome (downstream result if tracked). This schema maps directly to a DPO preference pair: the AI decision is the rejected completion, the override decision is the chosen completion.
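The nine-field schema above can be sketched as a plain dataclass. This is an illustrative model of the record shape described in this guide, not a type exported by the SDK; the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class OverrideReason(Enum):
    POLICY_EXCEPTION = "POLICY_EXCEPTION"
    FACTUAL_ERROR = "FACTUAL_ERROR"
    EDGE_CASE = "EDGE_CASE"
    REVIEWER_ERROR = "REVIEWER_ERROR"
    NEW_INFORMATION = "NEW_INFORMATION"


@dataclass
class OverrideRecord:
    session_id: str                   # links to the originating AI decision
    actor_id: str                     # pseudonymized reviewer identifier
    timestamp: str                    # ISO 8601 with timezone
    original_decision: str            # AI output being overridden -> DPO "rejected"
    override_decision: str            # human replacement decision -> DPO "chosen"
    reason_category: OverrideReason
    confidence: float                 # reviewer's stated certainty, 0.0-1.0
    reason_text: Optional[str] = None # free-text justification
    outcome: Optional[str] = None     # downstream result, if tracked


record = OverrideRecord(
    session_id="sess-1842",
    actor_id="reviewer-7f3a",
    timestamp=datetime.now(timezone.utc).isoformat(),
    original_decision="APPROVE",
    override_decision="DENY",
    reason_category=OverrideReason.POLICY_EXCEPTION,
    confidence=0.9,
    reason_text="DTI exceeds policy cap after verified income correction",
)
```

Keeping the reason categories as a closed enum (with free text in `reason_text`) is what makes the later export filters and drift metrics computable.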
Implementation: record_override() and record_confirmation()
Install the SDK with pip install tenet-ai-sdk, then initialize TenetClient with your API key. For override capture, call tenet.record_override() with the session_id from the original AI decision record, actor details, original and override decisions, and a reason from the OverrideReason enum. For confirmation (the human approves the AI decision unchanged), call tenet.record_confirmation() — these records are equally valuable: they identify the cases where the AI was right on hard examples. Both calls use the same fire-and-forget Ghost SDK architecture, blocking the caller for under 0.3ms.
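The call pattern can be sketched with an in-memory stand-in that mimics the fire-and-forget behavior (enqueue and return immediately; a background worker drains the queue). The stand-in class, constructor, and keyword arguments are assumptions based on the field list above — consult the SDK's own documentation for the real TenetClient signature.

```python
import queue
import threading


class StandInTenetClient:
    """Illustrative stand-in for TenetClient (NOT the real SDK).

    record_override()/record_confirmation() enqueue an event and return
    immediately; a daemon thread drains the queue, so the caller is never
    blocked on network I/O.
    """

    def __init__(self, api_key: str):
        self.api_key = api_key
        self._q: queue.Queue = queue.Queue()
        self._sent = []  # a real client would POST these to the API
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        while True:
            event = self._q.get()
            self._sent.append(event)  # stand-in for the network call
            self._q.task_done()

    def record_override(self, **fields):
        self._q.put({"type": "override", **fields})  # returns immediately

    def record_confirmation(self, **fields):
        self._q.put({"type": "confirmation", **fields})


tenet = StandInTenetClient(api_key="YOUR_API_KEY")
tenet.record_override(
    session_id="sess-1842",
    actor_id="reviewer-7f3a",
    original_decision="APPROVE",
    override_decision="DENY",
    reason_category="POLICY_EXCEPTION",
    confidence=0.9,
)
tenet._q.join()  # flushed here only for demonstration; fire-and-forget in production
```

The queue-plus-daemon-thread pattern is a common way to achieve sub-millisecond blocking overhead: the caller pays only the cost of an enqueue.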
Exporting DPO Training Data
DPO (Direct Preference Optimization) fine-tuning requires preference pairs: chosen completion vs rejected completion, each with the original prompt context. Use Tenet's export API to retrieve override records filtered by confidence threshold (≥0.7 recommended), reason category (exclude REVIEWER_ERROR), and date range. Each record maps to a DPO pair: the context snapshot is the prompt, the override_decision is chosen, the original_decision is rejected. For RLHF, the same records can be used as reward signal: overrides are negative examples, confirmations are positive.
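The export-to-DPO mapping and filters described above can be sketched as follows. The record dicts and the `context_snapshot` key mirror the schema in this guide; the export wire format itself is an assumption, and the sample records are hypothetical.

```python
def to_dpo_pairs(records, min_confidence=0.7):
    """Map exported override records to DPO preference pairs.

    Applies the recommended filters: drop REVIEWER_ERROR records
    (reviewer mistakes are not preference signal) and records below
    the confidence threshold.
    """
    pairs = []
    for r in records:
        if r["reason_category"] == "REVIEWER_ERROR":
            continue
        if r["confidence"] < min_confidence:
            continue
        pairs.append({
            "prompt": r["context_snapshot"],     # real input state at decision time
            "chosen": r["override_decision"],    # the human's correction
            "rejected": r["original_decision"],  # the AI output
        })
    return pairs


records = [
    {"context_snapshot": "loan app #1842: DTI 44%, FICO 702",
     "original_decision": "APPROVE", "override_decision": "DENY",
     "reason_category": "POLICY_EXCEPTION", "confidence": 0.9},
    {"context_snapshot": "loan app #1907: DTI 31%, FICO 688",
     "original_decision": "DENY", "override_decision": "APPROVE",
     "reason_category": "REVIEWER_ERROR", "confidence": 0.8},
]

pairs = to_dpo_pairs(records)  # only the first record survives the filters
```

The resulting `prompt`/`chosen`/`rejected` dicts match the column layout most DPO training harnesses expect.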
Override Patterns as Drift Detection Signal
Override frequency by decision category is a leading indicator of semantic drift. If your loan agent's override rate on borderline debt-to-income (DTI) cases increases from 3% to 8% over 30 days, the agent's decision boundary has shifted — before any eval regression is visible. Track: override rate by decision type, override rate by reason category (a rising policy-exception rate means the agent is misapplying policy), confirmation rate on edge cases (a decrease means the agent is degrading on hard examples), and actor disagreement rate (multiple reviewers overriding the same decision indicates a consistent AI failure mode).
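A minimal sketch of the first metric, override rate by decision type with a drift flag. The event shape, category names, and thresholds (rate at least doubled and above an absolute floor) are illustrative choices, not part of any API.

```python
from collections import defaultdict


def override_rates(events):
    """Per-decision-type override rate from a stream of override/confirmation
    events, each shaped like {"decision_type": ..., "type": ...}."""
    totals = defaultdict(int)
    overrides = defaultdict(int)
    for e in events:
        totals[e["decision_type"]] += 1
        if e["type"] == "override":
            overrides[e["decision_type"]] += 1
    return {k: overrides[k] / totals[k] for k in totals}


def drifted(baseline, current, ratio=2.0, floor=0.05):
    """Flag decision types whose override rate at least doubled vs. the
    baseline window AND crossed an absolute floor (to ignore noise on
    low-volume categories)."""
    return [k for k, cur in current.items()
            if cur >= floor and cur >= ratio * baseline.get(k, 0.0)]


# The 3% -> 8% shift on borderline DTI cases from the text trips the flag;
# a small wobble on standard cases does not.
baseline = {"dti_borderline": 0.03, "standard": 0.01}
current = {"dti_borderline": 0.08, "standard": 0.012}
print(drifted(baseline, current))  # ['dti_borderline']
```

In practice you would compute `baseline` and `current` from rolling windows (e.g. trailing 30 days vs. the 30 days before) and alert on the flagged categories.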