Semantic Drift in AI Agents: The Silent Failure Mode That Breaks Production
Semantic drift occurs when an AI agent starts making systematically different business decisions without any change to the model version, the code, or evaluation benchmark scores. Standard monitoring tools show green while the agent's reasoning logic quietly shifts. The only reliable detection mechanism is replaying past decisions against the current agent state and comparing the reasoning chains.
What Is Semantic Drift?
Semantic drift happens when an agent's reasoning process shifts while every observable artifact stays constant: same model version, same code, same eval scores, same benchmark accuracy, but different decisions. Unlike statistical model drift (measurable via PSI scores) or code drift (tracked in version control), semantic drift produces no observable signal in standard monitoring tools.
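To make the failure mode concrete, here is a minimal, purely hypothetical sketch in Python: two snapshots of the same agent score identically on a benchmark, yet the reasoning chain for one case has quietly changed and the decision flips. The case ID, reasoning steps, and scores are invented for illustration.

```python
# Hypothetical illustration: two snapshots of the same agent evaluated on the
# same benchmark. Aggregate accuracy is identical, but the reasoning chain for
# one case has quietly changed, so an individual decision flips.

BASELINE_RUN = {
    "accuracy": 0.94,
    "case_1042": {
        "reasoning": ["customer is enterprise tier", "contract value > $50k", "route to human review"],
        "decision": "escalate",
    },
}

CURRENT_RUN = {
    "accuracy": 0.94,  # same benchmark score, so standard evals stay green
    "case_1042": {
        "reasoning": ["customer is enterprise tier", "recent tickets resolved automatically", "auto-approve"],
        "decision": "auto_approve",
    },
}

def first_divergence(old: list[str], new: list[str]) -> int | None:
    """Return the index of the first reasoning step that differs, or None if the chains match."""
    for i, (step_old, step_new) in enumerate(zip(old, new)):
        if step_old != step_new:
            return i
    return None if len(old) == len(new) else min(len(old), len(new))

idx = first_divergence(
    BASELINE_RUN["case_1042"]["reasoning"],
    CURRENT_RUN["case_1042"]["reasoning"],
)
print(f"accuracy unchanged: {BASELINE_RUN['accuracy'] == CURRENT_RUN['accuracy']}")
print(f"reasoning diverges at step {idx}; decision is now {CURRENT_RUN['case_1042']['decision']}")
```

Nothing in the aggregate numbers distinguishes the two runs; only a step-by-step comparison of the stored reasoning chains exposes the shift.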
Why Standard Monitoring Misses Semantic Drift
LangSmith captures traces but cannot compare reasoning logic across time. LangFuse runs evals on criteria you define in advance, but drift is precisely the information you are trying to discover. Datadog sees infrastructure, not decision logic. Arize detects aggregate distribution changes, not individual reasoning chain divergence. If semantic drift produces identical accuracy, trace shape, and infrastructure metrics, none of these tools will raise an alert.
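For contrast, the aggregate check a distribution-drift monitor typically runs is something like a population stability index over the decision mix. If the overall share of outcomes has not moved, the score stays at zero even when individual cases have swapped outcomes underneath. A rough sketch, where the bucket labels, proportions, and the commonly cited 0.1 alert threshold are illustrative rather than tied to any specific vendor:

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population stability index over matching buckets (each list of proportions sums to 1)."""
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0
    )

# Share of agent decisions by outcome, before and after the drift.
# The mix is identical even though *which* cases get which outcome has shifted,
# so a distribution-level monitor sees nothing.
baseline_mix = [0.62, 0.28, 0.10]   # auto_approve, escalate, reject
current_mix  = [0.62, 0.28, 0.10]

score = psi(baseline_mix, current_mix)
print(f"PSI = {score:.4f}")  # 0.0000, well under a typical 0.1 "investigate" threshold
```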
How to Detect Semantic Drift: Verification Replay
Tenet's Verification Replay re-executes any past production decision against the current agent state using the original context snapshot stored in the Reasoning Ledger. The Semantic Diff output identifies exactly which reasoning step diverged — from which premise, with which context weight change — and how many production decisions were affected.
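Tenet's actual API is not reproduced here; the sketch below is a generic illustration of the replay-and-diff pattern the paragraph describes, under the assumption that each production decision was persisted with its frozen context snapshot and ordered reasoning steps. `DecisionRecord`, `run_agent`, and `replay_and_diff` are hypothetical names, not Tenet functions.

```python
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    """What a reasoning ledger would need to persist for each production decision."""
    decision_id: str
    context_snapshot: dict      # frozen inputs, retrieved docs, tool outputs at decision time
    reasoning_chain: list[str]  # ordered reasoning steps as recorded at decision time
    decision: str

def run_agent(context: dict) -> tuple[list[str], str]:
    """Stand-in for invoking the *current* agent on a frozen context snapshot."""
    raise NotImplementedError  # wire this to your agent runtime

def replay_and_diff(record: DecisionRecord) -> dict:
    """Re-execute a past decision against the current agent and diff the reasoning chains."""
    new_chain, new_decision = run_agent(record.context_snapshot)
    divergence = next(
        (i for i, (old, new) in enumerate(zip(record.reasoning_chain, new_chain)) if old != new),
        None,
    )
    return {
        "decision_id": record.decision_id,
        "decision_changed": new_decision != record.decision,
        "diverged_at_step": divergence,
        "original_step": record.reasoning_chain[divergence] if divergence is not None else None,
        "current_step": new_chain[divergence] if divergence is not None else None,
    }
```

Running `replay_and_diff` over a window of recent decision records and counting those where `decision_changed` is true yields the blast-radius figure the paragraph mentions: how many production decisions were affected by the divergent step.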