Semantic Drift Detection for AI Agents — Tenet AI
Semantic drift occurs when an AI agent starts making systematically different business decisions without any change to model version, code, or evaluation benchmark scores. Standard monitoring shows green across the board. Tenet's Verification Replay detects it by re-executing past production decisions against the current agent state and comparing reasoning chains at the token level.
What Is Semantic Drift?
Semantic drift happens at the reasoning layer — the agent processes the same inputs differently over time. Unlike statistical model drift (detectable via PSI scores) or code drift (tracked in version control), semantic drift produces no observable signal in LangSmith, LangFuse, Arize, or Datadog. Evals show stable accuracy. Infrastructure metrics are normal. But the agent's decisions have changed.
How Tenet Detects Semantic Drift
Tenet's Verification Replay re-executes any past production decision from the Reasoning Ledger against the current agent state, using the exact context snapshot captured at the original decision time. The Semantic Diff pinpoints where the reasoning chain diverged: which premise changed, which context weight shifted, and which intermediate conclusion was the first to differ. The same replay-and-diff mechanism surfaces bugs, behavioral anomalies, and drift alike.
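The divergence-finding step can be illustrated with a minimal token-level diff between the recorded reasoning chain and the replayed one. This is a conceptual sketch, not Tenet's implementation: `first_divergence` and whitespace tokenization are illustrative assumptions.

```python
def first_divergence(original_chain: str, replayed_chain: str):
    """Return (index, original_token, replayed_token) at the first point
    where two reasoning chains differ, or None if they match."""
    orig = original_chain.split()
    repl = replayed_chain.split()
    for i, (a, b) in enumerate(zip(orig, repl)):
        if a != b:
            return i, a, b
    # One chain may simply be a prefix of the other.
    if len(orig) != len(repl):
        i = min(len(orig), len(repl))
        return i, (orig[i] if i < len(orig) else None), (repl[i] if i < len(repl) else None)
    return None

# Hypothetical fraud-agent reasoning, recorded vs replayed today:
recorded = "velocity high and amount above limit so flag"
replayed = "velocity high and amount below limit so approve"
print(first_divergence(recorded, replayed))  # (4, 'above', 'below')
```

In a real system the diff would operate on the model's token stream rather than whitespace-split words, but the principle is the same: surface the earliest point of divergence, not just the final decision.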
Real-World Causes of Semantic Drift in Production
Context window changes: when the context sent to an AI agent changes, due to data pipeline updates, feature engineering changes, or upstream model updates, the agent may produce different reasoning even on identical scenarios.
Prompt template drift: small changes to system prompts can significantly shift agent behavior without triggering any monitoring alert.
Model fine-tuning: even targeted fine-tuning can alter behavior in adjacent decision domains that were not part of the training objective.
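Prompt template drift in particular is easy to detect mechanically once each decision record carries a fingerprint of the prompt that produced it. A minimal sketch using a content hash (the `prompt_fingerprint` helper and the example prompts are illustrative, not part of Tenet's SDK):

```python
import hashlib

def prompt_fingerprint(template: str) -> str:
    """Stable short fingerprint of a prompt template, ignoring
    surrounding whitespace so reformatting alone does not alert."""
    return hashlib.sha256(template.strip().encode("utf-8")).hexdigest()[:12]

# Fingerprint stored in the decision record at the original decision time:
recorded = prompt_fingerprint("You are a fraud analyst. Flag risky transactions.")
# Fingerprint of the prompt currently deployed (one word changed):
current = prompt_fingerprint("You are a fraud analyst. Flag suspicious transactions.")

print(recorded != current)  # True: a one-word edit, invisible to metrics
```

Comparing fingerprints catches that the prompt changed; replaying the decision is what tells you whether the change actually altered behavior.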
Semantic Drift vs Model Drift: Why Standard Monitoring Misses It
Model drift detectors measure aggregate statistical distributions — PSI scores, RMSE delta, accuracy benchmarks. These aggregate metrics can remain stable while individual decision reasoning has fundamentally changed. A fraud agent that now misses one specific pattern of fraud while maintaining overall accuracy shows stable model drift metrics — but is making different decisions on a specific class of input. Only decision-level replay can detect this class of behavioral change.
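The gap between aggregate and decision-level signals can be shown numerically. Below, a hypothetical fraud agent's decisions before and after drift have identical label distributions, so the Population Stability Index (PSI) is zero, yet half the individual decisions flipped. The toy data is invented for illustration; the PSI formula itself is standard.

```python
import math

def psi(expected, actual, eps=1e-9):
    """Population Stability Index between two probability distributions."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

def label_dist(decisions, labels):
    n = len(decisions)
    return [decisions.count(label) / n for label in labels]

# Same 8 cases, before and after drift. Aggregate rates are unchanged
# (4 approve / 4 flag both times), but 4 of 8 decisions flipped.
before = ["approve", "flag", "approve", "flag", "approve", "flag", "approve", "flag"]
after  = ["flag", "approve", "approve", "flag", "approve", "flag", "flag", "approve"]

labels = ["approve", "flag"]
print(round(psi(label_dist(before, labels), label_dist(after, labels)), 6))  # 0.0
disagreement = sum(b != a for b, a in zip(before, after)) / len(before)
print(disagreement)  # 0.5
```

A PSI of 0.0 would pass any standard drift threshold, while a 50% decision-level disagreement rate is exactly what per-decision replay is designed to surface.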
How to Set Up Drift Detection with Tenet
Tenet captures a Reasoning Ledger record for every agent decision with the Ghost SDK (2-line integration). Drift detection runs on a configurable schedule: replay last N decisions against the current agent state, compare reasoning chains, flag divergences above a configurable threshold. Alert routing integrates with Slack, PagerDuty, or any webhook. First drift report available within hours of SDK integration.
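The scheduled check described above (replay last N decisions, compare, flag above a threshold) can be sketched generically. This is not the Ghost SDK's API; the ledger shape, `run_drift_check`, and the toy agent are all assumptions made for illustration.

```python
def run_drift_check(ledger, current_agent, n=100, threshold=0.05):
    """Replay the last n ledger records against the current agent and
    report whether the divergence rate exceeds the alert threshold."""
    recent = ledger[-n:]
    diverged = [r for r in recent
                if current_agent(r["context"]) != r["decision"]]
    rate = len(diverged) / len(recent)
    return {"divergence_rate": rate, "alert": rate > threshold,
            "diverged": diverged}

# Toy ledger: ten past decisions, all originally "approve".
ledger = [{"context": i, "decision": "approve"} for i in range(10)]
# A drifted agent that now flags contexts above 7.
drifted_agent = lambda ctx: "flag" if ctx > 7 else "approve"

report = run_drift_check(ledger, drifted_agent, n=10, threshold=0.05)
print(report["divergence_rate"])  # 0.2
print(report["alert"])            # True
```

In production the `alert` result would be routed to Slack, PagerDuty, or a webhook, as described above; the core loop is just replay, compare, and threshold.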