SR 11-7 Model Risk Management for LLM Agents: What Fed and OCC Guidance Requires
SR 11-7 (Supervisory Guidance on Model Risk Management), issued by the Federal Reserve and OCC in 2011, defines "model" broadly enough to include LLM-based agents used in financial institution decision-making. Any quantitative method — including an LLM — that produces outputs used to inform decisions falls within scope when those outputs have consequential impact on credit, risk, or financial outcomes. This guide maps SR 11-7 development documentation, independent validation, governance and controls, and ongoing monitoring requirements to LLM agent deployments at banks, credit unions, and financial holding companies.
Does SR 11-7 Apply to LLM Agents?
SR 11-7 defines a model as "a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." The guidance explicitly notes that simpler statistical models are covered, but so are "systems in which the inputs are transformed by algorithms." An LLM agent that processes borrower information and produces credit recommendations — or processes market data and produces trading signals — fits this definition. Fed and OCC examiners have increasingly cited LLM-based decision tools in model risk discussions. Banks are expected to have a model inventory that includes LLM agents used in consequential decision-making. Treating LLM agents as software (not models) leaves them outside MRM governance and creates examination risk.
SR 11-7 Development Documentation for LLM Agents
SR 11-7 Section II requires comprehensive development documentation covering purpose and design rationale, data used in development, testing and validation results, and known limitations. For LLM agents, this means documenting the purpose and decision scope (what decisions the agent is authorized to make and the boundaries of that authority), foundation model selection rationale (why this provider and model version, what alternatives were evaluated, what testing determined fitness), prompt architecture and system prompt (treated as model design documentation — every material prompt change is a model change requiring re-documentation), training and fine-tuning data if applicable, and known failure modes and limitations. Documentation must exist at deployment and be updated on material changes — a model provider update that changes agent behavior is a material change requiring updated documentation.
Independent Validation Requirements for LLM Agents
SR 11-7 Section III requires independent validation performed by qualified personnel who are not responsible for model development. Validation must assess conceptual soundness (does the model do what it claims to do), data integrity and fitness (is the data appropriate for the intended use), and outcomes analysis (does the model perform as expected against measurable outcomes). For LLM agents, independent validation must test behavioral baselines against predefined performance thresholds using representative production-like data, evaluate sensitivity to input variations (prompt injection, edge cases, adversarial inputs), assess reasoning consistency across similar inputs, and document residual risks with compensating controls. Validation is ongoing — not a one-time gate. SR 11-7 requires re-validation when material changes occur, which for LLM agents includes model provider updates, material prompt changes, and deployment context expansions.
LLM Agent Behavioral Drift and SR 11-7
SR 11-7 ongoing monitoring requirements require tracking model performance against established metrics over time. LLM agents face three drift patterns not present in traditional statistical models. Foundation model provider drift: model providers periodically update base models with behavioral changes — a model version pinned to "gpt-4o" today may produce different outputs after a provider update, without the consuming organization taking any action. Reasoning path drift: even with a fixed model version, agent reasoning patterns can shift due to changes in context, tool outputs, or cumulative interaction patterns. Input population shift: the distribution of inputs to the agent changes over time as use cases expand, creating out-of-distribution inputs the model was not evaluated against. SR 11-7 monitoring programs must detect all three types — requiring behavioral baseline tracking, not just availability monitoring.
SR 11-7 Governance and Model Inventory for LLM Agents
SR 11-7 Section IV requires governance and controls covering model inventory, tiering, change management, and use limitations. Model inventory: every LLM agent used in consequential decision-making must be registered with a model owner, risk tier, validation status, and approved use scope. Risk tiering: high-consequence LLM agents (credit decisioning, fraud scoring, risk assessment) require more rigorous MRM controls than low-consequence support tools. Change management: material changes to LLM agents — new model versions, material prompt changes, new tools or data sources, expanded decision scope — must follow the change management workflow including validation re-assessment before deployment. Use limitation: LLM agents must operate within their documented decision scope; use cases outside the validated scope are model misuse. The use register captures what the model is actually being used for versus what it was validated for.