What development documentation does SR 11-7 require for LLM agents?

SR 11-7 Section II requires documentation of purpose and design rationale, data used in development, testing results, and known limitations. For LLM agents this includes: decision scope documentation (what decisions the agent makes and the authorized boundaries), foundation model selection rationale (why this provider and version, what was tested), prompt architecture treated as model design documentation (every material prompt change is a model change requiring re-documentation), and known failure mode documentation. All documentation must be current — a model provider update that changes agent behavior without updated documentation is an SR 11-7 violation.

What does SR 11-7 independent validation require for LLM agents?

Independent validation must be performed by qualified personnel not responsible for model development. For LLM agents, validation must assess: behavioral baseline testing against predefined performance thresholds using representative production data, sensitivity testing for prompt injection and adversarial inputs, reasoning consistency evaluation across similar inputs, and residual risk documentation. Validation is not a one-time gate — SR 11-7 requires re-validation after material changes, which for LLM agents includes model provider updates and material prompt changes.

What is model drift in the context of SR 11-7 and LLM agents?

SR 11-7 requires ongoing monitoring to detect when model performance deviates from validation-time benchmarks. For LLM agents, three drift types require monitoring: foundation model provider drift (provider updates the base model with behavioral changes without notice), reasoning path drift (the agent's reasoning patterns shift due to changes in context or inputs over time), and input population shift (the distribution of inputs changes as use cases expand, creating out-of-distribution inputs). Standard infrastructure monitoring does not detect any of these — behavioral baseline tracking against decision-level records is required.

What is the SR 11-7 tiering requirement for LLM agents?

SR 11-7 requires model inventory tiering — assigning risk levels that determine the rigor of MRM controls. High-consequence LLM agents (credit decisioning, fraud scoring, risk assessment, trading) require the most rigorous controls: formal validation, governance board review, change management workflow, and ongoing monitoring programs. Lower-consequence agents (internal search, document summarization) may require lighter controls but must still be inventoried and assessed. Tiering decisions must be documented and justified.

What change management does SR 11-7 require for LLM agent updates?

SR 11-7 requires that material model changes go through review before production deployment. For LLM agents, material changes include: new model provider versions (even patch releases if they change model behavior), material prompt changes, new tools or data sources, and expanded decision scope beyond the validated use case. The change management workflow must assess impact on the validated performance baseline, determine if re-validation is required, and document the decision. Using the current model version in production after a provider update without assessing behavioral change is an SR 11-7 change management failure.

How does SR 11-7 compare to EU AI Act for LLM agents at banks?

SR 11-7 and EU AI Act overlap significantly for banking AI systems. SR 11-7 focuses on model risk within a US regulatory supervisory framework — examination by Fed or OCC examiners, enterprise MRM governance, and validation standards. EU AI Act focuses on AI system risk for high-risk categories defined in Annex III, with market prohibition and financial penalties for non-compliance. Both require development documentation, independent validation, ongoing monitoring, and governance controls. Key differences: EU AI Act has specific technical requirements (Art.9 representative data testing, Art.12 logging, Art.14 human oversight); SR 11-7 is more principles-based and flexible in implementation. Banks operating in EU face both simultaneously.

What audit trail does SR 11-7 require for LLM agent decisions?

SR 11-7 requires records sufficient to demonstrate model performance, validation status, and governance compliance. For LLM agents, this includes: decision-level records with inputs, outputs, and reasoning at the time of each decision (for ongoing monitoring and outcomes analysis), model version active at each decision (for change impact assessment), human override records (for outcomes analysis and improvement), and validation records against predefined benchmarks. SR 11-7 ongoing monitoring requires comparing current behavior against documented baselines — which requires tamper-evident decision-level records rather than aggregate metrics.

SR 11-7 Model Risk Management for LLM Agents: What Fed and OCC Guidance Requires

Q: Does SR 11-7 apply to LLM agents?

SR 11-7 defines a model as any quantitative method or system that produces outputs used to inform decisions when those decisions have financial, credit, or risk consequences. An LLM agent that produces credit recommendations, risk assessments, or trading signals fits this definition. Fed and OCC examiners have increasingly cited LLM-based decision tools in model risk discussions. Banks that treat LLM agents as software rather than models and exclude them from MRM governance create examination risk.

SR 11-7 (Supervisory Guidance on Model Risk Management), issued by the Federal Reserve and OCC in 2011, defines "model" broadly enough to include LLM-based agents used in financial institution decision-making. Any quantitative method — including an LLM — that produces outputs used to inform decisions falls within scope when those outputs have consequential impact on credit, risk, or financial outcomes. This guide maps SR 11-7 development documentation, independent validation, governance and controls, and ongoing monitoring requirements to LLM agent deployments at banks, credit unions, and financial holding companies.

Does SR 11-7 Apply to LLM Agents?

SR 11-7 defines a model as "a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." The guidance explicitly notes that simpler statistical models are covered, but so are "systems in which the inputs are transformed by algorithms." An LLM agent that processes borrower information and produces credit recommendations — or processes market data and produces trading signals — fits this definition. Fed and OCC examiners have increasingly cited LLM-based decision tools in model risk discussions. Banks are expected to have a model inventory that includes LLM agents used in consequential decision-making. Treating LLM agents as software (not models) leaves them outside MRM governance and creates examination risk.

SR 11-7 Development Documentation for LLM Agents

SR 11-7 Section II requires comprehensive development documentation covering purpose and design rationale, data used in development, testing and validation results, and known limitations. For LLM agents, this means documenting the purpose and decision scope (what decisions the agent is authorized to make and the boundaries of that authority), foundation model selection rationale (why this provider and model version, what alternatives were evaluated, what testing determined fitness), prompt architecture and system prompt (treated as model design documentation — every material prompt change is a model change requiring re-documentation), training and fine-tuning data if applicable, and known failure modes and limitations. Documentation must exist at deployment and be updated on material changes — a model provider update that changes agent behavior is a material change requiring updated documentation.

Independent Validation Requirements for LLM Agents

SR 11-7 Section III requires independent validation performed by qualified personnel who are not responsible for model development. Validation must assess conceptual soundness (does the model do what it claims to do), data integrity and fitness (is the data appropriate for the intended use), and outcomes analysis (does the model perform as expected against measurable outcomes). For LLM agents, independent validation must test behavioral baselines against predefined performance thresholds using representative production-like data, evaluate sensitivity to input variations (prompt injection, edge cases, adversarial inputs), assess reasoning consistency across similar inputs, and document residual risks with compensating controls. Validation is ongoing — not a one-time gate. SR 11-7 requires re-validation when material changes occur, which for LLM agents includes model provider updates, material prompt changes, and deployment context expansions.

LLM Agent Behavioral Drift and SR 11-7

SR 11-7 ongoing monitoring requirements require tracking model performance against established metrics over time. LLM agents face three drift patterns not present in traditional statistical models. Foundation model provider drift: model providers periodically update base models with behavioral changes — a model version pinned to "gpt-4o" today may produce different outputs after a provider update, without the consuming organization taking any action. Reasoning path drift: even with a fixed model version, agent reasoning patterns can shift due to changes in context, tool outputs, or cumulative interaction patterns. Input population shift: the distribution of inputs to the agent changes over time as use cases expand, creating out-of-distribution inputs the model was not evaluated against. SR 11-7 monitoring programs must detect all three types — requiring behavioral baseline tracking, not just availability monitoring.

SR 11-7 Governance and Model Inventory for LLM Agents

SR 11-7 Section IV requires governance and controls covering model inventory, tiering, change management, and use limitations. Model inventory: every LLM agent used in consequential decision-making must be registered with a model owner, risk tier, validation status, and approved use scope. Risk tiering: high-consequence LLM agents (credit decisioning, fraud scoring, risk assessment) require more rigorous MRM controls than low-consequence support tools. Change management: material changes to LLM agents — new model versions, material prompt changes, new tools or data sources, expanded decision scope — must follow the change management workflow including validation re-assessment before deployment. Use limitation: LLM agents must operate within their documented decision scope; use cases outside the validated scope are model misuse. The use register captures what the model is actually being used for versus what it was validated for.