OCC Model Validation for AI/ML in Banking: SR 11-7 Extension Guidance
OCC Bulletin 2011-12, the OCC's issuance of the interagency model risk guidance better known by its Federal Reserve designation SR 11-7, applies to any quantitative method used for bank decision-making — which includes ML credit scoring models, fraud detection neural networks, and AI agents routing loan applications. The three pillars of SR 11-7 validation are conceptual soundness (theory, data, assumptions), ongoing monitoring (performance metrics, drift detection), and outcomes analysis (predicted vs. actual). Banks cannot outsource validation obligations for third-party AI models. The OCC's 2021 AI/ML FAQ supplement confirmed that ML models satisfy the "model" definition under SR 11-7 even when they lack traditional statistical interpretability.
What Counts as a Model Under SR 11-7
SR 11-7 defines a model as a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories to process input data into quantitative estimates used for decision-making. The key test is whether the output informs a business decision. An LLM answering customer FAQs is not a model. An LLM scoring credit applications or recommending loan terms is. The OCC 2021 AI/ML FAQ supplement explicitly confirmed ML models qualify — even black-box models without traditional interpretability — when their outputs drive bank decisions. This broad definition covers gradient boosting credit scorers, deep learning fraud detectors, NLP contract review systems, and agentic AI loan processors.
The Three Pillars of SR 11-7 Validation
Conceptual soundness requires documenting the theoretical basis for the model, justifying the algorithm selection, explaining training data sources and quality, and demonstrating that model assumptions hold in the deployment environment. For ML models, this includes explainability evidence for features and predictions.

Ongoing monitoring requires continuous tracking of performance metrics against baseline, input data distribution monitoring (PSI/KL divergence for structured models, semantic drift for LLM-based models), and defined thresholds triggering escalation.

Outcomes analysis requires comparing model predictions to actual outcomes over time — default rate predictions vs. actual defaults, fraud flags vs. confirmed fraud — with back-testing and challenger model benchmarking.
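As a sketch of the outcomes-analysis step, the back-test below compares a production (champion) model against a challenger on realized outcomes using a rank-based ROC AUC. The names `auc` and `challenger_gap` are illustrative, not part of any OCC framework, and the rank identity assumes untied scores:

```python
import numpy as np

def auc(y_true, scores):
    """ROC AUC via the Mann-Whitney rank identity (assumes untied scores)."""
    y_true = np.asarray(y_true)
    ranks = np.asarray(scores).argsort().argsort() + 1  # 1-based ranks
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    # Rank-sum of positives, normalized to the [0, 1] AUC scale
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def challenger_gap(y_true, champion_scores, challenger_scores):
    """Out-of-time benchmark: a positive gap means the challenger
    outperforms the production model on realized outcomes."""
    return auc(y_true, challenger_scores) - auc(y_true, champion_scores)
```

In an actual outcomes-analysis report, the same comparison would run over each loan vintage's realized defaults, with the gap tracked against a materiality threshold defined in the bank's MRM policy.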
Documentation Requirements: What OCC Examiners Review
Model development documentation must exist before deployment and cover theory, data sources and quality, feature engineering rationale, algorithm selection justification, known limitations, and intended use.

Validation reports must be produced by a team independent from development, covering out-of-time/out-of-sample testing, adversarial testing, fairness analysis, and a formal finding.

Ongoing monitoring reports must be produced at intervals matching the model risk tier — monthly for high-risk production models, quarterly for lower-risk models.

Issue tracking must show findings with severity, owner, due date, and remediation evidence.

Change management documentation must capture material changes — including vendor model updates for third-party AI — and evidence of pre-deployment testing for each change.
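One way to make the issue-tracking elements above concrete is a structured record per finding. The schema below is a hypothetical illustration of those elements, not an OCC-prescribed format:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ValidationFinding:
    """One examiner-reviewable issue record: severity, owner, due date,
    and remediation evidence. Field names are illustrative only."""
    finding_id: str
    model_id: str
    severity: str                 # e.g. "high" / "medium" / "low"
    description: str
    owner: str
    due_date: date
    remediation_evidence: list[str] = field(default_factory=list)

    def is_overdue(self, today: date) -> bool:
        # Open (no evidence attached) and past its due date
        return not self.remediation_evidence and today > self.due_date
```

A tracking system built on records like this can answer the examiner's standing questions directly: which findings are open, who owns them, and what evidence closed the rest.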
Independent Validation: What Independence Means for AI Models
SR 11-7 requires the validation function to be independent from model development — the team that built the credit scoring model cannot validate it. For AI models, independence is structurally harder: development teams hold unique knowledge of training procedures, and external validators may lack ML expertise. The OCC FAQ guidance centers on effective challenge: validators must be able to probe assumptions, test edge cases, and form independent fitness-for-purpose opinions. For vendor AI models, independence requires contractual access to model cards, training data summaries, and performance benchmarks. Third-party validation can supplement but cannot replace internal validation. The bank remains responsible for validating all models used in its operations, including AI APIs and embedded ML scoring services.
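Effective challenge means concrete, testable probing rather than sign-off. A minimal sketch of one such edge-case test, assuming a hypothetical `score_fn` callable that wraps the bank's scoring model: it checks that scores move monotonically with a favorable feature, all else held fixed:

```python
def monotonicity_probe(score_fn, base_applicant, feature, values):
    """Effective-challenge probe: scores should not decrease as a
    favorable feature (e.g. income) increases, holding all else fixed.
    score_fn and the applicant dict are stand-ins for the bank's model API."""
    scores = []
    for v in sorted(values):
        applicant = {**base_applicant, feature: v}  # vary one feature only
        scores.append(score_fn(applicant))
    # Adjacent pairs where the score fell despite the feature rising
    return [(a, b) for a, b in zip(scores, scores[1:]) if b < a]
```

An empty result supports the model's conceptual soundness claim for that feature; any violation is a finding the validation team can document independently of the development team's own testing.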
Ongoing Monitoring: The Most Common Examination Finding
OCC examination findings consistently cite inadequate ongoing monitoring as the primary model risk management deficiency. Conceptual soundness documentation is usually present; outcome tracking is usually absent. For AI models, the monitoring challenge is compounded by model drift — the data distribution shifts, the real-world environment changes, and model performance degrades silently between revalidation cycles. SR 11-7 requires monitoring frequency to match model risk tier: high-risk models used in every loan decision should be monitored monthly; lower-risk analytics models may qualify for quarterly review. The bank's MRM policy must define risk tiers, monitoring frequency standards, escalation thresholds, and the process for triggering revalidation when thresholds are breached.
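A minimal sketch of threshold-driven escalation, assuming a hypothetical policy table — the tier names, cadences, and PSI cutoffs below are illustrative assumptions, not OCC-prescribed values:

```python
import numpy as np

# Illustrative MRM policy table: monitoring cadence and PSI escalation
# threshold per risk tier (assumed values, not regulatory requirements).
MRM_POLICY = {
    "high": {"cadence_days": 30, "psi_escalate": 0.10},
    "low":  {"cadence_days": 90, "psi_escalate": 0.25},
}

def psi(expected, actual, bins=10):
    """Population Stability Index between the baseline (training-era)
    distribution and the current production distribution of a feature."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    e = np.clip(np.histogram(expected, bins=edges)[0] / len(expected), 1e-6, None)
    a = np.clip(np.histogram(actual, bins=edges)[0] / len(actual), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

def needs_escalation(tier, expected, actual):
    """True when drift breaches the tier's threshold, triggering the
    escalation and revalidation process the MRM policy defines."""
    return psi(expected, actual) > MRM_POLICY[tier]["psi_escalate"]
```

Running this check on the monitoring cadence for each tier, and logging every breach as a tracked finding, produces exactly the outcome-tracking evidence examiners most often find missing.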