NAIC AI Model Bulletin: What Insurance Underwriting AI Must Document
The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers establishes five principles, numbered 2 through 6, for insurer AI use. They require accountability documentation, disparate impact testing data, decision-level adverse action explanations, ongoing behavioral monitoring evidence, and oversight controls for third-party AI vendors. Model documentation and aggregate performance metrics do not satisfy these requirements; per-decision audit records do.
NAIC AI Model Bulletin Scope
The NAIC Model Bulletin on AI establishes five principles applicable to insurer use of AI in underwriting, pricing, claims, and customer service. As of 2026, the majority of US states have incorporated the bulletin's principles into market conduct examination frameworks. Principle 2 (Accountability) requires named roles with documented oversight activity. Principle 3 (Compliance) requires regular disparate impact testing with documented methodology and results. Principle 4 (Transparency) requires decision-level explanations for adverse outcomes. Principle 5 (Risk Management) requires baseline measurement, ongoing monitoring, and drift detection documentation. Principle 6 (Third-Party AI Governance) requires that insurers maintain oversight and records even for vendor-supplied AI systems.
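All five principles draw on the same underlying artifact: a per-decision audit record. A minimal sketch of what such a record might capture follows; the field names and class are illustrative, not taken from the bulletin itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DecisionRecord:
    """One underwriting decision, capturing the fields the principles rely on."""
    decision_id: str
    timestamp: datetime
    model_version: str            # Principle 5: model version provenance
    policy_version: str           # Principle 2: which documented controls applied
    accountable_role: str         # Principle 2: named oversight role
    vendor: Optional[str]         # Principle 6: third-party AI provenance
    rating_factors: dict          # Principle 3: inputs for disparate impact testing
    outcome: str                  # e.g. "approve" / "decline" / coverage tier
    factor_contributions: dict    # Principle 4: per-factor effect on the outcome
    reviewed_by: Optional[str] = None  # Principle 2: human override/confirmation

# Illustrative record for a declined homeowners application
record = DecisionRecord(
    decision_id="UW-2026-000118",
    timestamp=datetime.now(timezone.utc),
    model_version="uw-model-3.2.1",
    policy_version="naic-controls-v4",
    accountable_role="chief-underwriting-officer",
    vendor=None,
    rating_factors={"territory": "TX-214", "credit_tier": "B", "prior_claims": 1},
    outcome="decline",
    factor_contributions={"prior_claims": -0.31, "credit_tier": -0.12},
)
```

Each later section of this document maps to one or more of these fields; a record missing any of them leaves the corresponding principle without examination evidence.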
Principle 3: Disparate Impact Testing Data
Disparate impact testing for insurance AI requires a decision-level dataset. The analysis compares approval rates, premium levels, and coverage terms across demographic groups using rating factor data as proxies where direct demographic data is unavailable. Without per-decision records capturing the inputs used for each underwriting evaluation, insurers cannot conduct the statistical analysis state examiners will request during market conduct examinations. Insurers that maintain only aggregate model metrics cannot respond to examination requests for stratified decision samples.
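As a sketch of the stratified analysis an examiner might request, the following computes approval rates by proxy group from decision-level records and screens the ratios against the four-fifths rule, a common screening convention. The group labels, sample data, and 0.8 flag threshold are illustrative assumptions.

```python
from collections import defaultdict

def approval_rates_by_group(decisions):
    """Stratify decision-level records by proxy group and compute approval rates.

    `decisions` is an iterable of (proxy_group, approved) pairs drawn from
    per-decision audit records.
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        approvals[group] += int(approved)
    return {g: approvals[g] / totals[g] for g in totals}

def adverse_impact_ratios(rates, reference_group):
    """Ratio of each group's approval rate to the reference group's rate.
    Values below 0.8 are a common screening flag (the four-fifths rule)."""
    ref = rates[reference_group]
    return {g: r / ref for g, r in rates.items()}

# Illustrative sample: group A approved 2 of 3, group B approved 1 of 3
sample = [("A", True), ("A", True), ("A", False),
          ("B", True), ("B", False), ("B", False)]
rates = approval_rates_by_group(sample)
ratios = adverse_impact_ratios(rates, reference_group="A")
```

Note that none of this is computable from aggregate model metrics; the stratification exists only because each decision's inputs and outcome were recorded individually.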
Principle 4: Decision-Level Explanation
NAIC Principle 4 distinguishes model-level transparency (how the model generally works) from decision-level transparency (why this specific applicant received this outcome). Adverse action notices under state unfair trade practices acts require factor-level explanation: the specific rating factors that contributed to the adverse decision, their values for this applicant, and how they affected the outcome. Generating factor-level explanations from decision records is deterministic and auditable. Generating them post-hoc from a black-box model is unreliable and produces explanations that may not match the actual decision basis.
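A deterministic explanation generator of this kind can be sketched as follows, assuming each decision record stores per-factor contributions captured at decision time. The record layout and field names are hypothetical.

```python
def adverse_action_factors(record, top_n=3):
    """Build a factor-level explanation for an adverse action notice from a
    stored decision record.

    `record["factor_contributions"]` maps rating factor -> signed contribution
    to the score; negative values pushed toward the adverse outcome. Because
    the contributions were captured at decision time, the explanation is
    deterministic and auditable rather than a post-hoc reconstruction.
    """
    adverse = [(f, c) for f, c in record["factor_contributions"].items() if c < 0]
    adverse.sort(key=lambda fc: fc[1])  # most negative contribution first
    return [
        {"factor": f, "value": record["rating_factors"].get(f), "contribution": c}
        for f, c in adverse[:top_n]
    ]

# Illustrative declined application
record = {
    "rating_factors": {"prior_claims": 2, "credit_tier": "D", "territory": "TX-214"},
    "factor_contributions": {"prior_claims": -0.31, "credit_tier": -0.22, "territory": 0.05},
}
explanation = adverse_action_factors(record)
# prior_claims is listed first (largest adverse contribution); territory is
# excluded because it contributed positively
```

Running the same function over the same record always yields the same explanation, which is what makes the notice defensible in an examination.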
Principle 5: Behavioral Drift Monitoring
NAIC Principle 5 requires post-deployment monitoring for model drift with documented evidence. For insurance AI, relevant behavioral indicators include: approval rate drift by product line and geography, coverage tier distribution shifts, declination rate patterns by ZIP code (redlining signal), override rate increases by product (indicator of systematic AI errors), and model version provenance tracking. Infrastructure metrics (latency, errors, uptime) confirm the system is running; they do not confirm it is producing compliant decisions. Behavioral monitoring from decision records is required.
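A minimal behavioral check along these lines, assuming decision records have already been aggregated into per-window approval rates by product line; the 0.05 threshold, product lines, and rates are illustrative assumptions, and a real program would run the same comparison by geography and for override rates.

```python
def approval_rate_alerts(baseline, current, threshold=0.05):
    """Flag product lines whose approval rate moved more than `threshold`
    (absolute) from the documented baseline window.

    `baseline` and `current` map product line -> approval rate, each computed
    from decision-level records over a fixed window.
    """
    alerts = {}
    for line, base_rate in baseline.items():
        cur = current.get(line)
        if cur is not None and abs(cur - base_rate) > threshold:
            alerts[line] = {"baseline": base_rate, "current": cur,
                            "shift": cur - base_rate}
    return alerts

# Illustrative windows: homeowners approvals dropped 9 points, auto is stable
baseline = {"homeowners": 0.72, "auto": 0.81}
current = {"homeowners": 0.63, "auto": 0.80}
alerts = approval_rate_alerts(baseline, current)
# homeowners is flagged (shift exceeds 0.05); auto is within bounds
```

The baseline itself is part of the Principle 5 evidence: without a documented baseline measurement, a drift alert has nothing defensible to drift from.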
Implementation for NAIC Compliance
Configure TenetClient with policy_version and system_id to attach documented control metadata to each decision record. Capture all rating factors in ctx.snapshot_context() to create the disparate impact test dataset. Include factor-level explanation in ctx.decide() for adverse action notice generation. Record underwriter reviews with tenet.record_override() and tenet.record_confirmation() to satisfy Principle 2 accountability documentation. Configure anomaly detection with approval_rate_shift and geographic_pattern thresholds to satisfy Principle 5 continuous monitoring requirements.
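The configuration steps above might be wired up roughly as follows. The exact TenetClient API is not shown in this document, so the call sequence is given only as comments, and the keys, values, and thresholds in the config are illustrative assumptions about the shapes the document names.

```python
# Illustrative configuration for the decision-record client described above.
# Every key and value here is an assumption about the API's shape, not a
# documented signature.
tenet_config = {
    "policy_version": "naic-controls-v4",   # Principle 2: control metadata on each record
    "system_id": "uw-homeowners-prod",
    "anomaly_detection": {
        "approval_rate_shift": 0.05,        # Principle 5: behavioral threshold
        "geographic_pattern": "zip3",       # Principle 5: redlining-signal granularity
    },
}

# Hypothetical call sequence, per the steps in this section:
#   tenet = TenetClient(**tenet_config)
#   ctx.snapshot_context(rating_factors)        # Principle 3: test dataset inputs
#   ctx.decide(outcome, factor_contributions)   # Principle 4: explanation basis
#   tenet.record_override(decision_id, ...)     # Principle 2: underwriter review
#   tenet.record_confirmation(decision_id, ...)
```

Attaching policy_version and system_id at configuration time means every decision record carries its control metadata without per-call effort, which is what makes the Principle 2 documentation consistent across the portfolio.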