From Statistics to Deep Learning: The Evolution of Analytical Approaches in Insurance
Executive Summary
The analytical toolkit available to insurance actuaries has evolved dramatically over the past decade. This evolution follows a clear trajectory:
- Traditional Statistical Inference (1980s-2000s): Actuaries relied primarily on Generalized Linear Models (GLMs) to build interpretable, statistically rigorous pricing models with a focus on inference—understanding why certain risk factors matter.
- Machine Learning Adoption (2010s): The shift to algorithms like gradient boosting and random forests prioritized prediction accuracy over inference, capturing complex non-linear relationships in data while sacrificing some interpretability. This represents the current state-of-the-art at many leading insurers today.
- Deep Learning Exploration (Late 2010s-Present): While still largely in the research and pilot phase for most insurers, neural networks promise to eliminate the need for manual feature engineering by enabling automatic representation learning from diverse data sources.
- Transformer Architecture Research (2020s): Representing the cutting edge of academic research, transformer models show promising results with sequential insurance data in experimental settings, though widespread industry adoption remains years away.
Each step in this progression brings greater predictive power while introducing new challenges for governance and interpretability.
Why this matters: carriers that master the blend routinely achieve several points of combined-ratio improvement while clearing regulatory hurdles. The most sophisticated actuarial teams now strategically blend these approaches—using GLMs for regulatory compliance and gradient boosting for competitive advantage, while exploring deep learning techniques for future applications.
In today's rapidly evolving technological landscape, artificial intelligence (AI) has transcended buzzword status to become a transformative force across industries. The insurance sector, traditionally conservative in adopting new technologies, is now experiencing a significant transformation through AI applications. But what exactly is AI in this context, and how does it differ from traditional statistical approaches that actuaries have used for decades? This blog post explores the evolution from traditional statistics to deep learning and how each approach contributes to insurance analytics and risk assessment. In our experience, actuarial teams are often left out of wider AI transformation efforts; with the framework we present below, actuaries can pinpoint exactly where AI (which we view as the application of deep-learning-based techniques) can add value.
Understanding the Foundation: Traditional Statistics in Insurance
Statistical modeling has been the backbone of actuarial science for centuries. Traditional statistical approaches focus on testing hypotheses, estimating parameters for predetermined models, and establishing confidence intervals. These methods typically require explicit model specification by human experts who bring domain knowledge to the table.
Actuarial science has long relied on statistical methods to quantify and manage risk. The pivotal moment was the introduction of Generalized Linear Models by Nelder & Wedderburn (1972), later adopted by insurers in the mid-1980s, which provided a flexible exponential-family framework for frequency-severity modelling.
For example, in pricing non-life insurance policies, actuaries have traditionally used generalized linear models (GLMs) where they specify which variables (like age, gender, or vehicle type) influence claim frequency or severity, and how these variables interact.
As Richman (2020) notes in his paper on AI in actuarial science: "The traditional approach favors building algorithms to predict responses with explicitly specified stochastic data generating models," which places emphasis on model interpretability and statistical inference rather than pure predictive power.
The Statistical Approach in Practice
In traditional statistical modeling, the process typically follows these steps:
- Model Specification: The actuary explicitly defines the functional form of the relationship between variables (e.g., linear, log-linear, exponential)
- Parameter Estimation: Using methods like maximum likelihood estimation to find the optimal parameters
- Hypothesis Testing: Determining which variables are statistically significant
- Model Validation: Using diagnostic tests to check assumptions and model fit
- Interpretation: Explaining the meaning of the coefficients in business terms
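To make these steps concrete, here is a minimal sketch of a Poisson claim-frequency GLM in Python using statsmodels. The DataFrame and column names (claim_count, exposure, driver_age_band, vehicle_group) are hypothetical placeholders for illustration, not a prescribed rating structure.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_frequency_glm(policies: pd.DataFrame):
    # 1. Model specification: a log-linear Poisson frequency model with an
    #    exposure offset, the classical actuarial starting point.
    model = smf.glm(
        formula="claim_count ~ C(driver_age_band) + C(vehicle_group)",
        data=policies,
        family=sm.families.Poisson(),
        offset=np.log(policies["exposure"]),
    )
    # 2. Parameter estimation by maximum likelihood (IRLS under the hood).
    fitted = model.fit()
    # 3. Hypothesis testing: the summary reports Wald z-statistics and p-values.
    print(fitted.summary())
    # 4. Model validation: deviance and Pearson chi-square as basic fit diagnostics.
    print("Deviance:", fitted.deviance, "Pearson chi2:", fitted.pearson_chi2)
    # 5. Interpretation: exponentiated coefficients read as multiplicative
    #    relativities on expected claim frequency.
    print(np.exp(fitted.params))
    return fitted
```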
This approach has several key strengths:
- Clear (quasi-causal) interpretation of relationships
- Transparent methodology that can be audited
- Strong theoretical foundation
- Manageable complexity
- Established governance frameworks
For decades, this approach served the insurance industry well, providing a balance of accuracy and interpretability that satisfied both business needs and regulatory requirements.
Interestingly, many actuaries provide explanations derived from GLM coefficients in a semi-causal manner, even though the necessary conditions for causal interpretation have not been met.
The Shift to Machine Learning: From Explanation to Prediction
The emergence of machine learning represents a paradigm shift in how we approach data analysis. Rather than starting with an explicit model and hand-crafted relationships between covariates and outcomes, machine learning algorithms learn patterns directly from data. The focus shifts from parameter estimation to prediction accuracy, and from model specification to algorithm selection.
Breiman (2001) characterized this shift as moving from the "data modeling culture" to the "algorithmic modeling culture." This transition reflects a fundamental change in objectives and methods.
Bridging Inference and Prediction — Why the Goal Changes Everything
Statisticians have long warned that "the model you build depends on the question you ask." Galit Shmueli crystallised this idea in her seminal paper To Explain or to Predict? (2010). She shows that explanatory modelling (for causal insight) and predictive modelling (for accuracy on unseen data) differ on four axes: causation vs association, theory vs data, retrospective vs prospective, and bias–variance priorities. For actuaries, this means that classical GLMs—perfectly tuned for inference and regulatory defensibility—may underperform when the business question is price optimisation or lapse prediction on tomorrow's portfolio.
Our view is that the art in modern actuarial work is selecting (or blending) the two cultures to suit both board-room scrutiny and real-time decision pipelines—a theme that recurs as we move to transformers below.
Machine Learning's Key Differentiators
In the machine-learning paradigm, techniques like gradient boosted trees, random forests, and support vector machines can capture complex non-linear relationships without requiring the modeler to specify these relationships in advance.
Key characteristics of the machine-learning approach include:
- Algorithm-Driven: The algorithm adapts to the data rather than forcing data into a predefined model
- Prediction-Focused: Optimizing for predictive accuracy rather than parameter interpretation
- Bias-Variance Tradeoff: Explicitly embracing the tradeoff between bias and variance to minimize prediction error
- Validation-Centric: Emphasizing out-of-sample performance through cross-validation techniques
- Feature Engineering: Still requiring significant manual feature engineering based on domain expertise
- Fairness Governance: Toolkits such as SHAP, monotone GBMs, and fairness-aware boosting can help surface disparate-impact risks
Mullainathan & Spiess (2017) summarize this approach as: "minimizing the in-sample error of the model, measured using a loss function applied over observations, by searching over functions in a set of functions, subject to complexity restrictions."
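As an illustration of this workflow, here is a minimal, hedged sketch using scikit-learn's histogram-based gradient boosting with a Poisson objective. X and y stand in for a pre-engineered feature matrix and observed claim counts, and every hyperparameter value shown is illustrative rather than recommended.

```python
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score

def fit_gbm_frequency(X, y):
    model = HistGradientBoostingRegressor(
        loss="poisson",        # count-style target, as in frequency modelling
        max_depth=4,           # complexity restriction (bias-variance control)
        learning_rate=0.05,
        max_iter=500,
        early_stopping=True,   # internal hold-out guards against overfitting
    )
    # Out-of-sample performance via cross-validation rather than
    # in-sample significance tests.
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_poisson_deviance")
    print("CV mean Poisson deviance:", -scores.mean())
    return model.fit(X, y)
```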
Insurance Applications of Machine Learning
For insurance applications, machine learning enables analysts to potentially discover previously unknown patterns in:
- Policyholder Behavior: Identifying complex patterns in customer retention and cross-selling opportunities
- Claims Frequency and Severity: Discovering non-linear relationships between risk factors
- Fraud Indicators: Detecting subtle patterns of fraudulent activity
- Risk Classification: Creating more granular risk segments without explicitly defining them
Machine learning models often outperform traditional statistical approaches in predictive accuracy. For example, gradient boosted trees might capture interaction effects between variables that would be difficult to specify in a GLM.
However, this improved accuracy often comes at the cost of reduced interpretability—what Breiman (2001) called the "Occam dilemma." Fairness and explainability tool-chains (e.g. SHAP heat-maps) therefore become first-class citizens in production MLOps.
This tradeoff has particularly significant implications in the heavily regulated insurance industry, where model interpretability is often a regulatory requirement.
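Where such models are deployed, attribution tooling is typically layered on top. The sketch below shows one common pattern with the shap package; it assumes shap is installed and wraps the model's prediction function in a model-agnostic explainer, so the exact explainer shap selects may vary by version and model type.

```python
import shap  # assumes the shap package is installed

def explain_model(model, X):
    # Model-agnostic explainer built around the model's prediction function;
    # for supported tree ensembles, shap.TreeExplainer is a faster alternative.
    explainer = shap.Explainer(model.predict, X)
    shap_values = explainer(X)
    # Beeswarm plot: which features drive predictions, and in which direction.
    shap.plots.beeswarm(shap_values)
    return shap_values
```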
Deep Learning: The Next Frontier in Insurance Analytics
Deep learning represents the cutting edge of AI development, though its practical application in insurance remains largely aspirational for most carriers. As Goodfellow et al. (2016) define it, deep learning is "a representation learning technique that attempts to solve the problem by constructing hierarchies of complex features that are composed of simpler representations learned at a shallow level of the model."
In practice, deep learning uses neural networks with multiple layers of regression functions to automatically discover the representations needed for detection or classification. While this technology has transformed industries like computer vision and natural language processing, its adoption in insurance has been more measured, with most applications still in the research or pilot phase. Nevertheless, it represents an exciting frontier that promises to further evolve our analytical capabilities.
From Feature Engineering to Representation Learning
The key promise of deep learning is the ability to automatically learn useful features from raw data—a capability known as representation learning.
Traditional machine learning, including the gradient boosting methods currently favored by leading insurers, still requires manual feature engineering, where domain experts transform raw data into features that the algorithm can use. For example, an actuary might manually create age bands or calculate ratios between different data elements. This feature engineering step is often time-consuming and relies heavily on domain expertise.
Deep learning, by contrast, can potentially learn these representations directly from the data. The multiple layers in a neural network progressively extract higher-level features from lower-level features, creating a hierarchy of increasingly abstract and useful representations (Bengio et al., 2013). While this capability remains more theoretical than practical for most insurance applications today, early research shows promising results.
Self-Supervised & Contrastive Pre-Training
A few pioneering insurers are beginning to experiment with self-supervised objectives (contrastive, masked-token, auto-encoding) on pools of unlabeled telematics, call-centre logs or document scans. This pre-training approach could potentially slash label requirements and boost downstream accuracy—an avenue largely unavailable in classical ML. However, these techniques remain at the experimental stage for most carriers.
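To give a flavour of what such an objective looks like, below is a minimal sketch of self-supervised pre-training via a denoising autoencoder on unlabeled tabular features (for instance, telematics summaries). The architecture, noise level, and dimensions are illustrative assumptions, not a reference design.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_denoising_autoencoder(n_features=64, latent_dim=16):
    inputs = tf.keras.Input(shape=(n_features,))
    # Corrupt inputs during training; the network must reconstruct the original.
    x = layers.GaussianNoise(0.1)(inputs)
    x = layers.Dense(32, activation="relu")(x)
    latent = layers.Dense(latent_dim, activation="relu", name="representation")(x)
    x = layers.Dense(32, activation="relu")(latent)
    outputs = layers.Dense(n_features)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    # After pre-training on unlabeled data, the "representation" layer can be
    # reused as a frozen or fine-tuned encoder for a downstream labelled task.
    return model
```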
Neural Network Architectures for Future Insurance Applications
Different neural network architectures show promise for processing different types of data, suggesting potential future applications in various insurance contexts:
Convolutional Neural Networks (CNNs)
CNNs are particularly effective for processing grid-like data such as images. In insurance, they can:
- Analyze images of property damage for claims assessment
- Extract patterns from telematics heatmaps
- Process satellite imagery for property risk assessment
- Analyze dashcam footage for auto insurance
For example, a CNN could analyze photos of vehicle damage to automatically estimate repair costs or detect fraud by identifying inconsistencies between the damage and the claim description (Gao & Wüthrich, 2019).
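A minimal Keras sketch of such a network, in the spirit of the telematics heatmaps of Gao & Wüthrich (2019), might look as follows; the input shape, layer sizes, and Poisson frequency output are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_heatmap_cnn(input_shape=(32, 32, 1)):
    inputs = tf.keras.Input(shape=input_shape)
    # Convolution and pooling layers extract local patterns from the grid.
    x = layers.Conv2D(16, kernel_size=3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, kernel_size=3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(32, activation="relu")(x)
    # Exponential output keeps the predicted claim frequency positive.
    outputs = layers.Dense(1, activation="exponential")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="poisson")
    return model
```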
Recurrent Neural Networks (RNNs), LSTMs, GRUs
These architectures are designed for sequential data, making them suitable for:
- Processing time series of customer interactions
- Analyzing claims handlers' notes
- Modeling mortality and longevity trends over time
- Detecting temporal patterns in claims filing behavior
For instance, an LSTM (Long Short-Term Memory) network could analyze the sequence of events leading up to a claim to identify patterns associated with fraudulent activity (Richman & Wüthrich, 2019).
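A minimal Keras sketch of such a sequence model might look as follows; the sequence length, per-event feature dimension, and binary "refer for review" target are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_event_lstm(timesteps=24, n_features=8):
    inputs = tf.keras.Input(shape=(timesteps, n_features))
    # Masking lets sequences of different lengths share one zero-padded tensor.
    x = layers.Masking(mask_value=0.0)(inputs)
    x = layers.LSTM(32)(x)
    # Sigmoid output for a binary flag, e.g. "refer this claim for review".
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])
    return model
```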
Embedding Layers
Embedding layers convert high-dimensional categorical variables into meaningful dense vectors. In insurance pricing, this is powerful for handling features like:
- Vehicle makes/models (potentially hundreds of categories)
- Occupation codes (often high-cardinality features)
- Geographical zones (capturing spatial relationships)
- Industry classifications (in commercial insurance)
Embeddings can capture semantic relationships between categories that would be missed by traditional dummy coding. For example, an embedding might learn that certain vehicle models have similar risk profiles even if they're from different manufacturers (Guo & Berkhahn, 2016).
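A minimal Keras sketch of an entity-embedding layer for a high-cardinality rating factor such as vehicle model is shown below; the cardinality, embedding width, and the way it is combined with numeric features are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_embedding_model(n_vehicle_models=800, n_numeric=10, embed_dim=8):
    vehicle_in = tf.keras.Input(shape=(1,), name="vehicle_model_id")
    numeric_in = tf.keras.Input(shape=(n_numeric,), name="numeric_features")
    # Each vehicle model id is mapped to a dense 8-dimensional vector;
    # models with similar risk profiles end up close together in this space.
    emb = layers.Embedding(input_dim=n_vehicle_models, output_dim=embed_dim)(vehicle_in)
    emb = layers.Flatten()(emb)
    x = layers.Concatenate()([emb, numeric_in])
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(1, activation="exponential")(x)  # expected frequency
    model = tf.keras.Model([vehicle_in, numeric_in], outputs)
    model.compile(optimizer="adam", loss="poisson")
    return model
```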
The Power of Deep Learning in Insurance
For insurance, these capabilities are transformative. Instead of manually designing features for risk assessment, neural networks can learn representations directly from diverse data sources such as:
- Structured policy data with hundreds of variables
- Unstructured text from claims descriptions and notes
- Telematics data capturing driving behavior
- Images of property or vehicle damage
- Time series of customer interactions and policy changes
These learned representations often capture subtleties in the data that would be difficult or impossible to specify manually, potentially leading to more accurate risk assessment and pricing.
Transformers and Attention Mechanisms — The Latest Evolution
When Vaswani et al. unveiled the Transformer in 2017, they introduced a simple yet profound idea: "attention is all you need." By replacing recurrent loops with multi-head self-attention, transformers learn relationships between any two elements in a sequence—no matter how far apart—while remaining embarrassingly parallel in training. Positional encodings inject an ordered sense of time or distance, and a stack of feed-forward blocks refines these attention maps into highly expressive latent features.
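To make the mechanics concrete, the sketch below assembles one transformer-style encoder block in Keras over a generic insurance sequence (for example, claim development periods). The dimensions, head count, and use of fixed sinusoidal positional encodings are illustrative assumptions rather than a prescription.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def sinusoidal_encoding(seq_len, d_model):
    # Fixed sine/cosine positional encodings as in Vaswani et al. (2017).
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles)).astype("float32")

def build_encoder_block(seq_len=36, d_model=32, n_heads=4):
    inputs = tf.keras.Input(shape=(seq_len, d_model))
    # Positional encodings inject an ordered sense of time into the sequence.
    x = inputs + sinusoidal_encoding(seq_len, d_model)
    # Multi-head self-attention: every time step can attend to every other.
    attn = layers.MultiHeadAttention(num_heads=n_heads,
                                     key_dim=d_model // n_heads)(x, x)
    x = layers.LayerNormalization()(x + attn)
    # Position-wise feed-forward block refines the attention output.
    ff = layers.Dense(64, activation="relu")(x)
    ff = layers.Dense(d_model)(ff)
    x = layers.LayerNormalization()(x + ff)
    return tf.keras.Model(inputs, x)
```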
Actuarial Applications: The Credibility Transformer
Recent research by Richman et al. (2025) has demonstrated how transformers can be adapted specifically for insurance applications through what they call a "Credibility Transformer." This approach combines traditional credibility principles with the attention mechanism, allowing the model to balance portfolio-level information with individual policy characteristics. By introducing a special token that encodes the portfolio average experience and having it interact with policy-specific features through the attention mechanism, the model creates a natural credibility weighting that improves predictive performance. This work represents an interesting example of how domain-specific knowledge from actuarial science can inform deep learning architecture design for insurance applications.
Why should actuaries care about transformers? Because many insurance data sets are, at heart, irregular sequences: claim payments over development months, driving events over a journey, even mortality exposures over calendar years. Attention lets a model weigh each past event's relevance to the next loss, lapse, or death—something classical RNNs struggle with once the sequence grows long.
Transformers are not just for language models like ChatGPT; they have the potential to become an actuarial workhorse for any task where order, context, and heterogeneity intersect. Their ability to handle complex sequential data while maintaining some level of interpretability makes them particularly valuable in regulated insurance environments.
Conclusion: The Analytical Evolution Continues
The journey from traditional statistics to machine learning and toward deep learning represents a progressive evolution in our ability to extract insights from data. Each approach builds upon the foundations laid by its predecessors, expanding our analytical capabilities while introducing new challenges and considerations.
Traditional statistical methods remain the cornerstone of actuarial science with their strong theoretical foundations and interpretability. Machine learning, particularly gradient boosting, has gained significant adoption among leading insurers, expanding their ability to capture complex patterns without requiring explicit model specification. Deep learning remains largely aspirational for most insurance applications, with potential to eventually automate the feature engineering process and enable the analysis of diverse data types.
As we've explored, these approaches differ not only in their technical implementation but also in their fundamental goals—explanation versus prediction. Understanding these differences is crucial for effectively applying these techniques in insurance contexts where both accurate predictions and clear explanations are often required.
By appreciating the strengths and limitations of each approach, insurance professionals can better navigate the evolving analytical landscape, selecting the right tools for specific applications while maintaining the rigor and responsibility that the industry demands. For most carriers today, this means a blend of GLMs for regulatory transparency and gradient boosting for enhanced prediction, with ongoing monitoring of deep learning research for future applications.
References
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
Breiman, L. (2001). Statistical modeling: the two cultures. Statistical Science, 16(3), 199-231. https://doi.org/10.1214/ss/1009213726
Gao, G., & Wüthrich, M. V. (2019). Feature extraction from telematics car driving heatmaps. European Actuarial Journal, 9(1), 49-65.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Guo, C., & Berkhahn, F. (2016). Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737.
Mullainathan, S., & Spiess, J. (2017). Machine learning: an applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized Linear Models. Journal of the Royal Statistical Society A, 135(3), 370-384.
Richman, R. (2020). AI in actuarial science – a review of recent advances – part 1. Annals of Actuarial Science, 1-23.
Richman, R., & Wüthrich, M. V. (2019). A neural network extension of the Lee-Carter model to multiple populations. Annals of Actuarial Science, 13(2), 268-281.
Richman, R., Scognamiglio, S., & Wüthrich, M. V. (2025). The credibility transformer. European Actuarial Journal. https://doi.org/10.1007/s13385-025-00413-y
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289-310. https://doi.org/10.1214/10-STS330
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.