ICEnet: How Actuarial Thinking Teaches Us to Align Deep Learning with Human Expertise
Executive Summary
- ICEnet aligns flexible neural networks with actuarial expertise by adding differentiable penalties on Individual Conditional Expectation (ICE) outputs, enforcing smoothness and monotonicity in the output space.
- This output-space approach preserves standard architectures and parallels modern alignment methods (RLHF/RLVR), shaping behaviors without constraining internal weights.
- In MTPL experiments, ICEnet matches unconstrained accuracy while improving stability and governance; e.g., Poisson deviance ≈ 0.2385–0.2388, with simple ensembling ("nagging") yielding modest gains.
- Known considerations: correlated features, careful penalty tuning, and added compute—manageable with GPUs.
- Validation focuses on deviance, monotonicity violation rates, smoothness scores, calibration at extremes, and ablations (mono-only, smooth-only, both).
- Practical impact: narrows the Rashomon set toward regulator-ready, auditable models with coherent, consistent explanations across retrainings.
The Two Cultures Meet in the Middle
In 2001, Leo Breiman famously described two competing cultures in statistical modeling: traditional statistics, which focuses on understanding why through interpretable models, and machine learning, which relentlessly optimizes for what, pursuing predictive accuracy above interpretability (Breiman, 2001). Statisticians often worry that black-box models sacrifice scientific understanding, while machine learning practitioners argue that overly interpretable models leave predictive power untapped.
We have similar debates within the actuarial profession about how best to bridge this gap. Some advocate for enhancements to the traditional model class of Generalized Linear Models (GLMs), widely used across actuarial disciplines in various shapes and forms. Others are embracing black-box models, relying instead on post-hoc interpretability measures to ensure models are well understood.
Neural networks - and other highly flexible machine learning models - compound this challenge through what Breiman termed the "Rashomon effect": the existence of multiple, equally accurate models that tell completely different stories about the same data. Train five neural networks on identical insurance portfolios, and you'll likely receive five distinct explanations for risk patterns. One model might emphasize vehicle characteristics, another driver demographics, yet another geographic factors - all achieving similar predictive accuracy but through fundamentally different internal representations.
This multiplicity isn't merely an academic curiosity; it poses serious governance challenges:
- Which model's logic should inform business strategy?
- How can actuaries defend pricing decisions to regulators when models disagree?
- What happens when equally valid models suggest contradictory interventions?
ICEnet, developed by Ron Richman and Professor Mario Wüthrich (insureAI's CEO and Senior Scientific Advisor, respectively), bridges this divide without compromising either perspective. Moreover, it helps to mitigate the Rashomon effect by selecting for models with a specific logic and decision-making basis. It works by training a neural network to simultaneously generate predictions and evaluate Individual Conditional Expectation (ICE) plots, which are visual representations showing how predictions change when varying inputs. The innovation lies in penalizing undesirable behaviors, such as roughness or monotonicity violations, directly in the network's output space (represented in the ICE plots) rather than constraining internal weights and biases (Richman & Wüthrich, 2024).
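To make this concrete, here is a minimal sketch of how a single ICE vector can be computed, assuming a generic fitted frequency model with a `predict` method (the function name, interface, and grid are illustrative, not part of the ICEnet specification):

```python
import numpy as np

def ice_vector(model, policy_row, feature_index, grid):
    """Evaluate one policy's ICE vector along one feature.

    model         -- any fitted predictor exposing .predict(X) (illustrative interface)
    policy_row    -- 1-D array holding the policy's feature values
    feature_index -- column of the feature being varied
    grid          -- 1-D array of evaluation points for that feature
    """
    # Replicate the policy once per grid point, varying only the chosen feature.
    pseudo_data = np.tile(policy_row.astype(float), (len(grid), 1))
    pseudo_data[:, feature_index] = grid
    return model.predict(pseudo_data)  # one prediction per grid point
```

Plotting this vector against the grid gives the ICE curve for that policy; ICEnet evaluates such vectors for every policy and every constrained feature during training.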
The Alignment Challenge in Professional AI
The alignment challenge - ensuring AI outputs align with domain expertise, regulations, and ethical constraints - extends far beyond insurance. Similar alignment problems appear in medicine, finance, engineering, and law. In language models like ChatGPT, alignment involves multiple stages:
- Pre-training: Large-scale unsupervised training on massive amounts of text data to teach the model general language patterns and representations.
- Instruction tuning: Fine-tuning models on datasets of structured instructions and responses to better align outputs with specific tasks.
- Reinforcement Learning from Human Feedback (RLHF): Incorporating human feedback directly into the training process. Human annotators rate the quality of model-generated responses, guiding models toward outputs that align more closely with human values and preferences (Ouyang et al., 2022).
- Constitutional AI: Implementing explicit rules or "constitutions" that define acceptable and unacceptable model behaviors, guiding the model to adhere strictly to ethical and practical constraints (Bai et al., 2022).
ICEnet follows a similar philosophy but with a critical advantage: professional domains offer precise, quantifiable constraints rather than fuzzy human preferences. For example, insurance pricing models often must increase monotonically with risk factors like bonus-malus ratings or policyholder risk levels (Richman & Wüthrich, 2024). A medical dosing algorithm similarly must adhere strictly to physiological boundaries.
Output Space: The Unexpected Key to Alignment
ICEnet's innovation is applying constraints directly to the outputs - ICE vectors generated from pseudo-data - rather than to internal network parameters. This approach mirrors modern LLM alignment practices, where output behaviors are shaped through auxiliary objectives rather than parameter-level restrictions.
How ICEnet constrains predictions:
- Select features to constrain and define a discrete evaluation grid.
- For each policy, generate pseudo-data varying only the constrained feature.
- Calculate ICE vectors and penalize deviations:
- Smoothness: Penalize the squared third differences along ICE vectors.
- Monotonicity: Apply a hinge penalty for first differences violating the desired direction (δⱼ = +1 for increasing, −1 for decreasing).
- Optimize a single set of network weights to minimize the prediction loss plus the ICE-based penalties (a minimal sketch of this compound loss follows the list).
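As a minimal sketch of the compound objective described above - written in PyTorch and assuming the ICE outputs for one constrained feature have already been collected into a tensor of shape (n_policies, n_grid_points) - the function names, tensor layout, and penalty weights below are illustrative rather than the authors' implementation:

```python
import torch

def ice_penalties(ice, direction=1.0):
    """Smoothness and monotonicity penalties on ICE outputs of shape (n_policies, n_grid_points)."""
    d1 = ice[:, 1:] - ice[:, :-1]  # first differences along the evaluation grid
    d3 = ice[:, 3:] - 3 * ice[:, 2:-1] + 3 * ice[:, 1:-2] - ice[:, :-3]  # third differences
    smooth = (d3 ** 2).mean()                   # roughness: squared third differences
    mono = torch.relu(-direction * d1).mean()   # hinge: first differences against the desired direction
    return smooth, mono

def icenet_loss(freq, counts, exposure, ice, lam_smooth, lam_mono, direction=1.0):
    """Poisson deviance loss plus ICE-based penalties (lam_smooth, lam_mono are illustrative weights)."""
    mu = exposure * freq
    eps = 1e-8
    deviance = 2.0 * (counts * torch.log((counts + eps) / (mu + eps)) - counts + mu).mean()
    smooth, mono = ice_penalties(ice, direction)
    return deviance + lam_smooth * smooth + lam_mono * mono
```

In practice the penalties are summed over all constrained features, each with its own direction δⱼ and its own weight; tuning those weights is the balancing act discussed under "Limitations and Gotchas" below.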
Training complexity scales linearly with the number of features and evaluation points but remains manageable due to GPU parallelism. On French MTPL data, an unconstrained network trained in about 3 minutes, while ICEnet trained in approximately 12 minutes (Richman & Wüthrich, 2024).
Limitations and Gotchas
- Correlated features: Partial dependence plots (PDPs) and ICE curves can mislead when inputs are correlated (Apley & Zhu, 2016).
- Penalty tuning: Inappropriate penalty strengths or incorrect constraints (e.g., enforcing monotonicity on a non-monotone relationship) can significantly degrade model performance (Richman & Wüthrich, 2024).
- Compute budget: Global constraints increase computational load, mitigated by GPUs.
Beyond Accuracy: The Multi-Objective Future
Our MTPL experiments reveal that monotonicity constraints regularly improve out-of-sample deviance, while smoothness constraints may trade a little accuracy for stability. With both constraints, ICEnet performed comparably to unconstrained models, but more stably:
- French MTPL example (Poisson deviance): unconstrained fully connected network (FCN) test: 0.2387; ICEnet test: 0.2388; ICEnet (monotonicity-only) test: 0.2385. Ensembling ("nagging") improved FCN to 0.2383 and ICEnet (monotonicity-only) to 0.2382 (Richman & Wüthrich, 2024).
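The "nagging" gains quoted above come from simple prediction averaging. Below is a minimal sketch, assuming nagging here means averaging the predictions of networks retrained from different random seeds; the `train_model` factory is hypothetical:

```python
import numpy as np

def nagging_predict(train_model, X_test, seeds=range(10)):
    """Average predictions over networks refit with different random seeds.

    train_model -- hypothetical callable: seed -> fitted model exposing .predict(X)
    """
    predictions = [train_model(seed).predict(X_test) for seed in seeds]
    return np.mean(predictions, axis=0)
```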
This evidence challenges the belief that interpretability inherently sacrifices accuracy. Instead, thoughtful incorporation of domain knowledge can improve both trustworthiness and predictive performance.
The Bridge We've Been Waiting For
ICEnet represents a philosophical bridge:
- Statistical culture contributes constraints encoding actuarial wisdom.
- Algorithmic culture provides flexible neural networks capturing complexity.
- Output space becomes the mathematical meeting ground allowing both cultures to coexist.
Unlike lattice networks or monotone-constrained models, ICEnet preserves standard neural network architectures, aligning outputs via targeted loss penalties rather than structural modifications.
Lessons for the Broader AI Landscape
ICEnet highlights principles essential for aligning AI:
- Constraints as Features, Not Bugs: Expert constraints often enhance model performance and stability.
- Output Space as Control Surface: Directly constraining outputs rather than internals yields intuitive and effective alignment. This can be done during training or in a post-training setup.
- Domain Knowledge as Regularization: Established domain knowledge remains vital in the AI era.
- Precise Alignment in Specialized Domains: Professional domains are ideal environments for testing alignment techniques applicable to broader AI systems.
- Bridging Paradigms Through Design: Innovative design can resolve philosophical debates in modeling.
How to Validate an ICEnet
- Evaluate in-sample and out-of-sample deviance.
- Quantify monotonicity and smoothness metrics across numerous observations, ensuring their distributions shift in the desired direction (Richman & Wüthrich, 2024); see the sketch after this list.
- Verify calibration by checking extreme predictions.
- Conduct ablation studies (mono-only, smooth-only, both) to validate constraints' impacts.
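The monotonicity and smoothness checks can be summarised with simple metrics over the ICE arrays; the definitions below are illustrative (our own), not the paper's exact formulas:

```python
import numpy as np

def monotonicity_violation_rate(ice, direction=1.0):
    """Share of adjacent ICE grid steps that move against the required direction.

    ice -- array of shape (n_policies, n_grid_points)
    """
    d1 = np.diff(ice, axis=1)
    return float(np.mean(direction * d1 < 0))

def smoothness_score(ice):
    """Mean squared third difference along ICE vectors (lower means smoother)."""
    d3 = np.diff(ice, n=3, axis=1)
    return float(np.mean(d3 ** 2))
```

Computing these metrics for the unconstrained network and for each ICEnet variant makes the ablation comparison concrete.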
Back to Rashomon
The Rashomon effect describes the large set of near-optimal predictors that fit the same data yet embody different explanations. In actuarial work, we do not want just any accurate story; we want the accurate stories that also respect established business invariants and prudential logic (Breiman, 2001).
ICEnet mitigates Rashomon by shaping the hypothesis class in the output space and by using the alignment term as a principled tie-breaker among equally accurate networks (Richman & Wüthrich, 2024):
- Hypothesis-class shaping: ICE-based smoothness and monotonicity penalties shrink the admissible set to functions whose local responses along constrained features are actuarially plausible. Models that match deviance but violate shape constraints are dominated.
- Explanation stability: Because constraints act directly on outputs, retrainings across seeds or folds converge to similar ICE shapes, yielding consistent narratives and decisions.
For pricing and reserving, this means:
- Monotone premiums with respect to risk scores such as bonus-malus and other risk proxies, avoiding perverse local decreases.
- Smoother partial effects that reduce spurious, non-business-causal interactions.
- More coherent business rules: similar policies move in similar directions, enabling auditable, regulator-ready justifications.
- Comparable accuracy to unconstrained models while narrowing explanation variance, as shown in our experiments.
In short, ICEnet does not pretend the Rashomon set disappears; it reweights it in favor of aligned actuarial models, privileging those that tell the right kind of story while maintaining predictive performance.
Conclusion: Reconciliation Through Innovation
ICEnet demonstrates that the much-debated dichotomy between human expertise and machine intelligence may not be as important as we once believed. Constraining neural networks in the output space creates trustworthy, auditable, and accurate models that regulators, actuaries, and customers can use with confidence. The future of professional AI belongs to systems embracing both human wisdom and machine capability - and output space might just be the ideal place for them to converge.
References
- Apley, D. W., & Zhu, J. (2016). Visualizing the effects of predictor variables in black box supervised learning models (ALE). arXiv:1612.08468
- Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI feedback. Anthropic. arXiv:2212.08073
- Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231. doi:10.1214/ss/1009213726
- Leike, J., Irving, G., Christiano, P., et al. (2023). Process supervision for language models. OpenAI Technical Report. openai.com/research/process-supervision
- Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. NeurIPS. arXiv:2203.02155
- Rafailov, R., et al. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv:2305.18290
- Richman, R., & Wüthrich, M. V. (2024). Smoothness and monotonicity constraints for neural networks using ICEnet. Annals of Actuarial Science, 18(3), 712–739. doi:10.1017/S174849952400006X
- Richman, R., Scognamiglio, H., & Wüthrich, M. V. (2025). Lecture slides.
Technical Appendix - Bridging from ICEnet to RLHF and modern alignment methods
RLHF has become the canonical approach for aligning large language models (LLMs) with human expectations. In RLHF, a pretrained model first produces candidate outputs, and human labelers rank or compare these responses. A reward model is then trained to predict the human preference ordering, and the base model is fine-tuned against this learned reward using a KL-regularized reinforcement learning algorithm, typically Proximal Policy Optimization (PPO). This process, introduced in InstructGPT by Ouyang et al. (2022), substantially improves helpfulness and reduces undesirable behavior by shaping outputs through an external reward, rather than constraining internal parameters. Later refinements such as Constitutional AI (Bai et al., 2022) replaced many human labels with “AI feedback,” using a written constitution of normative rules to automate judgments. Even more direct variants such as Direct Preference Optimization (Rafailov et al., 2023) and Implicit Preference Optimization remove the reinforcement loop entirely by solving the same KL objective as a single supervised loss. Each of these methods operates at the output-space level - rewarding or penalizing model behaviors rather than weights - which parallels ICEnet’s idea of constraining neural networks through differentiable penalties on their predicted outputs rather than their parameters.
Reinforcement learning with verifiable rewards (RLVR) extends this philosophy by replacing subjective human labels with objective, checkable signals. Instead of “did this answer seem polite?”, RLVR uses programmatic verifiers: mathematical proofs, unit tests, or formal rule checks that deterministically grade correctness. Closely related is process supervision, in which intermediate reasoning steps are evaluated instead of (or in addition to) final answers - see Leike et al. (2023) and OpenAI’s Let’s Verify Step by Step experiments. These ideas address RLHF’s key weakness: preference noise. In insurance and other professional domains, many outputs are inherently verifiable - premium rates must increase monotonically with exposure or risk, medical dosages must respect physiological limits - making ICEnet a domain-specific type of RLVR. Its compound loss function explicitly adds verifiers for “smoothness” and “monotonicity” of the Individual Conditional Expectation (ICE) outputs, turning actuarial regulations and expert constraints into differentiable, verifiable rewards.
The symmetry becomes clearer when we interpret ICEnet’s penalty vector λ as the analogue of the RLHF KL weight - the hyper-parameter that trades off adherence to human feedback against retention of pretraining knowledge. Too small a λ yields unconstrained, possibly irrational outputs; too large a λ over-constrains the model and harms accuracy - precisely the tension faced when tuning KL coefficients or reward scales in LLM alignment. Empirically, monotonicity penalties in ICEnet act like verifiable rewards: they improve generalization and reduce variance across runs, much as RLVR improves consistency in reasoning benchmarks.
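In symbols (our notation, not the sources'), the analogy can be written as a side-by-side of the two objectives:

```latex
% ICEnet: minimise deviance plus ICE penalties; the lambdas set the alignment strength.
\min_{\theta}\; \mathcal{L}_{\text{dev}}(\theta)
  + \lambda_{\text{smooth}}\, P_{\text{smooth}}(\theta)
  + \lambda_{\text{mono}}\, P_{\text{mono}}(\theta)

% RLHF: maximise the learned reward while the KL term, weighted by beta, keeps the
% fine-tuned policy \pi_\theta close to the pretrained reference \pi_{\text{ref}}.
\max_{\theta}\; \mathbb{E}_{y \sim \pi_{\theta}}\!\left[ r(y) \right]
  - \beta\, \mathrm{KL}\!\left( \pi_{\theta} \,\Vert\, \pi_{\text{ref}} \right)
```

In both cases the first term preserves raw predictive (or generative) capability, while the second, scaled by λ or β, pulls the model toward externally specified behavior.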
Taken together, ICEnet and RLHF/RLVR share a unifying principle - alignment through output‑space optimization - where domain knowledge, human feedback, or formal rules act as external reward functions guiding flexible but auditable systems.