

  1. Backtesting Stochastic Mortality Models: An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts • Kevin Dowd (CRIS, NUBS) • Andrew J. G. Cairns (Heriot-Watt) • David Blake (Pensions Institute, Cass Business School) • Guy D. Coughlan (JPMorgan) • David Epstein (JPMorgan) • Marwa Khalaf-Allah (JPMorgan) • 4th International Longevity Risk and Capital Market Solutions Conference, Amsterdam, September 2008

  2. Purposes of Paper • To set out a framework to backtest the forecast performance of mortality models • Backtesting = evaluation of forecasts against subsequently realised outcomes • To apply this backtesting framework to a set of mortality models • How well do they actually perform?

  3. Background • This study is the fourth in a series involving a collaboration between Blake, Cairns and Dowd and the LifeMetrics team at JPMorgan • Involves actuaries, economists and investment bankers • Of course, it is very easy (and fun!) to attack the forecasting ‘abilities’ of actuaries (remember Equitable?) and investment bankers (remember subprime?), but we should remember…

  4. It’s not just actuaries and investment bankers who can’t forecast

  5. Background • Cairns et al. (2007) examines the empirical fits of 8 different mortality models applied to E&W and US male mortality data • Compares model performance • Uses a range of qualitative criteria (e.g., biological reasonableness) • Uses a range of quantitative criteria (e.g., the Bayes information criterion)

  6. Models considered • Model M1 = Lee-Carter, no cohort effect • Model M2 = Renshaw and Haberman’s (2006) cohort-effect generalisation of M1 • Model M3 = Currie’s age-period-cohort model • Model M4 = P-splines model, Currie (2004) • Model M5 = CBD two-factor model, Cairns et al. (2006), no cohort effect • Models M6, M7 and M8 = alternative cohort-effect generalisations of CBD
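For reference, the simplest member of each family has a well-known published form; the sketch below uses the standard notation (x for age, t for year, κ for the period indices) and is included here only as a reminder:

```latex
% M1 (Lee-Carter): log central death rate, one period index
\log m(t,x) = \beta_x^{(1)} + \beta_x^{(2)} \kappa_t

% M5 (CBD): logit of the one-year death probability, two period indices,
% with \bar{x} the mean age in the sample age range
\operatorname{logit} q(t,x) = \kappa_t^{(1)} + \kappa_t^{(2)} \, (x - \bar{x})
```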

  7. Second study, Cairns et al. (2008) • Examines the ex ante plausibility of the models’ density forecasts • M4 (P-splines) not considered • Amongst other conclusions, finds that M8 (which did very well in the first study) gives very implausible forecasts for US data • Hence, decided to drop M8 as well • Thus, a model might fit past data well but still give unreliable forecasts • Not enough just to look at past fits

  8. Third study, Dowd et al. (2008a) • Examines the goodness of fit of models M1, M2B, M3B, M5, M6 and M7 more systematically • M2B is a special case of M2 which uses an ARIMA(1,1,0) for the cohort effect • M3B is a special case of M3 which uses the same ARIMA(1,1,0) for the cohort effect • Basic idea is to unravel the models’ testable implications and test them systematically • Finds some problems with all of the models; M2B is unstable
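As an illustrative sketch only (not the authors' code), an ARIMA(1,1,0) cohort-effect process can be fitted and projected with statsmodels; the gamma series below is a hypothetical stand-in for an estimated cohort effect:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical cohort-effect series gamma, indexed by year of birth
# (placeholder random-walk data standing in for a fitted model's output).
rng = np.random.default_rng(0)
gamma = np.cumsum(rng.normal(scale=0.1, size=60))

# ARIMA(1,1,0): an AR(1) on the first differences of the cohort effect
fit = ARIMA(gamma, order=(1, 1, 0)).fit()
print(fit.params)              # AR coefficient and innovation variance
print(fit.forecast(steps=10))  # projected cohort effect, 10 cohorts ahead
```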

  9. Motivation for present study • A model might • Give a good fit to past data and • Generate density forecasts that appear plausible ex ante • And still produce poor forecasts • Hence, it is essential to test the performance of models against subsequently realised outcomes • This is what backtesting is about • In the end, it is the forecast performance that really matters • Would you want to drive a car that hadn’t been field-tested?

  10. Backtesting framework • Choose a metric of interest • Could choose mortality rates, survival rates, life expectancy, annuity prices, etc. • Select a historical lookback window used to estimate the model parameters • Select a forecast horizon or lookforward window for the forecasts • Implement tests of how well the forecasts subsequently performed
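A minimal sketch of that recipe in Python; fit_model and forecast_density are assumed, model-specific callables (they stand in for whichever mortality model is under test), and the metric here is the mortality rate at a single age:

```python
import numpy as np

def backtest(rates, fit_model, forecast_density, lookback=10, horizon=15):
    """Fit on a rolling lookback window, forecast `horizon` years ahead,
    and record where each realised rate falls in its forecast density.

    rates: dict mapping year -> realised mortality rate at the chosen age
    fit_model, forecast_density: assumed model-specific callables
    """
    years = sorted(rates)
    pits = []
    for i in range(lookback, len(years) - horizon + 1):
        window = {y: rates[y] for y in years[i - lookback:i]}  # estimation data
        target = years[i + horizon - 1]                        # year forecast
        params = fit_model(window)
        cdf = forecast_density(params, horizon)  # CDF of the h-year-ahead rate
        pits.append(cdf(rates[target]))          # PIT of the realised outcome
    return np.array(pits)
```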

  11. Backtesting framework • We choose to focus mainly on the mortality rate as our metric • We choose a fixed 10-year lookback window • This seems to be emerging as the standard amongst practitioners • We examine a range of backtests: • Over contracting horizons • Over expanding horizons • Over rolling fixed-length horizons • Future mortality density tests
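The three horizon schemes differ only in how the forecast origin and end-point move through the sample; a sketch, with the 1980/2005 bounds taken from the later slides purely for illustration:

```python
def contracting(first=1980, last=2005):
    # fixed terminal year, forecast origin advances: the horizon shrinks
    return [(origin, last) for origin in range(first, last)]

def expanding(first=1980, last=2005):
    # fixed forecast origin, terminal year advances: the horizon grows
    return [(first, end) for end in range(first + 1, last + 1)]

def rolling(h=15, first=1980, last=2005):
    # fixed-length horizon rolled through the sample year by year
    return [(origin, origin + h) for origin in range(first, last - h + 1)]
```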

  12. Backtesting framework • We consider forecasts both with and without parameter uncertainty • Parameter-certain (PC) case: treat parameter estimates as if they were known values • Parameter-uncertain (PU) case: forecast using a Bayesian approach that allows for uncertainty in the parameter estimates • Allows for uncertainty in the parameters governing the period and cohort effects • Results indicate it is very important to allow for parameter uncertainty
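A sketch of the PC/PU distinction for a random-walk-with-drift period index (the usual driver in these models); the conjugate posterior used here is a textbook stand-in, not necessarily the paper's exact Bayesian scheme:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_period_index(kappa, horizon, n_paths=10_000, param_uncertainty=True):
    """Simulate a random-walk-with-drift period index forward.

    PC case: drift mu and variance sigma^2 fixed at their point estimates.
    PU case: (mu, sigma^2) drawn from a standard conjugate posterior before
    each path -- a textbook stand-in for the paper's Bayesian treatment.
    """
    dk = np.diff(kappa)                      # observed one-year changes
    n = len(dk)
    mu_hat, s2_hat = dk.mean(), dk.var(ddof=1)
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        if param_uncertainty:
            sigma2 = (n - 1) * s2_hat / rng.chisquare(n - 1)  # inv-chi^2 draw
            mu = rng.normal(mu_hat, np.sqrt(sigma2 / n))      # mu | sigma^2
        else:
            mu, sigma2 = mu_hat, s2_hat
        shocks = rng.normal(mu, np.sqrt(sigma2), size=horizon)
        paths[i] = kappa[-1] + np.cumsum(shocks)
    return paths  # prediction intervals come from quantiles across paths
```

Prediction intervals are then quantiles of `paths` across simulations, which is why the PU intervals widen: each path also carries a different drawn drift and volatility.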

  13. Contracting horizon backtest: age 65

  14. Contracting horizon backtest: age 75

  15. Contracting horizon backtest: age 85

  16. Conclusions so far • Big difference between PC and PU forecasts • PU prediction intervals usually considerably wider than PC ones • M2B sometimes unstable • Now consider expanding horizon predictions …

  17. Prediction-Intervals from 1980: age 65

  18. Prediction-Intervals from 1980: age 75

  19. Prediction-Intervals from 1980: age 85

  20. Expanding PI conclusions • PC models have far too many lower exceedances • PU models have exceedances that are much closer to expectations • Especially for M1, M7 and M3B • Suggests that PU forecasts are more plausible than PC ones • Negligible differences between PC and PU median predictions • Very few upper exceedances

  21. Expanding PI conclusions • Too few upper exceedances, and too many median and lower exceedances • This suggests some upward bias in the forecasts • The upward bias is especially pronounced for PC forecasts • Evidence of upward bias is less clear-cut for PU forecasts
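The exceedance tallies behind these conclusions are mechanical once the prediction intervals are in hand; a sketch, assuming 90% intervals for the expected rates quoted in the comments:

```python
import numpy as np

def exceedance_counts(realised, lower, median, upper):
    """Tally realised rates falling outside the prediction bounds.

    With a 90% interval we expect ~5% of outcomes below `lower`, ~5% above
    `upper`, and ~50% below `median`; large deviations signal bias.
    Realised rates below `lower` mean the forecasts were too high.
    """
    realised = np.asarray(realised)
    return {
        "lower": int((realised < np.asarray(lower)).sum()),
        "below_median": int((realised < np.asarray(median)).sum()),
        "upper": int((realised > np.asarray(upper)).sum()),
        "n": realised.size,
    }
```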

  22. Rolling Fixed Horizon Forecasts • From now on, we work with PU forecasts only • Assume an illustrative horizon of 15 years • Now examine the performance of each model in turn …

  23. Model M1

  24. Model M2B

  25. Model M3B

  26. Model M5

  27. Model M6

  28. Model M7

  29. Tentative conclusions so far • Rolling PI charts broadly consistent with earlier results • Some evidence of upward bias, but it is not consistent across models or always especially compelling • M2B again shows instability

  30. Mortality density tests • Choose an age (e.g., 65) and a horizon (e.g., 15 years ahead) • Use the model to project the pdf (or cdf) of the mortality rate 15 years ahead • Plot the realised q onto the pdf/cdf • Obtain the associated p-value (or PIT value) • Reject if p is too far out in either tail
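A sketch of the test step, with the forecast density represented by simulated draws such as a stochastic mortality model would produce:

```python
import numpy as np

def density_test(simulated_q, realised_q, alpha=0.05):
    """PIT of the realised rate under the simulated forecast distribution,
    with a two-sided rejection rule, as described on this slide."""
    pit = (np.asarray(simulated_q) <= realised_q).mean()  # empirical CDF value
    p_value = 2 * min(pit, 1 - pit)                       # two-sided p-value
    return pit, p_value, p_value < alpha                  # reject if in a tail
```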

  31. Example: P-Values of Realised Mortality: Males 65, 1980 Start, Horizon = 26 Years Ahead

  32. Many ways to do this • For h=25 years ahead: 1 way • 1980-2005 only • For h=24 years ahead: 2 ways • 1980-2004, 1981-2005 • For h=23 years ahead: 3 ways • … • For h=1 year ahead: 25 ways • 1980-1981, 1981-1982, …, 2004-2005

  33. Lots of cases to consider • There are 25+24+23+…+1 = 325 separate cases to consider, each equally ‘legitimate’ • Need some way to make use of all the possibilities but consolidate the results • We do so by computing a p-value for each case and then working with the mean p-value from each test • These are reported below for each age, for h=5, 10 and 15 years ahead:
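Consolidating is then just a matter of enumerating every admissible (origin, target) pair and averaging; a sketch, where p_value_for_case is a hypothetical wrapper around the density test sketched earlier:

```python
def all_cases(h, first=1980, last=2005):
    # every forecast origin whose h-year-ahead target is still in sample
    return [(origin, origin + h) for origin in range(first, last - h + 1)]

def p_value_for_case(origin, target):
    # hypothetical wrapper: fit on data up to `origin`, then run the
    # density test above on the rate realised in `target`; placeholder here
    return 0.5

# len(all_cases(25)) == 1 and len(all_cases(1)) == 25; h = 1..25 gives
# 25+24+...+1 = 325 cases in total, matching the slide's count
mean_p = {h: sum(p_value_for_case(o, t) for o, t in all_cases(h))
             / len(all_cases(h))
          for h in (5, 10, 15)}
print(mean_p)
```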

  34. Age 65

  35. Age 75

  36. Age 85

  37. Conclusions from these tests • All models perform well • No rejections at the 1% significance level • Only 3 at the 5% level

  38. Overall conclusions • The study outlines a framework for backtesting the forecasts of mortality models • As regards the individual models and this dataset: • M1, M3B, M5 and M7 perform well most of the time, and there is little to choose between them • M2B is unstable • Of the Lee-Carter family of models, it is hard to choose between M1 and M3B • Of the CBD family, M7 seems to perform best; little to choose between M5 and M7

  39. Two other points stand out • In many but not all cases, and depending also on the model, there is evidence of an upward bias in the forecasts • This bias is very pronounced for PC forecasts • It is less pronounced for PU forecasts • Except maybe for M2B, the PU forecasts are more plausible than the PC forecasts • It is therefore very important to take account of parameter uncertainty, more or less regardless of the model one uses

  40. References • Cairns et al. (2007) “A quantitative comparison of stochastic mortality models using data from England & Wales and the United States.” Pensions Institute Discussion Paper PI-0701, March. • Cairns et al. (2008) “The plausibility of mortality density forecasts: An analysis of six stochastic mortality models.” Pensions Institute Discussion Paper PI-0801, April. • Dowd et al. (2008a) “Evaluating the goodness of fit of stochastic mortality models.” Pensions Institute Discussion Paper PI-0802, September. • Dowd et al. (2008b) “Backtesting stochastic mortality models: An ex-post evaluation of multi-year-ahead density forecasts.” Pensions Institute Discussion Paper PI-0803, September. • These papers are also available at www.lifemetrics.com
