

  1. Backtesting Stochastic Mortality Models: An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts • Kevin Dowd (CRIS, NUBS) • Andrew J. G. Cairns (Heriot-Watt) • David Blake (Pensions Institute, Cass Business School) • Guy D. Coughlan (JPMorgan) • David Epstein (JPMorgan) • Marwa Khalaf-Allah (JPMorgan) • 4th International Longevity Risk and Capital Market Solutions Conference, Amsterdam, September 2008

  2. Purposes of Paper • To set out a framework to backtest the forecast performance of mortality models • Backtesting = evaluation of forecasts against subsequently realised outcomes • To apply this backtesting framework to a set of mortality models • How well do they actually perform?

  3. Background • This study is the fourth in a series involving a collaboration between Blake, Cairns and Dowd and the LifeMetrics team at JPMorgan • Involves actuaries, economists and investment bankers • Of course, it is very easy (and fun!) to attack the forecasting ‘abilities’ of actuaries (remember Equitable?) and investment bankers (remember subprime?), but we should remember…

  4. It’s not just actuaries and investment bankers who can’t forecast

  5. Background • Cairns et al. (2007) examines the empirical fits of 8 different mortality models applied to E&W and US male mortality data • Compares model performance • Uses a range of qualitative criteria (e.g., biological reasonableness) • Uses a range of quantitative criteria (e.g., the Bayes information criterion)

  6. Models considered • Model M1 = Lee-Carter, no cohort effect • Model M2 = Renshaw and Haberman’s (2006) cohort-effect generalisation of M1 • Model M3 = Currie’s age-period-cohort model • Model M4 = P-splines model, Currie (2004) • Model M5 = CBD two-factor model, Cairns et al. (2006), no cohort effect • Models M6, M7 and M8 = alternative cohort-effect generalisations of CBD
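For reference, the simplest member of each family has a well-known published form; the sketch below uses the standard notation (x for age, t for year, κ for the period indices) and is included here only as a reminder:

```latex
% M1 (Lee-Carter): log central death rate, one period index
\log m(t,x) = \beta_x^{(1)} + \beta_x^{(2)} \kappa_t

% M5 (CBD): logit of the one-year death probability, two period indices,
% with \bar{x} the mean age in the sample age range
\operatorname{logit} q(t,x) = \kappa_t^{(1)} + \kappa_t^{(2)} \, (x - \bar{x})
```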

  7. Second study, Cairns et al. (2008) • Examines the ex ante plausibility of the models’ density forecasts • M4 (P-splines) not considered • Amongst other conclusions, finds that M8 (which did very well in the first study) gives very implausible forecasts for US data • Hence, decided to drop M8 as well • Thus, a model might fit past data well but still give unreliable forecasts • Not enough just to look at past fits

  8. Third study, Dowd et al. (2008a) • Examines the goodness of fit of models M1, M2B, M3B, M5, M6 and M7 more systematically • M2B is a special case of M2 which uses an ARIMA(1,1,0) for the cohort effect • M3B is a special case of M3 which uses the same ARIMA(1,1,0) for the cohort effect • Basic idea is to unravel the models’ testable implications and test them systematically • Finds some problems with all of the models; M2B is unstable
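As an illustrative sketch only (not the authors' code), an ARIMA(1,1,0) cohort-effect process can be fitted and projected with statsmodels; the gamma series below is a hypothetical stand-in for an estimated cohort effect:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical cohort-effect series gamma, indexed by year of birth
# (placeholder random-walk data standing in for a fitted model's output).
rng = np.random.default_rng(0)
gamma = np.cumsum(rng.normal(scale=0.1, size=60))

# ARIMA(1,1,0): an AR(1) on the first differences of the cohort effect
fit = ARIMA(gamma, order=(1, 1, 0)).fit()
print(fit.params)              # AR coefficient and innovation variance
print(fit.forecast(steps=10))  # projected cohort effect, 10 cohorts ahead
```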

  9. Motivation for present study • A model might • Give a good fit to past data and • Generate density forecasts that appear plausible ex ante • And still produce poor forecasts • Hence, it is essential to test the performance of models against subsequently realised outcomes • This is what backtesting is about • In the end, it is the forecast performance that really matters • Would you want to drive a car that hadn’t been field-tested?

  10. Backtesting framework • Choose a metric of interest • Could choose mortality rates, survival rates, life expectancy, annuity prices, etc. • Select a historical lookback window used to estimate the model parameters • Select a forecast horizon or lookforward window for the forecasts • Implement tests of how well the forecasts subsequently performed
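A minimal sketch of that recipe in Python; fit_model and forecast_density are assumed, model-specific callables (they stand in for whichever mortality model is under test), and the metric here is the mortality rate at a single age:

```python
import numpy as np

def backtest(rates, fit_model, forecast_density, lookback=10, horizon=15):
    """Fit on a rolling lookback window, forecast `horizon` years ahead,
    and record where each realised rate falls in its forecast density.

    rates: dict mapping year -> realised mortality rate at the chosen age
    fit_model, forecast_density: assumed model-specific callables
    """
    years = sorted(rates)
    pits = []
    for i in range(lookback, len(years) - horizon + 1):
        window = {y: rates[y] for y in years[i - lookback:i]}  # estimation data
        target = years[i + horizon - 1]                        # year forecast
        params = fit_model(window)
        cdf = forecast_density(params, horizon)  # CDF of the h-year-ahead rate
        pits.append(cdf(rates[target]))          # PIT of the realised outcome
    return np.array(pits)
```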

  11. Backtesting framework • We choose to focus mainly on the mortality rate as our metric • We choose a fixed 10-year lookback window • This seems to be emerging as the standard amongst practitioners • We examine a range of backtests: • Over contracting horizons • Over expanding horizons • Over rolling fixed-length horizons • Future mortality density tests
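The three horizon schemes differ only in how the forecast origin and end-point move through the sample; a sketch, with the 1980/2005 bounds taken from the later slides purely for illustration:

```python
def contracting(first=1980, last=2005):
    # fixed terminal year, forecast origin advances: the horizon shrinks
    return [(origin, last) for origin in range(first, last)]

def expanding(first=1980, last=2005):
    # fixed forecast origin, terminal year advances: the horizon grows
    return [(first, end) for end in range(first + 1, last + 1)]

def rolling(h=15, first=1980, last=2005):
    # fixed-length horizon rolled through the sample year by year
    return [(origin, origin + h) for origin in range(first, last - h + 1)]
```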

  12. Backtesting framework • We consider forecasts both with and without parameter uncertainty • Parameter-certain (PC) case: treat parameter estimates as if they were known values • Parameter-uncertain (PU) case: forecast using a Bayesian approach that allows for uncertainty in the parameter estimates • Allows for uncertainty in the parameters governing the period and cohort effects • Results indicate it is very important to allow for parameter uncertainty
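A sketch of the PC/PU distinction for a random-walk-with-drift period index (the usual driver in these models); the conjugate posterior used here is a textbook stand-in, not necessarily the paper's exact Bayesian scheme:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_period_index(kappa, horizon, n_paths=10_000, param_uncertainty=True):
    """Simulate a random-walk-with-drift period index forward.

    PC case: drift mu and variance sigma^2 fixed at their point estimates.
    PU case: (mu, sigma^2) drawn from a standard conjugate posterior before
    each path -- a textbook stand-in for the paper's Bayesian treatment.
    """
    dk = np.diff(kappa)                      # observed one-year changes
    n = len(dk)
    mu_hat, s2_hat = dk.mean(), dk.var(ddof=1)
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        if param_uncertainty:
            sigma2 = (n - 1) * s2_hat / rng.chisquare(n - 1)  # inv-chi^2 draw
            mu = rng.normal(mu_hat, np.sqrt(sigma2 / n))      # mu | sigma^2
        else:
            mu, sigma2 = mu_hat, s2_hat
        shocks = rng.normal(mu, np.sqrt(sigma2), size=horizon)
        paths[i] = kappa[-1] + np.cumsum(shocks)
    return paths  # prediction intervals come from quantiles across paths
```

Prediction intervals are then quantiles of `paths` across simulations, which is why the PU intervals widen: each path also carries a different drawn drift and volatility.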

  13. Contracting horizon backtest: age 65

  14. Contracting horizon backtest: age 75

  15. Contracting horizon backtest: age 85

  16. Conclusions so far • Big difference between PC and PU forecasts • PU prediction intervals usually considerably wider than PC ones • M2B sometimes unstable • Now consider expanding horizon predictions …

  17. Prediction-Intervals from 1980: age 65

  18. Prediction-Intervals from 1980: age 75

  19. Prediction-Intervals from 1980: age 85

  20. Expanding PI conclusions • PC models have far too many lower exceedances • PU models have exceedances that are much closer to expectations • Especially for M1, M7 and M3B • Suggests that PU forecasts are more plausible than PC ones • Negligible differences between PC and PU median predictions • Very few upper exceedances

  21. Expanding PI conclusions • Too few upper exceedances, and too many median and lower exceedances • This suggests some upward bias in the forecasts • The upward bias is especially pronounced for PC forecasts • Evidence of upward bias is less clear-cut for PU forecasts
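The exceedance tallies behind these conclusions are mechanical once the prediction intervals are in hand; a sketch, assuming 90% intervals for the expected rates quoted in the comments:

```python
import numpy as np

def exceedance_counts(realised, lower, median, upper):
    """Tally realised rates falling outside the prediction bounds.

    With a 90% interval we expect ~5% of outcomes below `lower`, ~5% above
    `upper`, and ~50% below `median`; large deviations signal bias.
    Realised rates below `lower` mean the forecasts were too high.
    """
    realised = np.asarray(realised)
    return {
        "lower": int((realised < np.asarray(lower)).sum()),
        "below_median": int((realised < np.asarray(median)).sum()),
        "upper": int((realised > np.asarray(upper)).sum()),
        "n": realised.size,
    }
```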

  22. Rolling Fixed Horizon Forecasts • From now on, we work with PU forecasts only • Assume an illustrative horizon of 15 years • Now examine the performance of each model in turn …

  23. Model M1

  24. Model M2B

  25. Model M3B

  26. Model M5

  27. Model M6

  28. Model M7

  29. Tentative conclusions so far • Rolling PI charts broadly consistent with earlier results • Some evidence of upward bias, but it is not consistent across models or always especially compelling • M2B again shows instability

  30. Mortality density tests • Choose an age (e.g., 65) and a horizon (e.g., 15 years ahead) • Use the model to project the pdf (or cdf) of the mortality rate 15 years ahead • Plot the realised q onto the pdf/cdf • Obtain the associated p-value (or PIT value) • Reject if p is too far out in either tail
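A sketch of the test step, with the forecast density represented by simulated draws such as a stochastic mortality model would produce:

```python
import numpy as np

def density_test(simulated_q, realised_q, alpha=0.05):
    """PIT of the realised rate under the simulated forecast distribution,
    with a two-sided rejection rule, as described on this slide."""
    pit = (np.asarray(simulated_q) <= realised_q).mean()  # empirical CDF value
    p_value = 2 * min(pit, 1 - pit)                       # two-sided p-value
    return pit, p_value, p_value < alpha                  # reject if in a tail
```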

  31. Example: P-Values of Realised Mortality: Males 65, 1980 Start, Horizon = 26 Years Ahead

  32. Many ways to do this • For h=25 years ahead: 1 way • 1980-2005 only • For h=24 years ahead: 2 ways • 1980-2004, 1981-2005 • For h=23 years ahead: 3 ways • … • For h=1 year ahead: 25 ways • 1980-1981, 1981-1982, …, 2004-2005

  33. Lots of cases to consider • There are 25+24+23+…+1 = 325 separate cases to consider, each equally ‘legitimate’ • Need some way to make use of all the possibilities but consolidate the results • We do so by computing a p-value for each case and then working with the mean p-value from each test • These are reported below for each age, for h=5, 10 and 15 years ahead:
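Consolidating is then just a matter of enumerating every admissible (origin, target) pair and averaging; a sketch, where p_value_for_case is a hypothetical wrapper around the density test sketched earlier:

```python
def all_cases(h, first=1980, last=2005):
    # every forecast origin whose h-year-ahead target is still in sample
    return [(origin, origin + h) for origin in range(first, last - h + 1)]

def p_value_for_case(origin, target):
    # hypothetical wrapper: fit on data up to `origin`, then run the
    # density test above on the rate realised in `target`; placeholder here
    return 0.5

# len(all_cases(25)) == 1 and len(all_cases(1)) == 25; h = 1..25 gives
# 25+24+...+1 = 325 cases in total, matching the slide's count
mean_p = {h: sum(p_value_for_case(o, t) for o, t in all_cases(h))
             / len(all_cases(h))
          for h in (5, 10, 15)}
print(mean_p)
```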

  34. Age 65

  35. Age 75

  36. Age 85

  37. Conclusions from these tests • All models perform well • No rejections at the 1% significance level • Only 3 at the 5% level

  38. Overall conclusions • The study outlines a framework for backtesting the forecasts of mortality models • As regards the individual models and this dataset: • M1, M3B, M5 and M7 perform well most of the time, and there is little to choose between them • M2B is unstable • Of the Lee-Carter family of models, it is hard to choose between M1 and M3B • Of the CBD family, M7 seems to perform best; little to choose between M5 and M7

  39. Two other points stand out • In many but not all cases, and depending also on the model, there is evidence of an upward bias in the forecasts • This bias is very pronounced for PC forecasts • It is less pronounced for PU forecasts • Except maybe for M2B, the PU forecasts are more plausible than the PC forecasts • It is therefore very important to take account of parameter uncertainty, more or less regardless of the model one uses

  40. References • Cairns et al. (2007) “A quantitative comparison of stochastic mortality models using data from England & Wales and the United States.” Pensions Institute Discussion Paper PI-0701, March. • Cairns et al. (2008) “The plausibility of mortality density forecasts: An analysis of six stochastic mortality models.” Pensions Institute Discussion Paper PI-0801, April. • Dowd et al. (2008a) “Evaluating the goodness of fit of stochastic mortality models.” Pensions Institute Discussion Paper PI-0802, September. • Dowd et al. (2008b) “Backtesting stochastic mortality models: An ex-post evaluation of multi-year-ahead density forecasts.” Pensions Institute Discussion Paper PI-0803, September. • These papers are also available at www.lifemetrics.com
