Lufthansa

Lufthansa. Outlier Detection Methods on Booking Data AGIFORS Reservation and Yield Management Study Group Bangkok May 2001 Ulrich Oppitz . Definitions and Theory. Outlier Detection Methods. Analysis Method. Some Words on Quality Measurement. Results. Summary. Literature.

Presentation Transcript


  1. Lufthansa: Outlier Detection Methods on Booking Data. AGIFORS Reservation and Yield Management Study Group, Bangkok, May 2001. Ulrich Oppitz

  2. Outlier Detection Methods on Booking Data - Agenda -
      • Definitions and Theory
      • Outlier Detection Methods
      • Analysis Method
      • Some Words on Quality Measurement
      • Results
      • Summary
      • Literature

  3. Best practice for chain processes. Booking data in RM systems can be influenced by many disturbances
      • Definition: Outliers are data points which differ in their appearance from the majority of the data. (Rousseeuw, 1990)
      • Caused by:
          • system errors
          • schedule changes
          • special events
      • Two approaches to cope with outliers:
          • robust approach: use robust methods/predictors
          • diagnostic approach: identify outliers, trim or ignore them, then apply classical methods/predictors

  4. If ignored, outliers can affect the quality of the forecasting process significantly
      • To measure the robustness of a forecast method, Hodges introduced the term breakdown point. (Hodges 1967)
      • The breakdown point can be loosely defined as the smallest fraction of outliers that seriously offsets the estimator from the true one. (Rousseeuw 1991)
      • The breakdown point of any regression method based on the least squares (LS) technique is 1/n, which means that a single outlier in a set of n data points can make the LS estimate break down.
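
The 1/n breakdown point of least squares can be illustrated with a small sketch. The data below are hypothetical (a perfectly linear series, not booking data), and `np.polyfit` is used only as a convenient ordinary least-squares line fit:

```python
import numpy as np

# Hypothetical, perfectly linear data: y = 2x + 1 at n = 10 points.
x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0

# Corrupt a single observation (1 out of n) with a gross error.
y_bad = y_clean.copy()
y_bad[-1] = 1000.0

# np.polyfit with degree 1 performs an ordinary least-squares line fit.
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_bad, intercept_bad = np.polyfit(x, y_bad, 1)

print(f"clean fit:   slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
print(f"one outlier: slope={slope_bad:.2f}, intercept={intercept_bad:.2f}")
```

Making the corrupted value larger moves the fitted line further without bound, which is what a breakdown point of 1/n expresses.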

  6. Z-Score Testing
      • calculate the empirical average μ and variance σ² based on historical bookings for each DCP
      • check whether the number of historical bookings > minimum observations
      • tag as outlying if outside the following interval:
          • upper threshold: μ + maxSigmaPos * σ
          • lower threshold: μ - maxSigmaNeg * σ
      • trim outlying data to the threshold value before updating μ and σ
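
A minimal sketch of the z-score test for a single DCP, assuming the history is a plain list of past booking counts. The function name, the default thresholds, and the default minimum number of observations are hypothetical; only the parameter roles (maxSigmaPos, maxSigmaNeg, minimum observations) come from the slide:

```python
import numpy as np

def z_score_test(history, value, max_sigma_pos=2.0, max_sigma_neg=2.0, min_obs=10):
    """Return (is_outlier, trimmed_value) for one new booking value at one DCP.

    Default parameter values are illustrative assumptions, not calibrated ones.
    """
    history = np.asarray(history, dtype=float)
    # Only test when there are more historical bookings than the minimum.
    if history.size <= min_obs:
        return False, float(value)
    mu = history.mean()
    sigma = history.std(ddof=1)          # empirical standard deviation
    upper = mu + max_sigma_pos * sigma   # upper threshold
    lower = mu - max_sigma_neg * sigma   # lower threshold
    is_outlier = bool(value > upper or value < lower)
    # Trim an outlying value to the nearest threshold.
    trimmed = float(min(max(value, lower), upper))
    return is_outlier, trimmed
```

The trimmed value, rather than the raw one, would then be appended to the history before μ and σ are updated. Separate positive and negative multipliers allow an asymmetric acceptance range.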

  7. Z-Score Testing [Figure: density function of the normal distribution over bookings (bkgs), with μ and the lower and upper bounds marked]

  8. Determination Coefficient Testing on Residual Regression
      • update exponentially smoothed bookings for each DCP -> reference curve
      • check whether the number of historical bookings > minimum observations
      • calculate residuals bkd(dcp) from actual bookings and reference curve
      • calculate linear regression curve reg(dcp) on residuals bkd(dcp)

  9. Determination Coefficient Testing on Residual Regression [Figure: residuals bkd(dcp) and regression curve reg(dcp) plotted against dcp]

  10. Determination Coefficient Testing on Residual Regression
      • calculate the determination coefficient

        $$R^2 = \frac{\sum_{dcp} \left( \mathrm{reg}(dcp) - \overline{\mathrm{reg}} \right)^2}{\sum_{dcp} \left( \mathrm{bkd}(dcp) - \overline{\mathrm{bkd}} \right)^2}$$

      • if R2 < minR2, tag the DCP with the largest vertical distance to the regression curve as outlying and take it out of the set
      • iterate with the cleaned data set
      • stop if R2 > minR2 or the number of outliers > maxOutlier
      • reset all outlier tags if more than maxOutlier points were tagged
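
The iteration above can be sketched as follows. This is an illustration under assumptions: the function takes the residuals bkd(dcp) directly (the exponentially smoothed reference curve is not modelled), DCPs are indexed 0..n-1, and the function name and default parameter values are hypothetical:

```python
import numpy as np

def dct_outliers(bkd, min_r2=0.5, max_outlier=3):
    """Return indices of DCPs tagged as outlying by iterated R^2 testing."""
    bkd = np.asarray(bkd, dtype=float)
    dcp = np.arange(bkd.size, dtype=float)
    keep = np.ones(bkd.size, dtype=bool)      # DCPs still in the data set
    tagged = []
    while keep.sum() >= 3:                    # a line fit needs a few points
        # Linear regression reg(dcp) on the remaining residuals.
        slope, intercept = np.polyfit(dcp[keep], bkd[keep], 1)
        reg = slope * dcp[keep] + intercept
        ss_tot = np.sum((bkd[keep] - bkd[keep].mean()) ** 2)
        if ss_tot == 0.0:                     # constant residuals, nothing to test
            break
        r2 = np.sum((reg - reg.mean()) ** 2) / ss_tot
        if r2 >= min_r2:                      # fit is good enough: stop
            break
        # Tag the DCP with the largest vertical distance to the regression line.
        kept_idx = np.flatnonzero(keep)
        worst = kept_idx[np.argmax(np.abs(bkd[keep] - reg))]
        keep[worst] = False
        tagged.append(int(worst))
        if len(tagged) > max_outlier:         # too many outliers: reset all tags
            return []
    return tagged
```

For a residual series with a clear linear trend and one spike, the spike is removed in the first iteration and the loop stops. For residuals without any trend, R2 stays low and the tags are eventually reset.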

  12. The simulation is performed on real booking data
      • 42 flight numbers (2 multi-leg flights)
      • data type: actual bookings
      • data source: PROS IV database
      • departure time range: 01Jun94 - 31May97
      • booking classes: FA CDZ HBLGYKTWE
      • evaluated DCPs: 1-15
      • total flight departures: 422 054
      • total DCPs: 6 330 810

  13. Analysis method: artificial outlier implantation
      • 1) Preprocessing: outlier cleaning with very conservative parameters (high outlier tagging rates)
      • 2) Different manipulations are performed with predefined probabilities:
          • XLA: enlarge all DCPs x 3.00, P_XLA = 0.01
          • XSA: shrink all DCPs x 0.33, P_XSA = 0.01
          • XL1: enlarge single DCP x 3.00, P_XL1 = 0.01
          • XS1: shrink single DCP x 0.33, P_XS1 = 0.01
          • X-Y: swap booking classes X and Y, P_X-Y = 0.02
      • 3) Artificially created outliers are tagged.
      • 4) Apply outlier detection method
      • 5) Evaluation: count number of recognized outliers and non-outliers
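
Steps 2 and 3 might look like the sketch below. Several things are assumptions: each row of `curves` is one cleaned booking curve (one value per DCP), at most one manipulation is drawn per curve, and the booking-class swap (X-Y) is left out. The function and variable names are hypothetical; only the factors and probabilities come from the slide:

```python
import numpy as np

# (name, probability, factor, scope) as listed on the slide.
MANIPULATIONS = [
    ("XLA", 0.01, 3.00, "all"),     # enlarge all DCPs
    ("XSA", 0.01, 0.33, "all"),     # shrink all DCPs
    ("XL1", 0.01, 3.00, "single"),  # enlarge a single DCP
    ("XS1", 0.01, 0.33, "single"),  # shrink a single DCP
]

def implant_outliers(curves, rng):
    """Return (manipulated copy, boolean mask of artificially created outliers)."""
    data = np.array(curves, dtype=float)       # copy, the input stays untouched
    tagged = np.zeros(data.shape, dtype=bool)
    for i in range(data.shape[0]):
        u = rng.random()                       # one uniform draw per curve
        cumulative = 0.0
        for _name, prob, factor, scope in MANIPULATIONS:
            cumulative += prob
            if u < cumulative:
                if scope == "all":
                    cols = slice(None)                   # every DCP of the curve
                else:
                    cols = rng.integers(data.shape[1])   # one random DCP
                data[i, cols] *= factor
                tagged[i, cols] = True                   # step 3: tag the outliers
                break
    return data, tagged
```

The returned mask serves as ground truth in step 5: comparing it with the flags of a detection method yields the counts TP, TN, FP and FN.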

  15. The quality measures known in the literature are not sufficient in the RM environment.
      • observables: True Positives TP, True Negatives TN, False Positives FP, False Negatives FN
      • sensitivity¹ (masking): $\mathrm{sens} := \dfrac{TP}{TP + FN}$
      • specificity¹ (swamping): $\mathrm{spec} := \dfrac{TN}{TN + FP}$
      • efficiency¹: $\mathrm{eff} := \dfrac{TP + TN}{TN + FN + TP + FP}$
      • temperament: $\mathrm{temp} := \dfrac{TP + FP}{TN + FN + TP + FP}$
      ¹ (Walczak, 1998)
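
The four measures follow directly from the confusion counts. A short sketch, with a hypothetical function name and assuming all denominators are non-zero:

```python
def quality_measures(tp, tn, fp, fn):
    """Compute sensitivity, specificity, efficiency and temperament."""
    total = tp + tn + fp + fn
    return {
        "sens": tp / (tp + fn),    # share of true outliers that were detected
        "spec": tn / (tn + fp),    # share of valid points that were left alone
        "eff": (tp + tn) / total,  # share of correct classifications
        "temp": (tp + fp) / total, # share of points tagged as outlying
    }

# Made-up counts for illustration only.
measures = quality_measures(tp=8, tn=85, fp=5, fn=2)
print(measures)
```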

  16. Quality Measures for Outlier Detection Methods
      • For an outlier detection method on booking data, it is most important to detect almost all outliers. A few data points that are erroneously taken out of the valid set have less impact.
          • weighting of the outcome types TP and TN
          • dynamic adaptation of the weights to the degree of contamination
      • axioms for a quality measure $q$: let $A, B \subseteq \hat{A}$, where $\hat{A}$ denotes the complete set of correct classifications

        $$0 \le q(A) \le 1$$
        $$q(A) = 0 \iff A = \emptyset$$
        $$q(A) = 1 \iff A = \hat{A}$$
        $$A \subset B \implies q(A) < q(B)$$
        $$q(A \cup B) = q(A) + q(B) - q(A \cap B)$$

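  17. Contamination and Temperament Weighted Efficiency meet the conditions

        $$\mathrm{CWE} = \frac{\alpha\, TN + (1-\alpha)\, TP}{\alpha\,(TN+FP) + (1-\alpha)\,(TP+FN)} \quad \text{with} \quad \alpha = \frac{TP + FN}{TN + FN + TP + FP} \ \text{(outlier rate)}$$

        $$\mathrm{TWE} = \frac{\alpha\, TN + (1-\alpha)\, TP}{\alpha\,(TN+FP) + (1-\alpha)\,(TP+FN)} \quad \text{with} \quad \alpha = \frac{TP + FP}{TN + FN + TP + FP} \ \text{(temperament)}$$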
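
Both measures share one formula and differ only in the weight. In the sketch below the weight is called `alpha`, which is a name chosen here for illustration, as are the function names; non-zero denominators are assumed:

```python
def weighted_efficiency(tp, tn, fp, fn, alpha):
    """Weighted efficiency: alpha weights the non-outliers, 1 - alpha the outliers."""
    numerator = alpha * tn + (1 - alpha) * tp
    denominator = alpha * (tn + fp) + (1 - alpha) * (tp + fn)
    return numerator / denominator

def cwe(tp, tn, fp, fn):
    """Contamination weighted efficiency: the weight is the outlier rate."""
    total = tp + tn + fp + fn
    return weighted_efficiency(tp, tn, fp, fn, alpha=(tp + fn) / total)

def twe(tp, tn, fp, fn):
    """Temperament weighted efficiency: the weight is the temperament."""
    total = tp + tn + fp + fn
    return weighted_efficiency(tp, tn, fp, fn, alpha=(tp + fp) / total)

# Made-up counts for illustration only.
print(cwe(tp=8, tn=85, fp=5, fn=2), twe(tp=8, tn=85, fp=5, fn=2))
```

With a low outlier rate the weight on the true positives, 1 - alpha, is close to one, so a missed outlier costs more than a valid point that is tagged by mistake.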

  19. Sensitivity Analysis on Cleaned Booking Data - temperament for z-score testing - [Figure: temperament, z-score testing]

  20. Sensitivity Analysis on Cleaned Booking Data - sensitivity for z-score testing - [Figure: sensitivity, z-score testing]

  21. Sensitivity Analysis on Cleaned Booking Data - specificity for z-score testing - [Figure: specificity, z-score testing]

  22. Sensitivity Analysis on Cleaned Booking Data - efficiency for z-score testing - [Figure: efficiency, z-score testing]

  23. Sensitivity Analysis on Cleaned Booking Data - contamination weighted efficiency for z-score testing - [Figure: CWE, z-score testing; Max: (0.9, 0.6, 0.924243)]

  24. Sensitivity Analysis on Cleaned Booking Data - temperament weighted efficiency for z-score testing - [Figure: TWE, z-score testing; Max: (0.9, 0.6, 0.929789)]

  25. Sensitivity Analysis on Cleaned Booking Data - temperament for DCT - [Figure: temperament, DCT; axes: min R2, max outlier]

  26. Sensitivity Analysis on Cleaned Booking Data - sensitivity for DCT - [Figure: sensitivity, DCT; axes: min R2, max outlier]

  27. Sensitivity Analysis on Cleaned Booking Data - specificity for DCT - [Figure: specificity, DCT; axes: min R2, max outlier]

  28. Sensitivity Analysis on Cleaned Booking Data - efficiency for DCT - [Figure: efficiency, DCT; axes: min R2, max outlier]

  29. Sensitivity Analysis on Cleaned Booking Data - contamination weighted efficiency for DCT - [Figure: CWE, DCT; axes: min R2, max outlier; Max: (0.45, 14, 0.736911)]

  30. Sensitivity Analysis on Cleaned Booking Data - temperament weighted efficiency for DCT - [Figure: TWE, DCT; axes: min R2, max outlier; Max: (0.5, 14, 0.788245)]

  31. Raw data analysis delivers more realistic results
      • Optimal Parameters on Cleaned and Raw Booking Data
      • z-score testing (ZST)
          • CWE: cleaned data 0.9 / 0.6 -> 0.924; raw data 1.5 / 0.8 -> 0.680
          • TWE: cleaned data 0.9 / 0.6 -> 0.930; raw data 2.2 / 0.9 -> 0.752
      • determination coefficient testing (DCT)
          • CWE: cleaned data 0.45 / 14 -> 0.737; raw data 0.70 / 14 -> 0.662
          • TWE: cleaned data 0.50 / 14 -> 0.788; raw data 0.50 / 13 -> 0.747

  32. Comparison on raw data: Proper parameter calibration is more important than method choice.

  33. Z-score testing on booking changes is more efficient than on booking values.
      • Optimal Parameters on Raw Booking Data
      • z-score testing (ZST)
          • CWE: on bookings 1.5 / 0.8 -> 0.680; on booking changes 1.8 / 1.1 -> 0.728
          • TWE: on bookings 2.2 / 0.9 -> 0.752; on booking changes 2.9 / 1.5 -> 0.820

  35. Outlier Detection Methods on Booking Data - Summary -
      • We defined new quality measures for outlier detection models which enable parameter optimization and the comparison of different methods.
      • Symmetric acceptance ranges for z-score testing are a disadvantage
          • potential for improvement by only adjusting parameters
          • revenue impact unknown, but positive
          • low risk
      • Clear superiority of z-score testing on cleaned booking data
      • Slight superiority of z-score testing on raw booking data
      • Parameter optimization offers higher potential for improvement than the choice of method.
      • Z-score testing can be improved if applied to booking changes

  37. Outlier Detection Methods on Booking Data - Literature -
      • Hodges 1967: J.L. Hodges, Proc. Fifth Berkeley Symp. Math. Stat. Probab., 1967, 1, 163-168
      • Rousseeuw 1987: P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, Wiley, New York, 1987
      • Rousseeuw 1990: P.J. Rousseeuw, Unmasking Multivariate Outliers and Leverage Points (with discussion), Journal of the American Statistical Association, 1990, 85, 633-651

  38. Outlier Detection Methods on Booking Data - Literature, ctd. -
      • Rousseeuw 1991: P.J. Rousseeuw, Journal of Chemometrics, 1991, 5, 1-20
      • Walczak 1998: B. Walczak, D.L. Massart, Multiple Outlier Detection Revisited, Chemometrics and Intelligent Laboratory Systems, 1998, 41, 1-15
