Welcome to the PMSA 2007 Conference Marriott Harbor Beach, Ft Lauderdale, FL
Modeling Promotional Response with Kernel Methods Paul DuBose, Ph.D. VP Analytics, Principled Strategies paul.dubose@principledstrategies.com
Outline • Introduction to Promotional Response • Background • Primary issues • Modeling Promotional Response • Modeling technologies • Kernel methods
Promotional Response: Background • To understand the response to any promotional component • Need to model all major promotions simultaneously • Allows isolating the impact of specific marketing or sales activities • The major sales / marketing activities • Details • Samples • Professional Meetings • Professional advertising • Direct to consumer
Promotional Response: Background (continued) • Key HCP attributes to model promotional response • Rx related variables • Number Rx in therapeutic class • TRx / NRx ratio • Rx distribution for competing products – information entropy • Payer access variables • % third party, % government and other summary stats • Census data • Median income in zip code, median rental cost in zip code • Segmentation information • Specialty
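The "information entropy" attribute for the Rx distribution across competing products can be sketched as follows. This is a minimal illustration assuming Shannon entropy in bits over an HCP's product shares; the function name `rx_entropy` and the example counts are hypothetical and not taken from the presentation.

```python
import math

def rx_entropy(rx_counts):
    """Shannon entropy (in bits) of an HCP's Rx counts across competing products."""
    total = sum(rx_counts)
    if total == 0:
        return 0.0
    # Products with zero Rx contribute nothing to the entropy.
    shares = [c / total for c in rx_counts if c > 0]
    return -sum(p * math.log2(p) for p in shares)

# Hypothetical HCPs: one writes a single product, one splits evenly over four.
loyal = rx_entropy([40, 0, 0, 0])
spread = rx_entropy([10, 10, 10, 10])
```

A low value suggests an HCP concentrated on one product; a high value suggests Rx spread across the class.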
Issues: Correlation • How do you assign a value for the independent influence of two activities, e.g. samples and details? [Figure: diagram labeled Rx, Samples, Details]
Issues: Correlation (continued) • Problem is most severe in standard linear regression • High correlations lead to estimates with large confidence intervals • Symptom – model coefficients have the wrong sign • Model appears to fit the data but inaccurately estimates the result • Compromise: allow some bias in order to decrease the variance • Judged by mean square error • Approach is called “regularization” in kernel methods technology and ridge regression in linear regression [Figure labels: unbiased with large variance; biased but small variance]
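The bias versus variance compromise can be illustrated with a small simulation. This is a sketch under stated assumptions, not the presenter's analysis: two standardized predictors with correlation 0.95, a true coefficient of 1 on each, unit noise, and an arbitrary ridge parameter k = 5. All names and numbers are hypothetical.

```python
import math
import random

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def ridge_2d(x1, x2, y, k):
    """Solve (X'X + k I) b = X'y for two centered predictors via Cramer's rule."""
    a11 = sum(u * u for u in x1) + k
    a22 = sum(v * v for v in x2) + k
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * t for u, t in zip(x1, y))
    c2 = sum(v * t for v, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

def simulate(k, reps=200, n=30, rho=0.95, seed=0):
    """Mean and variance of the details coefficient over repeated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z1 = [rng.gauss(0, 1) for _ in range(n)]
        z2 = [rng.gauss(0, 1) for _ in range(n)]
        details = z1
        samples = [rho * a + math.sqrt(1 - rho ** 2) * b for a, b in zip(z1, z2)]
        # Assumed true model: Rx = 1 * details + 1 * samples + noise.
        rx = [d + s + rng.gauss(0, 1) for d, s in zip(details, samples)]
        b_details, _ = ridge_2d(center(details), center(samples), center(rx), k)
        estimates.append(b_details)
    mean = sum(estimates) / reps
    var = sum((b - mean) ** 2 for b in estimates) / (reps - 1)
    return mean, var

ols_mean, ols_var = simulate(k=0.0)    # ordinary least squares
ridge_mean, ridge_var = simulate(k=5.0)  # ridge regression
```

Comparing `ols_var` with `ridge_var` shows how the ridge penalty trades a shrunken (biased) coefficient for a much tighter spread across samples.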
Issues: Outliers • Outliers can heavily influence curve-fitting algorithms • With samples, group practice effects show a number of high-writing HCPs with zero samples [Figure: Rx versus Samples]
Issues: Sampling • There are two features of sampling activity that require it to be considered carefully • Sampling has a different legal status • Excessive sampling is “buying business” and out of compliance • Sampling in excess can cause a decrease in Rx • Excessive detailing, DTC, and Professional Meetings do not cause a fall in Rx • Over-sampling causes an HCP to use samples in place of an Rx • Loss of Rx is called cannibalization
Kernel Methods • Characteristics • Uses “kernels” to create high dimensional and non-linear feature space (derived variables) • Training incorporates generalization derived from statistical learning theory • Sufficiently rich complexity to solve very difficult problems • Solution is computationally efficient • Power of this approach • Provides strong generalization properties • Significant improvement over linear regression confidence limits which depend on one pre-specified hypothesis • Searches an entire hypothesis space • A modern, powerful method that outperforms most other systems in a wide variety of applications
Kernel Methods: Intuition • Nonlinear pattern may appear linear in Feature Space • Not all input vectors are needed to support the final shape [Figure: x and o points plotted in Data Space and in Feature Space, with a Support Vector labeled]
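A toy illustration of the first point: two concentric rings cannot be split by a straight line in the original (x, y) data space, but after mapping each point to the derived variables (x², y²) a single linear rule separates them. The ring radii, the feature map, and the threshold are assumptions chosen for this sketch.

```python
import math
import random

rng = random.Random(1)

def ring(r_min, r_max, n):
    """Random points whose distance from the origin lies in [r_min, r_max]."""
    points = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(r_min, r_max)
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

inner = ring(0.8, 1.2, 50)   # class "o"
outer = ring(1.8, 2.2, 50)   # class "x"

def feature_map(point):
    """Derived variables: the squared coordinates."""
    x, y = point
    return (x * x, y * y)

def classify(point):
    # Linear rule in feature space: u + v < 2.25 (a straight line in (u, v)).
    u, v = feature_map(point)
    return "o" if u + v < 2.25 else "x"

inner_labels = [classify(p) for p in inner]
outer_labels = [classify(p) for p in outer]
```

A kernel method reaches the same effect without building the derived variables explicitly, because the kernel supplies inner products in the feature space.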
Kernel Methods: Example 1 • Distinguish between two spirals: blue versus red • Circles are training data • Plus signs are test data • Kernel method classification accuracy • 100% on training data • 100% on test data • Linear regression accuracy • For $y = f(1, x, y, x^2, y^2, x \cdot y)$ • 49% on training data • 48% on test data • Guessing would give an expected accuracy of 50% [Figure: Spiral Model Results]
Kernel Methods: Example 2 • Create a test case promotional response model • Compare performance of KM and Linear Regression • New Rx = Details Rx + Samples Rx; no noise
Kernel Methods: Example 2 (continued) • Details and Samples from bivariate normal • Correlation ( Details, Samples ) = 0.8
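Test data of this kind can be generated as sketched below. The correlation of 0.8 comes from the slide; the means and standard deviations for details and samples are hypothetical, since the presentation does not state them.

```python
import math
import random

def correlated_pairs(n, rho, seed=0):
    """Draw n pairs of standard normals with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        pairs.append((z1, rho * z1 + math.sqrt(1 - rho * rho) * z2))
    return pairs

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

pairs = correlated_pairs(20000, rho=0.8)
# Assumed scales: details ~ mean 8, sd 2; samples ~ mean 20, sd 5.
details = [8 + 2 * a for a, _ in pairs]
samples = [20 + 5 * b for _, b in pairs]
r = pearson(details, samples)
```

Rescaling each variable linearly leaves the correlation unchanged, so `r` stays close to 0.8 whatever means and standard deviations are chosen.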
Kernel Methods: Example 2 (continued) • Linear Regression Model 1 Results • Model: $Rx = f(S, S^2, D, D^2, S \cdot D, D^{0.33})$ • Correlation ( Predicted Rx, Actual Rx ) = 0.994; mean abs error = 0.333 • Prediction of sample and detail response has room for improvement • Predict for details when samples = 0; for samples when details = 0
Kernel Methods: Example 2 (continued) • Linear Regression Model 2 – Simplify model and explore Ridge Regression • Model: $Rx = f(S, S^2, D, D^{0.33})$ • Collinearity between samples and details may create a problem • Notice that several model coefficients change when “k”, the ridge parameter, changes • Using cross-validation, the best predictive model occurs when k = 0.00 • So there is no need to use ridge regression in this particular case • Note that variables $S \cdot D$ and $D^2$ are removed from the model
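Choosing the ridge parameter by cross-validation might look like the sketch below. It uses the Model 2 terms (S, S², D, D^0.33) plus an intercept, 5-fold cross-validation, and a noise-free response whose functional form is entirely hypothetical (the slides do not give the true response curves). The features are left unstandardized for brevity, which a production analysis would likely not do.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def features(s, d):
    # Intercept, S, S^2, D, D^0.33.
    return [1.0, s, s * s, d, d ** 0.33]

def ridge_fit(rows, y, k):
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # leave the intercept unpenalized
        A[i][i] += k
    c = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(A, c)

def cv_error(rows, y, k, folds=5):
    """Mean squared prediction error over held-out folds."""
    total = 0.0
    for f in range(folds):
        train = [i for i in range(len(y)) if i % folds != f]
        test = [i for i in range(len(y)) if i % folds == f]
        beta = ridge_fit([rows[i] for i in train], [y[i] for i in train], k)
        for i in test:
            pred = sum(b * v for b, v in zip(beta, rows[i]))
            total += (pred - y[i]) ** 2
    return total / len(y)

rng = random.Random(0)
rows, rx = [], []
for _ in range(200):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    d = max(0.1, 8 + 2 * z1)                      # details, assumed scale
    s = max(0.0, 20 + 5 * (0.8 * z1 + 0.6 * z2))  # samples, correlation 0.8
    rows.append(features(s, d))
    # Hypothetical noise-free response: samples term plus details term.
    rx.append(0.8 * s - 0.02 * s * s + 6 * d ** 0.33)

errors = {k: cv_error(rows, rx, k) for k in [0.0, 0.01, 0.1, 1.0]}
best_k = min(errors, key=errors.get)
```

Because this synthetic response has no noise and lies exactly in the span of the model terms, any k above zero only adds bias, which is consistent with the slide's finding that cross-validation favored k = 0.00 on its test case.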
Kernel Methods: Example 2 (continued) • Linear Regression Model 2 Results • Model: $Rx = f(S, S^2, D, D^{0.33})$ • Correlation ( Predicted Rx, Actual Rx ) = 0.971; mean abs error = 0.453 • Prediction of samples improves, but prediction of details is not as good • Predict for details when samples = 0; for samples when details = 0
Kernel Methods: Example 2 (continued) • Linear Regression Model 3 Results • Model: $Rx = f(S, S^2, D, D^2, D^{0.33})$ • Correlation ( Predicted Rx, Actual Rx ) = 0.9997; mean abs error = 0.178 • Prediction of samples improves, prediction of details good from 5 to 12 details • Ridge regression analysis showed no benefit for non-zero k parameter • Predict for details when samples = 0; for samples when details = 0
Kernel Methods: Example 2 (continued) • Kernel Methods model • Correlation ( Predicted Rx, Actual Rx ) = 0.9998; mean abs error = 0.062 • Prediction of response to samples and details close to actual data • Predict for details when samples = 0; for samples when details = 0
Kernel Methods: Example 2 (continued) • Kernel Methods model 2 – After tuning model parameters • Correlation ( Predicted Rx, Actual Rx ) = 1.0000; mean abs error = 0.003 • Prediction of response to samples and details a bit closer to actual data • Predict for details when samples = 0; for samples when details = 0
Kernel Methods: Example 2 (continued) • Linear Regression versus Kernel Methods model
Kernel Methods Explained • Support Vector Machine (SVM) • Most common kernel method • Provides a non-linear regression method • Select relevant input variables; select specific kernel • Kernel creates a very high dimensional feature space with non-linear transformations of the raw input data • The Gaussian kernel, also named radial basis function kernel, is the most frequently used kernel for numerical data with SVMs • The Kernel Matrix is of dimension n by n, where n is the number of observations, and the (i, j)th element for a Gaussian kernel is of the form • $K(X_i, X_j) = \exp(-\sigma \lVert X_i - X_j \rVert^2)$ • Solve dual of Lagrangian for the regression • Because of convexity, a solution is guaranteed!
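The n by n Gaussian kernel matrix described above can be built directly from its definition. In this sketch the three observations (details, samples) and the value of sigma are hypothetical.

```python
import math

def gaussian_kernel(xi, xj, sigma):
    """K(Xi, Xj) = exp(-sigma * ||Xi - Xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sigma * squared_distance)

def kernel_matrix(X, sigma):
    """n by n matrix whose (i, j) element is K(X[i], X[j])."""
    return [[gaussian_kernel(xi, xj, sigma) for xj in X] for xi in X]

# Hypothetical observations: (details, samples) for three HCPs.
X = [(2.0, 10.0), (4.0, 12.0), (6.0, 20.0)]
K = kernel_matrix(X, sigma=0.05)
```

The matrix is symmetric with ones on the diagonal, and entries shrink toward zero as two observations move farther apart; sigma controls how quickly that happens.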
Kernel Methods Explained (continued) • Dual Lagrangian formulated: • Prediction can be made as:
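The equations on this slide did not survive in the transcript. For orientation only, the standard textbook form of the ε-insensitive support vector regression dual is shown below; it is an assumption that the slide used this formulation. Here $\alpha_i$ and $\alpha_i^*$ are the dual variables, $C$ is the regularization constant, $\varepsilon$ is the tube width, and $b$ is the offset.

```latex
\[
\begin{aligned}
\max_{\alpha,\,\alpha^*}\quad
  & -\tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
      (\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,K(X_i,X_j)
    \;-\; \varepsilon\sum_{i=1}^{n}(\alpha_i+\alpha_i^*)
    \;+\; \sum_{i=1}^{n} y_i\,(\alpha_i-\alpha_i^*) \\
\text{subject to}\quad
  & \sum_{i=1}^{n}(\alpha_i-\alpha_i^*) = 0,
    \qquad 0 \le \alpha_i,\ \alpha_i^* \le C \\[6pt]
\text{prediction:}\quad
  & f(x) = \sum_{i=1}^{n}(\alpha_i-\alpha_i^*)\,K(X_i,x) + b
\end{aligned}
\]
```

Observations with $\alpha_i - \alpha_i^* \ne 0$ are the support vectors; all others drop out of the prediction sum.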
Kernel Methods Modularity • Modular stages of kernel methods: Data → Kernel $K(X, Z)$ → Pattern Algorithm → Pattern Function $f(x) = \sum_i \alpha_i K(x_i, x)$ • Kernel choices: Polynomial, Gaussian • Pattern algorithms: Support Vector Machine, Principal Components
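The modular stages can be sketched end to end in a few lines. To keep the example short, the pattern algorithm here is kernel ridge regression, swapped in for the support vector machine: it solves $(K + \lambda I)\alpha = y$ and yields a pattern function of the same form $f(x) = \sum_i \alpha_i K(x_i, x)$. The data (a sine curve), sigma, and lambda are all hypothetical choices.

```python
import math

# Stage 1: kernel (Gaussian, for one-dimensional inputs).
def gaussian_kernel(a, b, sigma):
    return math.exp(-sigma * (a - b) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

# Stage 2: pattern algorithm (kernel ridge regression, not an SVM).
def fit(xs, ys, sigma, lam):
    K = [[gaussian_kernel(a, b, sigma) for b in xs] for a in xs]
    for i in range(len(xs)):
        K[i][i] += lam  # regularization term on the diagonal
    return solve(K, ys)

# Stage 3: pattern function f(x) = sum_i alpha_i * K(x_i, x).
def predict(x, xs, alphas, sigma):
    return sum(a * gaussian_kernel(xi, x, sigma) for a, xi in zip(alphas, xs))

xs = [0.5 * i for i in range(13)]    # training inputs 0.0, 0.5, ..., 6.0
ys = [math.sin(x) for x in xs]       # hypothetical non-linear response
alphas = fit(xs, ys, sigma=4.0, lam=1e-6)
estimate = predict(2.75, xs, alphas, sigma=4.0)  # a point not in the training set
```

Swapping the Gaussian kernel for a polynomial one, or the ridge solver for an SVM solver, changes one stage without touching the others, which is the modularity the slide describes.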
Resources for Kernel Methods • Software • MATLAB® is a good environment for kernel methods • A number of free machine learning software libraries are available, including “Spider” • Weka is a good program to gain experience • Free program, go to http://www.cs.waikato.ac.nz/~ml/weka/ • Primarily useful for small data sets • Internet • www.kernel-machines.org • Book • Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini, ISBN 0 521 81397 2 Hardback, 2004