## Portrait Quadstone


## Webinar: Scorecard Secrets

Starting in 15 minutes... 10 minutes... 5 minutes... 2 minutes... starting now.
Please join the teleconference call; if you have any problems, contact support@quadstone.com. (Issue 5.2-1)

## How to ask questions

Use Q&A (not Chat, please):

- Click on the Q&A Panel icon at the bottom-right of your screen
- Type in your question

## Webinar: Scorecard Secrets

- Presenter: Patrick Surry, VP Technology
- Agenda:
  - Predictive modeling process
  - How do you assess a given model (scorecard)?
  - How do you pick the weights in the boxes?
  - How do you pick the boxes for each field?
  - How do you pick the fields?
  - Why use a scorecard (why boxes?), e.g. vs. 'traditional' regression

## Predictive Modeling Process

- What is the business problem (what are we predicting)?
- How is success measured (when is one model 'better' than another)?
- What modeling approach to use?
- Preprocessing:
  - Variable creation
  - Variable selection
  - Variable transformation
- Core solver: a fitting algorithm to generate the "best" model
- Postprocessing to transform the model output into the desired prediction (score)
- Final model

## Typical Business Problems in Marketing

(We'll focus mainly on binary outcomes; the approaches are similar for the continuous case.)

## How do we measure success?

- The score given to a customer is equivalent to either:
  - the estimated probability of a binary outcome, or
  - the estimated value of a continuous outcome
- Sometimes we only care about performance at a cutoff score (e.g. a bank deciding whether or not to make a loan)
- Sometimes we only care about ranking or classifying customers (e.g. outbound marketing wants to call the customers most likely to buy first)
- Sometimes we care about some quantitative measure of accuracy (e.g. a bank wants to predict the level of reserves to keep against future bad loans)

## How good is a given model?

- Nominal non-parametric measures: how good at a cutoff?
  - Two-by-two contingency tables
  - Information gain, chi-squared significance, Cramér's V
- Ranked non-parametric measures: how well-ordered?
  - Gini / ROC
  - Kolmogorov-Smirnov
- Parametric measures: how accurate for each customer?
  - Divergence statistic
  - Maximum-likelihood measures:
    - Linear regression
    - Logistic regression
    - Probit regression
- NB: we tend to choose what is mathematically tractable, not what is business-relevant
- Luckily, these measures are typically highly correlated

## Scorecard performance

- Accept rate (target rate) = (A + B) / (A + B + C + D)
- Bad rate (hit rate) = B / (A + B)
- Often you can directly assign a financial value to each category
- [Figure: score distributions of goods and bads around the cutoff, defining regions A, B, C, D]
- Ranked metrics (Gini, KS) measure how well the score sorts goods to the right and bads to the left
- Parametric metrics (R², MLE) measure how accurate each prediction is

## Scorecards

[Figure: what is a scorecard? Fields, bins and scores (weights); applying the scorecard]

## Scorecard ingredients

- How do you assess a given scorecard (model)?
- How do you pick the weights in the boxes?
- How do you pick the boxes for each field?
- How do you pick the fields?
- Why a scorecard (why boxes?): scorecard vs. regression

## What numbers in the boxes?

### Linear model

- Linear model (multiple regression), perhaps with manually transformed variables:
  y = w·x + b
  (e.g. y is Response; x is Age, Income; w are the coefficients; b is the intercept)
- Scorecard Builder doesn't implement this form
- Even with continuous outcomes we use transformed inputs
- [Diagram: Inputs (x, y) → Linear solver (Optimizer) → Output (w, b)]

### Generalized linear model

- Generalized linear model (including a link function, e.g. f() as log-odds):
  f(y) = w·x + b, i.e. y = f⁻¹(w·x + b)
- Although the core is still linear, finding w to optimize the quality metric typically isn't
- Scorecard Builder doesn't implement this form
- [Diagram: Inputs (x, y) → Non-linear solver (Optimizer) → Output transformation & rescaling (Postprocessing) → Output (w, b)]

### Generalized additive model

- Generalized additive model (arbitrary functions of the independent variables):
  f(y) = w·F(x) + b
- Scorecard Builder uses a very simple class of functions F(x): a piecewise-constant fit of x to the observed outcome, or indicator variables
- For continuous outcomes, we use this form without the link function
- For binary outcomes, we always use the link function ("linear regression" just uses an approximation of the non-linear solution)
- [Diagram: Inputs (x, y) → Variable transformation F (Preprocessing) → Non-linear or linear-approximation solver (Optimizer) → Output transformation & rescaling (Postprocessing) → Output (w, b)]

### Core solver

(Variable transformation → Core solver → Output transformation & rescaling)

- Choose the weights (the numbers in the boxes) to maximize the likelihood of observing the actual outcomes (based on the quality measure)
- Solver window: controls the optimization parameters
- Singular value decomposition provides a robust solution with correlated variables
- You can still see sensitivity with very small categories (though this shouldn't impact predictions unless those categories become large when scoring)

### Output transformation & rescaling

(Variable transformation → Core solver → Output transformation & rescaling)

- Model types: Risk, Response, Churn, Satisfaction
  - No change to the 'core' statistics, just flipping signs and labels
- Scaling of the final score via two constants:
  - Even-odds point: the score at which log(odds) = 0 (50% likelihood)
  - Odds-doubling factor: the log(odds) increment (e.g. +20 points doubles the odds)
- The core model always fits odds
  - Always the 'logistic' form (except with the 'continuous' model)
  - The prediction y is rescaled as Ay + B to give the best logistic fit with the desired scaling
  - The "linear regression" quality measure solves a linear approximation to the logistic

## What boxes for each field?

### Variable transformation

(Variable transformation → Core solver → Output transformation & rescaling)

- Generalized additive model: f(y) = w·F(x) + b
- What are the input variables x or F(x)?
  - In traditional regression, x are raw variables, or manually transformed, e.g. Income, log(Income)
  - In scorecard building, either:
    - One weight per bin: indicator (dummy) variables, one per bin in the source variable, representing bin membership
      - More fitting power (but also more free parameters)
    - One weight per field: continuous variables, one per field, transformed from the source variable based on the outcome rate in each bin
- The implicit transform (to the observed bad rate) gives a significant advantage over "standard" regression techniques

### Optimized binning

- Maximize a measure of (categorical) association with the outcome
- The default technique is a hierarchical merge:
  - Similar to that used to generate a decision tree
  - Maximize the information gain at each step
- [Figure: target number of bins = 5; mean of the objective field vs. Age]
- Attempts to maximize univariate predictiveness, i.e. minimize the loss of predictiveness
- Uses either iterative splitting (like a decision tree) or exhaustive search

### Summary: scorecard vs. traditional (linear) regression

- A scorecard is a generalized additive model based on indicator functions
- The model is thus piecewise-constant in each of the independent variables
- It automatically captures non-linear relationships and is more robust to outliers
- Increases in lift of 2% or more in real-world risk, response and retention modeling applications
- Simpler to build, understand and explain/socialize

## What fields?

### Scorecard Builder: stepwise inclusion / exclusion

- Why not use all available fields?
  - More fields typically increase training performance but risk overfitting on test data
  - A larger model is more difficult to explain and socialize
  - Build time scales with the square of the number of (transformed) variables
- Build a set of trial scorecards:
  - Include a candidate field
  - Build a scorecard
  - Compute the quality measure
  - Include the next field...
- Choose the field that creates the best trial scorecard
  - Linear Fit uses residuals to compute the marginal sum-of-squares error
  - Quality Measure uses a hybrid
  - Similar to the traditional -score technique

### "Right-Size" Scorecard

Select the point at which model quality exhibits diminishing returns on test data:

1. Generate a test/training split
2. Build a logistic model on the training data using the remaining variables (initially all)
3. Measure quality (Gini measure) when applied to the test data
4. Exclude the least contributory variable (based on the training data)
5. Repeat from step 2 until no variables remain
6. Choose the last point where test-set performance increases by a minimum threshold
7. Refit the model with the selected number of variables using all the data

### Automating the scorecard-building workflow

[Diagram: Analytic dataset → Optimize binnings → Variable reduction → Right-size model → Final model, with optional parameter overrides]

- Optimized binning of each independent variable: the "best" piecewise-constant transformation
- Variable reduction using recursive stepwise exclusion
- Model "right-sizing" by seeking the point of diminishing returns on test data
- Many variables (1000s) require automated tools to help focus effort
- Time is money:
  - Performing the same steps by hand would take several days (or weeks)
  - Automation completes in minutes

## Further Reading

- Generalized additive models (GAM)
- Generalized linear models (GLM)
- Gini
- ROC
- Kolmogorov-Smirnov
- Probit
- Logit
- Singular value decomposition (SVD)
- McCullagh P, Nelder JA, Generalized Linear Models (2nd edition), Chapman and Hall, 1989.

## After the webinar

- These slides, and a recording of this webinar, will be available via http://support.quadstone.com/info/events/webinars/
- For any problems or questions, please contact support@quadstone.com

## Upcoming webinars

See http://support.quadstone.com/info/events/webinars/
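The "How good is a given model?" slide lists Gini (2·AUC - 1) and Kolmogorov-Smirnov among the ranked, non-parametric measures. As a rough illustration only (not Quadstone's implementation, and ignoring tied scores), both can be computed in a single pass over score-sorted records:

```python
# Illustrative only: ranked, non-parametric quality measures for a
# binary-outcome score. Ties between scores are ignored for brevity.

def gini_and_ks(scores, bads):
    """scores: higher means more likely good; bads: 1 = bad, 0 = good.
    Returns (gini, ks): gini = 2*AUC - 1, and KS is the largest gap
    between the cumulative distributions of goods and bads."""
    n_bad = sum(bads)
    n_good = len(bads) - n_bad
    cum_good = cum_bad = 0
    auc = 0.0
    ks = 0.0
    for _, bad in sorted(zip(scores, bads)):   # sweep scores ascending
        if bad:
            cum_bad += 1
        else:
            cum_good += 1
            auc += cum_bad                     # bads scored below this good
        ks = max(ks, abs(cum_good / n_good - cum_bad / n_bad))
    return 2 * auc / (n_good * n_bad) - 1, ks
```

With perfect separation (all bads scoring below all goods) both measures reach 1; a random score gives values near 0.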
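The output-rescaling slide fixes the final score scale with two constants: an even-odds point (the score where log(odds) = 0) and an odds-doubling factor (e.g. +20 points doubles the odds). A minimal sketch of that mapping; the 500-point even-odds score is an arbitrary example value, not from the slides:

```python
import math

# Sketch of final score scaling: even_odds_score is the score at which
# log(odds) = 0 (50% likelihood), and every +pdo points doubles the odds.
# The 500/20 defaults here are hypothetical example settings.

def scale_score(log_odds, even_odds_score=500.0, pdo=20.0):
    return even_odds_score + pdo * log_odds / math.log(2)
```

With these settings, a customer at even odds scores 500, and each doubling of the odds adds 20 points to the score.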
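The variable-transformation slide describes the "one weight per field" option as transforming the source variable to the outcome rate observed in each bin, giving the piecewise-constant F(x) of the additive model. A hypothetical sketch of that transform, assuming numeric bins defined by interior boundary values:

```python
from bisect import bisect_right

# Hypothetical sketch of the "one weight per field" transform: replace
# each raw value with the observed bad rate of its bin, yielding a
# piecewise-constant F(x) for the model f(y) = w*F(x) + b.

def fit_bin_rates(values, bads, edges):
    """edges: ascending interior bin boundaries; len(edges)+1 bins.
    Returns the observed bad rate per bin."""
    counts = [[0, 0] for _ in range(len(edges) + 1)]   # [bads, total]
    for v, bad in zip(values, bads):
        i = bisect_right(edges, v)
        counts[i][0] += bad
        counts[i][1] += 1
    return [b / n if n else 0.0 for b, n in counts]

def transform(values, edges, rates):
    """Map each raw value to its bin's observed bad rate."""
    return [rates[bisect_right(edges, v)] for v in values]
```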
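The optimized-binning slides describe a hierarchical merge that preserves association with the outcome at each step. The sketch below is one plausible reading of that idea, greedily merging the adjacent pair of bins whose merge loses the least outcome information (weighted entropy); Quadstone's actual objective and search may differ:

```python
import math

def entropy(bad, good):
    """Outcome entropy (nats) of a bin with the given counts."""
    n = bad + good
    h = 0.0
    for c in (bad, good):
        if c:
            p = c / n
            h -= p * math.log(p)
    return h

def merge_bins(bins, target=5):
    """bins: list of (bad_count, good_count) for adjacent fine bins.
    Repeatedly merge the adjacent pair with the smallest information
    loss until only `target` bins remain."""
    bins = list(bins)
    total = sum(b + g for b, g in bins)
    while len(bins) > target:
        best_i, best_loss = 0, float("inf")
        for i in range(len(bins) - 1):
            (b1, g1), (b2, g2) = bins[i], bins[i + 1]
            before = ((b1 + g1) * entropy(b1, g1)
                      + (b2 + g2) * entropy(b2, g2)) / total
            after = (b1 + g1 + b2 + g2) * entropy(b1 + b2, g1 + g2) / total
            if after - before < best_loss:
                best_loss, best_i = after - before, i
        b1, g1 = bins[best_i]
        b2, g2 = bins.pop(best_i + 1)
        bins[best_i] = (b1 + b2, g1 + g2)
    return bins
```

Two adjacent bins with identical bad rates merge first (zero information loss), which is exactly the "minimize loss of predictiveness" behaviour the slides describe.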
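The stepwise inclusion/exclusion slide builds one trial scorecard per candidate field and keeps the field that produces the best one. A generic forward-selection sketch of that loop; the `build_and_score` callback stands in for building a trial scorecard and computing its quality measure, and is not a Quadstone API:

```python
# Generic forward stepwise selection: at each step, add the candidate
# field whose trial model scores best, stopping when no field improves
# the quality measure. `build_and_score` is a hypothetical callback.

def stepwise_select(fields, build_and_score):
    """fields: candidate field names.
    build_and_score(selected): fits a trial model on the given fields
    and returns its quality measure (higher is better)."""
    selected, remaining = [], list(fields)
    best = build_and_score(selected)
    while remaining:
        trials = [(build_and_score(selected + [f]), f) for f in remaining]
        quality, field = max(trials)
        if quality <= best:
            break                      # no remaining field helps
        best = quality
        selected.append(field)
        remaining.remove(field)
    return selected
```

Penalizing the quality measure per included field (as in the toy example below) mimics the overfitting guard that stops the search before every field is added.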
