Lecture 5: Non-Linear Patterns

Lecture 5: Non-Linear Patterns (January 27, 2014)

Question: Assume you’re a monopolist who sells widgets to price-sensitive consumers. If I told you the price elasticity of the consumers and your marginal cost, could you find the optimal price to charge? • Sure, no problem. • Maybe… if you gave me a few days. • I’ve heard those words before but have no idea what they mean. • What’s elasticity? Is it like a rubber band?

Administrative • Problem Set 2 due at 9am • How was it? • Problem Set 3, due 1 week from today. Longer. • Quiz 2 Wednesday • Exam 1 in two weeks • Questions?

Quiz Results • Common mistakes: • 1st/3rd Quartile • norm.inv( ) • “cases” in the data • (p 12 of the textbook) • Solutions are online. • Scores will be on Blackboard (check later – they were online earlier but with a slight mistake)

Last time • Intro to non-linear patterns

Estimating the Model • Data: cars.csv • Cars data: MPG by Weight (1000’s of lbs) • The fitted line: Estimated MPG City = 43.3 – 5.17 Weight • r2 = 0.702 and se = 2.95 • According to the equation, by how much would mileage increase, on average, if car weight were reduced by 200 lbs? • 1.0 MPG • 4.5 MPG • 2.9 MPG • I have no idea
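The slope of the fitted line answers the question above once the weight change is put in the model's units. A minimal sketch, assuming only the rounded coefficients shown on the slide (the function name and the 3,000 lb starting weight are illustrative):

```python
# Rounded coefficients of the fitted line (Weight is in 1000s of lbs).
INTERCEPT = 43.3
SLOPE = -5.17

def estimated_mpg(weight_thousands_lbs):
    """Estimated city MPG under the linear model."""
    return INTERCEPT + SLOPE * weight_thousands_lbs

# 200 lbs is 0.2 in the model's units. Because the model is linear,
# the starting weight (3.0 here) does not affect the answer.
change = estimated_mpg(3.0 - 0.2) - estimated_mpg(3.0)
print(round(change, 2))  # about 1.03 MPG
```

The same number comes from multiplying the slope by the change in weight: 5.17 × 0.2 ≈ 1.03.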

Estimating the Model • Cars data: MPG by Weight (1000’s of lbs) • It’s very easy to estimate an OLS regression model, but often a simple linear model isn’t appropriate. • Sometimes we can detect non-linearity with scatterplots. • In practice, it’s often hard to tell, especially once we start to consider outliers.

Look at Plots of the Residuals • Nonlinear patterns are often easier to spot when looking at the residuals (residuals by x-values):
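To see why residuals make curvature easier to spot, here is a minimal sketch that fits a line to curved data and lists the residuals by x-value. The data are made up for illustration (they are not the values in cars.csv), and `fit_line` is a hypothetical helper written out by hand:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up, deliberately curved data: y falls like 1/x.
weights = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
mpg = [100 / (1.2 * w) for w in weights]

b0, b1 = fit_line(weights, mpg)
residuals = [y - (b0 + b1 * x) for x, y in zip(weights, mpg)]

# Positive at both ends and negative in the middle: a systematic bend,
# not the random scatter we would expect if a line were adequate.
for w, e in zip(weights, residuals):
    print(f"{w:.1f}  {e:+.2f}")
```

Plotting these residuals against the x-values would show a U shape, which is the kind of pattern to look for.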

What to do? Transformations. • Create a new variable in the data set by applying a function to each observation • Two nonlinear transformations useful in many business applications: reciprocals and logarithms • Transformations allow the use of linear regression analysis to describe a curved pattern (sometimes) • Often the transformed variable makes more theoretical sense. • How to decide? • Use theory for insights. Often thinking about the data will tell you what you should do. • Try different ones. Iterate. • Among the possible choices, select the one that captures the curvature of the data and produces an interpretable equation

Choosing a Transformation • There are several suggested ones, depending on the curvature of your data (but don’t forget to use the context of the problem): • What was the shape of our MPG and Weight data?

Reciprocal Transformation • The reciprocal transformation is useful when dealing with variables that are already in the form of a ratio, such as miles per gallon • In the context of our car data, a reciprocal transformation makes sense: • Instead of miles per gallon, use gallons per mile. • But there aren’t many cars that burn more than one gallon per mile. So… • Transform the response variable (MPG → GPM) and multiply by 100. The resulting response is the number of gallons it takes to go 100 miles
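The transformation described above is a one-line calculation applied to each observation. A minimal sketch, with illustrative MPG values rather than rows from cars.csv (the function name is an assumption):

```python
def gallons_per_100_miles(mpg):
    """Reciprocal of MPG, scaled by 100 so typical values are not tiny."""
    return 100.0 / mpg

# Illustrative MPG values only.
city_mpg = [40, 25, 20, 10]
transformed = [gallons_per_100_miles(m) for m in city_mpg]
print(transformed)  # [2.5, 4.0, 5.0, 10.0]
```

Note that the gap between 20 and 10 MPG (5 gallons per 100 miles) is much larger on the new scale than the gap between 40 and 25 MPG (1.5 gallons).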

Reciprocal Transformation • Estimating the model with a transformed dependent variable: • Estimated Gallons/100 Miles = -0.112 + 1.204 Weight • r2 = 0.713 and se = 0.667

Residual Plot: • Outliers clear(er) – sports cars.

Comparing Models • Original linear model: • Estimated MPG City = 43.3 – 5.17 Weight • r2 = 0.703 and se = 2.95 • Model with transformed dependent variable: • Estimated Gallons/100 Miles = -0.112 + 1.204 Weight • r2 = 0.7132 and se = 0.667 • It can be tempting to say that model 1 is about as good as model 2 because the r2 values are similar (70.3% of variation explained vs. 71.3%). • Not a valid comparison. Don’t compare r2 between models that have different data! (i.e., observations or response variables)

Substantive Comparison • The reciprocal equation treats weights differently than the linear equation • In the reciprocal equation, differences in weight matter less as cars get heavier • Diminishing effect of changes in weight makes more sense than a constant decrease • Substantive knowledge / theory important! • Knowledge of market forces (economics) very important
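The difference between the two equations shows up when both are expressed in MPG. A minimal sketch, assuming the rounded coefficients from the slides and an illustrative set of weights:

```python
def linear_mpg(w):
    """MPG from the original linear model (w in 1000s of lbs)."""
    return 43.3 - 5.17 * w

def reciprocal_mpg(w):
    """MPG implied by the gallons-per-100-miles model, transformed back."""
    gallons_per_100 = -0.112 + 1.204 * w
    return 100.0 / gallons_per_100

# Illustrative weights: 2,000 to 5,000 lbs in 1,000 lb steps.
weights = [2.0, 3.0, 4.0, 5.0]
pairs = list(zip(weights, weights[1:]))

# MPG lost for each additional 1,000 lbs under each model.
linear_drops = [linear_mpg(a) - linear_mpg(b) for a, b in pairs]
reciprocal_drops = [reciprocal_mpg(a) - reciprocal_mpg(b) for a, b in pairs]

for (a, b), lin, rec in zip(pairs, linear_drops, reciprocal_drops):
    print(f"{a:.0f} to {b:.0f}: linear {lin:.2f}, reciprocal {rec:.2f}")
```

The linear model takes off the same 5.17 MPG at every step, while the reciprocal model's drops shrink from roughly 15 to 7 to 4 MPG as cars get heavier.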

Reciprocal Transformation • Visually, what we did:

Reciprocal Transformation • But what if we really wanted to predict MPG? • We do not stop with just fitting the linear regression model. • We can transform back to MPG • Estimated Gallons/100 Miles = -0.112 + 1.204 Weight • Given the above model, what is the MPG of a car that weighs 3,000 lbs? • 28.7 • 27.73 • 3.49 • 34.9
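The back-transformation is the step that is easy to skip. A minimal sketch of the two steps, assuming the rounded coefficients shown above (function names are illustrative):

```python
def estimated_gallons_per_100_miles(weight_thousands_lbs):
    """Prediction on the transformed scale (what the regression estimates)."""
    return -0.112 + 1.204 * weight_thousands_lbs

def estimated_mpg(weight_thousands_lbs):
    """Undo the transformation: MPG = 100 / (gallons per 100 miles)."""
    return 100.0 / estimated_gallons_per_100_miles(weight_thousands_lbs)

# A 3,000 lb car is Weight = 3.0 in the model's units.
gallons = estimated_gallons_per_100_miles(3.0)
mpg = estimated_mpg(3.0)
print(round(gallons, 2))  # 3.5 gallons per 100 miles, not an MPG figure
print(round(mpg, 1))      # about 28.6 MPG
```

With these rounded coefficients the result is about 28.6 MPG; the small gap from the nearest listed choice is presumably due to rounding in the displayed coefficients. Stopping at the first number gives an answer on the wrong scale.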

Transformations • If you transform a variable and use it for prediction, don’t forget to transform it back. • Happens on the exams all the time (and grades suffer)

Log Transformations • Another very useful transformation: logarithms • Useful for distributions with positive skew (long right tail) • Useful when the association between variables is more meaningful on a percentage scale. • Many things are on a log scale (once you think about them) • Price Elasticity of Demand • Percentage change in quantity demanded given a 1% change in price • Key to figuring out the optimal price to charge.
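To illustrate why elasticity is the key input for pricing, here is a hedged sketch of the standard textbook markup rule for a monopolist with constant marginal cost and constant-elasticity demand: price = marginal cost × e / (1 + e), valid only when e < -1. This rule is not stated on the slides, and the elasticity and cost values below are hypothetical.

```python
def optimal_price(marginal_cost, elasticity):
    """Markup rule for constant-elasticity demand and constant marginal cost.

    Assumes elasticity < -1 (elastic demand); otherwise the rule gives
    no finite profit-maximizing price.
    """
    if elasticity >= -1:
        raise ValueError("rule requires elastic demand (elasticity < -1)")
    return marginal_cost * elasticity / (1 + elasticity)

# Hypothetical inputs: marginal cost of 1.00 and elasticity of -2.5.
price = optimal_price(1.00, -2.5)
print(round(price, 2))  # 1.67
```

The more elastic the demand, the closer the price sits to marginal cost: with an elasticity of -5 and the same cost, the rule gives 1.25.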

Log Transformations • Data: pet_food.csv • Estimated Sales Volume = 190,483 – 125,189 * Price • r2 = 0.83

Log Transformations • Residual Plot: systematic patterns easier to spot

Log Transformations • If we take the log of both independent and dependent variables (also called a log-log regression): • Estimated log(Sales Volume) = 11.05 – 2.442 * log(Price) • r2 = 0.955 • (Figures: fitted line and residual plot)
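In a log-log regression the slope can be read as an elasticity: the approximate percentage change in sales for a 1% change in price. A minimal sketch that checks this numerically, assuming the logs are natural logs (the slide does not say) and using illustrative prices:

```python
import math

# Rounded coefficients from the log-log fit above.
B0 = 11.05
B1 = -2.442  # slope, interpreted as the price elasticity of demand

def predicted_sales(price):
    """Transform the prediction back from log(sales) to sales.

    Assumes natural logs were used in the regression.
    """
    return math.exp(B0 + B1 * math.log(price))

# Raise an illustrative price of 1.00 by 1% and compare predicted sales.
base = predicted_sales(1.00)
raised = predicted_sales(1.01)
pct_change = 100 * (raised / base - 1)
print(round(pct_change, 1))  # about -2.4, close to the slope
```

The percentage change is the same whatever the starting price, which is what makes the log-log form convenient: elasticity is constant along the fitted curve.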

Comparing the models • The linear model doesn’t do horribly, but the log model is better

Next time • More on transformations • I.e., log transformations (very useful and common) • Elasticity. • Exam 1 two weeks from today
