Learn how to analyze heating oil sales using multiple regression with temperature and insulation as independent variables. Understand coefficient interpretation, categorical variables, hypothesis testing, significance, and factors affecting sales. Dive into ANOVA and hypothesis testing for car accidents.
Exam Feb 28: Sets 1, 2 • Set 1 due Thurs • Memo C-1 due Feb 14 • Free tutoring will be available next week: Plan A (MW 4-6 PM) or Plan B (TT 2-4 PM) • Vote for Plan A or Plan B • Results announced Thurs
Kinderman Supplement • Ch 2: Multiple Regression • Ch 3: Analysis of Variance
MULTIPLE REGRESSION Kinderman, Ch 2
Example • Reference: Statistics for Managers • By Levine, David M; Berenson; Stephan • Second edition (1999) • Prentice Hall
Y = dependent variable = heating oil sales (gal) • X1 = temperature (degrees) • X2 = insulation (inches) • X1 and X2 are the independent variables • Y = b0 + b1X1 + b2X2 • Enter the data into Excel • NOTE: If you can’t find Data Analysis, try Add-Ins
Y = 562 - 5X1 - 20X2 • Bottom table: Coefficients column
Interpret coefficients • Intercept = b0 = 562: If temp = 0 and insulation = 0, predicted heating oil sales = 562 gal • b1 = -5: For homes with the same insulation, each 1-degree increase in temperature should decrease heating oil sales by 5 gal • b2 = -20: For homes with the same temperature, each additional inch of insulation should decrease sales by 20 gal
Categorical Variables • X = 0 or 1 • Example: 0 if male, 1 if female • Example: 1 if graduate, 0 if dropout • Example: 1 if citizen, 0 if alien • NOTE: not used in this fuel oil example
Estimate sales if temp = 30, insulation = 6 • Y = 562 -5(30) – 20(6) = 292 gal
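A minimal Python sketch of the estimate above, using the rounded coefficients from the slide (b0 = 562, b1 = -5, b2 = -20). The function name `predict_sales` is hypothetical. The last two lines illustrate the coefficient interpretation: changing one variable by 1 unit while holding the other fixed moves the prediction by that variable's coefficient.

```python
# Fitted equation from the slides (rounded coefficients):
# Y = 562 - 5*X1 - 20*X2
B0, B1, B2 = 562, -5, -20

def predict_sales(temp_deg, insulation_in):
    """Predicted heating oil sales (gal) for a given temperature and insulation."""
    return B0 + B1 * temp_deg + B2 * insulation_in

print(predict_sales(30, 6))                          # 292
# Same insulation, temperature up 1 degree: prediction changes by b1
print(predict_sales(31, 6) - predict_sales(30, 6))   # -5
# Same temperature, insulation up 1 inch: prediction changes by b2
print(predict_sales(30, 7) - predict_sales(30, 6))   # -20
```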
Standard Error = 26 (top table) • Interpret: A home's fuel oil sales were typically about 26 gal away from the average (predicted) fuel oil sales of homes with the same temperature and insulation
COEFFICIENT OF MULTIPLE DETERMINATION • Top table, R Square • Interpret: 96% of the total variation in fuel oil sales can be explained by variation in temperature and insulation
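The two summary numbers in Excel's top table can be reproduced by hand. A sketch, assuming you already have actual values `y` and predicted values `y_hat` from a model with `k` independent variables: R Square = 1 - SSE/SST and Standard Error = sqrt(SSE/(n - k - 1)). The helper name `fit_summary` and the five data points in the usage lines are hypothetical, not the heating oil data.

```python
import math

def fit_summary(y, y_hat, k):
    """Return (R Square, standard error of the estimate) for a fitted
    regression with k independent variables."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    r_squared = 1 - sse / sst                              # share explained
    std_error = math.sqrt(sse / (n - k - 1))               # typical miss, in units of y
    return r_squared, std_error

# Hypothetical toy data (NOT the heating oil data), k = 2 independent variables
r2, se = fit_summary([10, 12, 15, 20, 23], [11, 12, 14, 20, 23], k=2)
print(f"R Square = {r2:.3f}, Standard Error = {se:.2f}")  # R Square = 0.983, Standard Error = 1.00
```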
Is there a relationship between the independent variables (taken together) and the dependent variable? • Ho: Null hypothesis: All slope coefficients = 0 • Ho: NO relationship • H1: Alternative hypothesis: At least one slope coefficient is not 0 • H1: There is a relationship
Computer output comes from sample data • Hypotheses are about population parameters • Ho: The parameters = 0; the sample data only makes it appear that there is a relationship • Simple regression: Ho: slope = 0 vs H1: slope is positive or negative (slope ≠ 0)
Exponents • $10^{-1} = 0.1$ • $10^{-2} = 0.01$
Decision Rule • Reject Ho if “Significance F” < alpha • Middle table • Fuel oil example: Significance F = 1.6E-09 • Excel: E = Exponent • 1.6E-09 = $1.6 \times 10^{-9} = 0.0000000016$ • Essentially zero
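Python reads Excel-style E notation directly, which gives a quick way to check the conversion above. A small sketch; the variable name `sig_f` is arbitrary.

```python
# Excel's "E" means "times 10 to the power": 1.6E-09 = 1.6 x 10^-9
sig_f = float("1.6E-09")       # Python parses E notation directly
print(f"{sig_f:.10f}")         # 0.0000000016
print(sig_f < 0.05)            # True: far below a typical alpha
```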
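Significance F = p-value • Excel uses the label “P-value” only for t-distribution tests (bottom table); for the F test it uses “Significance F” • Significance F = probability that F is greater than the sample F, assuming Ho is true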
Assume alpha = .05 • Since Significance F ≈ 0 < .05, reject Ho • We conclude there IS a relationship between fuel oil sales and the independent variables
Which independent variables seem to be important factors? • Ho: Temperature is not an important factor • H1: Temperature is important • Reject Ho if p-value < alpha • Bottom table: P-value column, X1 row • P-value = 1.6E-09, essentially zero • Reject Ho • Temperature is important
Insulation • Ho: Insulation is unimportant • H1: Insulation is important • P-value = 1.9E-06, essentially zero • Reject Ho • Insulation is important
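All three tests in the fuel oil example use the same rule: reject Ho if p-value < alpha. A sketch applying it to the p-values quoted on the slides, with alpha = .05; the function name `reject_h0` is hypothetical.

```python
ALPHA = 0.05

def reject_h0(p_value, alpha=ALPHA):
    """Decision rule: reject Ho when the p-value is below alpha."""
    return p_value < alpha

# p-values as quoted from the Excel output on the slides
tests = {
    "Overall model (Significance F)": 1.6e-09,
    "Temperature (X1)": 1.6e-09,
    "Insulation (X2)": 1.9e-06,
}
for name, p in tests.items():
    verdict = "reject Ho" if reject_h0(p) else "do not reject Ho"
    print(f"{name}: p = {p:.1e} -> {verdict}")
```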
Analysis of Variance (ANOVA) Kinderman, Ch 3
Hypothesis Testing • Ho: µ1 = µ2 = µ3 • H1: Not all means are equal • H1: There are differences among the 3 populations • H1: The average number of accidents differs depending on where you live
This course: manual calculations • With computer software, you could have as many populations as needed • Homework, exam: 3 populations • Computer: 4 or more populations • Ex: Ethnic classifications at CSUN
Sample Sizes • Column 1: n1 = number of drivers sampled from policyholders living in the city = 3 • Column 2: n2 = number sampled from suburban drivers = 3 • Column 3: n3 = number sampled from rural drivers = 3 • Each sample size = number of rows of data in its column • Kinderman example: different sample sizes
n = n1 + n2 + n3 n =3 + 3 + 3 = 9
Hypotheses • Ho: Differences in the sample means are due to chance; there would be no differences if ALL drivers were included (Prop 103) • H1: The population means are different (city drivers have more accidents)
SSB = Sum of Squares Between • Variation between the 3 groups • Explained variation • Here: variation in number of accidents explained by where you live (city, suburb, rural) • If where you live did not affect accidents, we would expect SSB to be close to 0 • Next slide: SSB formula
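For reference, the standard one-way ANOVA form of SSB is sketched below. The symbols ($c$ groups, group size $n_j$, group mean $\bar{X}_j$, grand mean $\bar{\bar{X}}$) are assumed notation and may differ from the course slide. With $c = 3$ and $n_1 = n_2 = n_3 = 3$, it reduces to three terms of the form $3(\bar{X}_j - \bar{\bar{X}})^2$.

```latex
\text{SSB} = \sum_{j=1}^{c} n_j \left( \bar{X}_j - \bar{\bar{X}} \right)^2
```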
This example (group means 2, 1, 0.33; grand mean = 10/9 ≈ 1.11) • $\text{SSB} = 3(2 - 1.11)^2 + 3(1 - 1.11)^2 + 3(0.33 - 1.11)^2 \approx 4.2$
MSB = Mean Square Between • MSB = SSB/2 (denominator = number of groups - 1 = 3 - 1) • Note: OK for this course, but bigger problems (more groups) would have a bigger denominator • MSB = 4.2/2 = 2.1
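A Python sketch of SSB and MSB for the accident example. The individual accident counts are not listed on these slides; they are recovered here from the terms of the SSE calculation (city: 1, 3, 2; suburb: 2, 0, 1; rural: 1, 0, 0), which is an assumption. Using unrounded means gives SSB ≈ 4.22 and MSB ≈ 2.11, which the slides round to 4.2 and 2.1.

```python
# Accident counts per sampled driver, recovered from the SSE terms (assumption)
groups = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

def mean(xs):
    return sum(xs) / len(xs)

all_values = [x for xs in groups.values() for x in xs]
n = len(all_values)              # n = n1 + n2 + n3 = 9
c = len(groups)                  # number of groups = 3
grand_mean = mean(all_values)    # 10/9, about 1.11

# SSB: each group's size times the squared distance of its mean from the grand mean
ssb = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values())
msb = ssb / (c - 1)              # denominator c - 1 = 2 here

print(f"SSB = {ssb:.2f}, MSB = {msb:.2f}")   # SSB = 4.22, MSB = 2.11
```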
SSE = Sum of Squares Error • Variation within each group • Ex: variation within the group of city drivers • Unexplained variation • If every driver within each group had the same number of accidents (e.g., all city drivers alike), we would expect SSE = 0 • Formula on next slide
$\text{SSE} = (1-2)^2 + (3-2)^2 + (2-2)^2 + (2-1)^2 + (0-1)^2 + (1-1)^2 + (1-0.3)^2 + (0-0.3)^2 + (0-0.3)^2 \approx 4.67$
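The same data, recovered from the terms of the SSE sum (an assumption, since the raw table is not on these slides), gives SSE as the sum of squared distances of each value from its own group mean. A sketch:

```python
# Accident counts recovered from the SSE terms (assumption)
groups = {
    "city": [1, 3, 2],
    "suburb": [2, 0, 1],
    "rural": [1, 0, 0],
}

def mean(xs):
    return sum(xs) / len(xs)

# SSE: squared distance of every value from its own group's mean
sse = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)

print(f"SSE = {sse:.2f}")   # SSE = 4.67
```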
MSE = Mean Square Error (also called Mean Square Within) • Next slide is the formula for this course • Bigger problems have a bigger denominator
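A sketch of the usual next step, assuming the standard one-way ANOVA formulas MSE = SSE/(n - c) and F = MSB/MSE, where c is the number of groups. These formulas are not shown on these slides, so check them against the course formula. With n = 9 and c = 3, the MSE denominator is 6. The data are recovered from the SSE terms (an assumption).

```python
# Accident counts recovered from the SSE terms on the slides (assumption)
groups = {"city": [1, 3, 2], "suburb": [2, 0, 1], "rural": [1, 0, 0]}

def mean(xs):
    return sum(xs) / len(xs)

all_values = [x for xs in groups.values() for x in xs]
n, c = len(all_values), len(groups)          # n = 9, c = 3
grand_mean = mean(all_values)

ssb = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values())
sse = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)

msb = ssb / (c - 1)      # between-groups denominator: 3 - 1 = 2
mse = sse / (n - c)      # assumed standard denominator: 9 - 3 = 6
f_stat = msb / mse       # a large F suggests the group means differ

print(f"MSE = {mse:.2f}, F = {f_stat:.2f}")  # MSE = 0.78, F = 2.71
```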