Chapter 4: Basic Estimation Techniques

Chapter 4: Basic Estimation Techniques

• • Slope parameter (b) gives the change in Y associated with a one-unit change in X, Simple Linear Regression • Simple linear regression model relates dependent variable Y to one independent (or explanatory) variable X

• • Method of Least Squares • The sample regression line is an estimate of the true regression line Dr. Chen’s notes: In statistics, the ordinary least square (OLS) is the most common method for estimates.

ei Sample Regression Line (Figure 4.2) S 70,000 Sales (dollars) • 60,000 • 50,000 • 40,000 • • 30,000 • 20,000 • 10,000 A 10,000 4,000 2,000 8,000 6,000 0 Advertising expenditures (dollars)

• • Unbiased Estimators • An estimator is unbiased if its average value (or expected value) is equal to the true value of the parameter Dr. Chen’s notes: For example, if E ( )= , then is an unbiased estimator of . In general, OLS estimators are unbiased. So, may not exactly equal the true intercept a; however, it is an unbiased estimator for a. Similarly, is an unbiased estimator for b. In the following few slides, let’s go over the MS Excel output on textbook pp. 127 to learn the procedure of regression.

Coefficient of Determination • R2 measures the percentage of total variation in the dependent variable that is explained (determined) by the regression equation • Ranges from 0 to 1; higher R2is better • If R20.7, then the regression equation is a “good fit”. That is, the equation fits the sample well Dr. Chen’s notes: On textbook pp. 127, did you see R Square = 0.7652 in Panel B-Microsoft Excel? It means that about 77% variation of sales is determined by the independent variable, advertising expenditure. Only 23% variation of sales is determined by the other variables not considered in the regression. In regression procedure, R2 is the first number to check. If it indicates the equation is not a good fit, we need to add indep. variables or change model to enhance the equation .

Correlation Coefficient • is the correlation coefficient which summaries the relationship b/w dependent variable and independent variable • Ranges from −1 to 1 • Its sign is the same as that of coefficient of independent variable • When  R  0.7, the correlation is strong Dr. Chen’s notes: On textbook pp. 127, did you see Multiple R = 0.8748? It is derived from the square root of R Square. The correlation coefficient is positive because the sign of coefficient of A is also positive. Advertising expenditures and sales are strongly positively related.

Statistical Significance • Test statistical significance for overall regression equation: using F-test • Test statistical significance for an independent variable: using t-test • P-value approach: • If p-value associated with the test stat is less than the desired level of significance , then we confirm the significance. If p-value>, the equation cannot predict the population well. Dr. Chen’s notes: More details in the following slides. Here, just learn why we need to test the significance. Even though a regression equation is a good fit, we can only conclude that the equation summarizes the sample well. We cannot make sure whether the model is good enough to predict the unknown population. Only when the significance is confirmed, then we can apply the results from sample to predict the unknown population.

F-Test • Used to test for significance of overall regression equation • If p-value of F-test (Significance F)is less than  (default 0.05), then the overall regression equation is statistically significant. Dr. Chen’s notes: Check the Excel output on textbook pp. 127 again. Did you see the Significance F = 0.0100? It is the p-value of F-test. Obviously, it is less than 0.05, the level of significance. We can conclude that the whole regression equation is statistically significant.

t-Test • Used to test for significance of an independent variable • If p-value of t-test is less than  (default 0.05), then the independent variable statistically significant. • For simple regression (i.e. one independent variable), both F-test and t-Test have the same p-value. Dr. Chen’s notes: Please check the Excel output on textbook pp. 127 again. You can see that Significance F and P-value of t-test for independent variable A are the same at 0.0100. When the p-value of t-test is less than  (0.05), A (advertising expenditure) is statistical significant and its relationship to sales can be applied to predict the unknown population.

Testing Procedure • Dr. Chen’s notes: • Let us review the significance testing procedure. After you run MS Excel “Regression”, given the output, you are supposed to follow the steps: • Check R Square: A good fit or not; • Check Significance F: Overall significance; and • Check P-Value of t-test for each independent variable: Independent variable significance. • If the regression equation is not a good fit (i.e. R2 is small), then the equation doesn’t fit the sample data well. You need to add more independent variables, try the other models (other than linear model) or collect another sample for a better fit. Two more models will be introduced later. Without a high R2, the regression makes less sense. In general, if R2 is high, the overall significance will be also confirmed. One common problem for multiple regression (more than one indep. Variable) is that one independent variable significance (t-test) is not confirmed but the overall significance (F-test) is confirmed. In this case, you need to drop the “troubled” indep. variable then replace it by a new or more indep. variables. Finally, when all the three steps are confirmed, you will obtain the regression equation which not only fits the sample well but also serves as a useful tool to forecast the unknown population.

Multiple Regression • Uses more than one explanatory (independent) variable • Coefficient for each explanatory variable measures the change in the dependent variable associated with a one-unit change in that explanatory variable Dr. Chen’s notes: From pp. 140 to 148, the textbook introduces multiple linear, quadratic, and log-linear (exponential) models. Given any sample data, please always try the linear model first. If it is not a good fit, then try the rest of models.

is U-shapedor U -shaped • • • Quadratic Regression Models • Use when curve fitting scatter plot Dr. Chen’s notes: For MS Excel operation, you just need to create one more column from X to be X2, then run “Regression” with the two columns (X and X2) as independent variables. If R2 is improved (higher), then the quadratic model fits better.

• • • • • Log-Linear Regression Models

• Log-Linear Regression Models • Dr. Chen’s notes: • The advantages of log-linear (exponential) model are • It summarizes almost all kinds of nonlinear relationship b/w dep. and indep. variables; • The coefficients of independent variables (log value of original variables), b and c, are the “elasticity” of the independent variable. For example, in the above, if Y = quantity demanded, X = price, and Z = income, then b will be the price elasticity of demand and c will be income elasticity. In Ch 7, we will continue to talk about them. • In Excel operation, you should calculate the logarithm value for each variable by ln function first. Then build the multiple linear model.

Chapter 4: Basic Estimation Techniques

Chapter 4: Basic Estimation Techniques

Presentation Transcript

Chapter 7: Nucleic Acid Amplification Techniques

Project Estimation and scheduling

Chapter 8 Motivation: Cognitive and Behavioral Theories and Techniques

Estimation taking account of sample selection with Stata

Chapter 15

Python Programming Workshop

Chapter 7: Nucleic Acid Amplification Techniques

Chapter 3 - Basic Attending and Listening Skills

Section 2 Drafting Techniques and Skills

Chapter 3 The Cells: The Raw Materials and Building Blocks

AoPS:

Chapter 4: SQL

Chapter 4

Chapter 12 Power Amplifiers

Data Mining: Concepts and Techniques (3 rd ed.) — Chapter 8 —

Chapter 7

Chapter 11: Basic I/O Interface

Chapter 11: Basic I/O Interface

Systems Development and Documentation Techniques

CHAPTER 2 BASIC ELEMENTS OF C++

Chapter 1. Basic Structure of Computers

Chapter 2 The Simple Linear Regression Model: Specification and Estimation