
The General Linear Model

Presentation Transcript


  1. The General Linear Model

  2. The Simple Linear Model: Linear Regression

  3. Suppose that we have two variables • Y – the dependent variable (response variable) • X – the independent variable (explanatory variable, factor)

  4. X, the independent variable, may or may not be a random variable. Sometimes it is randomly observed; sometimes specific values of X are selected.

  5. The dependent variable, Y, is assumed to be a random variable. The distribution of Y depends on X. The objective is to determine that distribution using statistical techniques (estimation and hypothesis testing).

  6. These decisions will be based on data collected on both variables: Y (the dependent variable) and X (the independent variable). Let (x1, y1), (x2, y2), …, (xn, yn) denote the n pairs of values measured on the independent variable (X) and the dependent variable (Y). The scatterplot is the graphical plot of the points (x1, y1), (x2, y2), …, (xn, yn); a plotting sketch follows below.
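
A minimal sketch of producing such a scatterplot in Python with matplotlib; the data values are invented for illustration and are not from the slides.

```python
# Sketch: plot the n pairs (x1, y1), ..., (xn, yn) as a scatterplot.
import matplotlib.pyplot as plt

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # independent variable X (illustrative)
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3]     # dependent variable Y (illustrative)

plt.scatter(x, y)
plt.xlabel("X (independent variable)")
plt.ylabel("Y (dependent variable)")
plt.title("Scatterplot of (x_i, y_i)")
plt.show()
```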

  7. Assume that we have collected data on two variables X and Y. Let (x1, y1), (x2, y2), (x3, y3), …, (xn, yn) denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).

  8. The assumption will be made that y1, y2, y3, …, yn are • independent random variables • normally distributed • have a common variance σ² • have mean μi = α + βxi. Data that satisfies the assumptions above is said to come from the Simple Linear Model.

  9. Each yi is assumed to be randomly generated from a normal distribution with mean μi = α + βxi and standard deviation σ. [Figure: the normal density of yi centered at α + βxi above the point xi on the regression line]
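
As a hedged illustration, one could simulate data from this model in Python; the parameter values α = 2, β = 1.5, σ = 0.5 and the x values are invented for the example, not taken from the slides.

```python
# Sketch: generate each y_i from a normal distribution with mean
# mu_i = alpha + beta * x_i and standard deviation sigma.
import numpy as np

rng = np.random.default_rng(0)

alpha, beta, sigma = 2.0, 1.5, 0.5     # illustrative parameter values (assumed)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

mu = alpha + beta * x                  # mean of y_i depends on x_i
y = rng.normal(loc=mu, scale=sigma)    # y_i ~ N(alpha + beta * x_i, sigma^2)
print(y)
```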

  10. When the variables are strongly correlated, the points of the scatterplot fall roughly about a straight line.

  11. The density of yi is: f(yi) = (1 / (σ√(2π))) exp(−(yi − α − βxi)² / (2σ²)). The joint density of y1, y2, …, yn is the product: f(y1, …, yn) = (2πσ²)^(−n/2) exp(−Σ(yi − α − βxi)² / (2σ²)).

  12. Estimation of the parameters: the intercept α, the slope β, the standard deviation σ (or variance σ²).

  13. The Least Squares Line: fitting the best straight line to “linear” data.

  14. Let Y = a + b X denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y. For example, if X = xi (as for the ith case), then the predicted value of Y is: ŷi = a + bxi.

  15. Define the residual for each case in the sample to be: ri = yi − ŷi = yi − (a + bxi). The residual sum of squares (RSS) is defined as: RSS = Σ ri² = Σ (yi − a − bxi)². The residual sum of squares (RSS) is a measure of the “goodness of fit” of the line Y = a + bX to the data; a small computational sketch follows below.
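
A sketch of this computation in Python for an arbitrary candidate line; the line a = 1, b = 2 and the data are invented for illustration.

```python
# Sketch: residuals r_i = y_i - (a + b * x_i) and the residual sum of squares
# RSS = sum(r_i^2) for an arbitrary candidate line Y = a + b X.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

a, b = 1.0, 2.0                  # an arbitrary line (illustrative values)
y_hat = a + b * x                # predicted values
residuals = y - y_hat            # r_i
rss = np.sum(residuals ** 2)     # residual sum of squares
print(residuals, rss)
```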

  16. One choice of a and b will result in the residual sum of squares attaining a minimum. If this is the case, then the line Y = a + bX is called the Least Squares Line.

  17. To find the Least Squares estimates, a and b, we need to solve the equations: ∂RSS/∂a = 0 and ∂RSS/∂b = 0.

  18. Note: ∂RSS/∂a = −2 Σ(yi − a − bxi), or, setting this to zero, Σyi = na + bΣxi.

  19. Note: ∂RSS/∂b = −2 Σ xi(yi − a − bxi), or, setting this to zero, Σxiyi = aΣxi + bΣxi².

  20. Hence the optimal values of a and b satisfy the equations: na + bΣxi = Σyi and aΣxi + bΣxi² = Σxiyi. From the first equation we have: a = ȳ − bx̄. The second equation becomes: (ȳ − bx̄)Σxi + bΣxi² = Σxiyi.

  21. Solving the second equation for b: b = (Σxiyi − nx̄ȳ) / (Σxi² − nx̄²) = Sxy / Sxx, and a = ȳ − bx̄, where Sxy = Σ(xi − x̄)(yi − ȳ) and Sxx = Σ(xi − x̄)².

  22. Note: Sxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − nx̄ȳ and Sxx = Σ(xi − x̄)² = Σxi² − nx̄². Proof: expand the squares and cross products and use Σxi = nx̄ and Σyi = nȳ.

  23. Summary: slope and intercept of the least squares line: b = Sxy / Sxx and a = ȳ − bx̄.
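
A minimal sketch of these formulas in Python; the data are illustrative, and numpy.polyfit is used only as an independent check.

```python
# Sketch: least squares slope b = Sxy / Sxx and intercept a = ybar - b * xbar.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

b = Sxy / Sxx          # slope of the least squares line
a = ybar - b * xbar    # intercept of the least squares line
print(a, b)

# Independent check against numpy's own degree-1 least squares fit.
print(np.polyfit(x, y, 1))   # returns [slope, intercept]
```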

  24. Maximum Likelihood Estimation of the parameters: the intercept α, the slope β, the standard deviation σ.

  25. Recall that the joint density of y1, y2, …, yn is: f(y1, …, yn) = (2πσ²)^(−n/2) exp(−Σ(yi − α − βxi)² / (2σ²)) = L(α, β, σ), the likelihood function.

  26. The log likelihood function is: l(α, β, σ) = ln L(α, β, σ) = −(n/2) ln(2πσ²) − Σ(yi − α − βxi)² / (2σ²). To find the maximum likelihood estimates of α, β and σ we need to solve the equations: ∂l/∂α = 0, ∂l/∂β = 0 and ∂l/∂σ = 0.

  27. The equation ∂l/∂α = 0 becomes Σ(yi − α − βxi) = 0, and ∂l/∂β = 0 becomes Σ xi(yi − α − βxi) = 0. These are the same equations as for the least squares line, which have solution: β̂ = b = Sxy / Sxx and α̂ = a = ȳ − bx̄.

  28. The third equation, ∂l/∂σ = 0, becomes: n/σ = Σ(yi − α̂ − β̂xi)² / σ³, i.e. σ̂² = (1/n) Σ(yi − a − bxi)².

  29. Summary: the maximum likelihood estimates are β̂ = b = Sxy / Sxx, α̂ = a = ȳ − bx̄, and σ̂² = (1/n) Σ(yi − a − bxi)².

  30. A computing formula for the estimate of σ²: since a = ȳ − bx̄, we have yi − a − bxi = (yi − ȳ) − b(xi − x̄). Hence Σ(yi − a − bxi)² = Σ[(yi − ȳ) − b(xi − x̄)]².

  31. Now Σ[(yi − ȳ) − b(xi − x̄)]² = Syy − 2b·Sxy + b²·Sxx = Syy − Sxy²/Sxx (using b = Sxy/Sxx), where Syy = Σ(yi − ȳ)². Hence σ̂² = (1/n)[Syy − Sxy²/Sxx].

  32. It can also be shown that E[σ̂²] = ((n − 2)/n) σ². Thus σ̂², the maximum likelihood estimator of σ², is a biased estimator of σ². This estimator can easily be converted into an unbiased estimator of σ² by multiplying by the ratio n/(n − 2).
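
A sketch comparing the two estimators on illustrative data. The bias factor (n − 2)/n only shows up on average over many samples; here both estimates are simply computed once.

```python
# Sketch: the MLE of sigma^2 divides the residual sum of squares by n,
# while the unbiased estimator divides by n - 2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.8, 4.1, 5.2, 6.9, 8.1, 9.2])
n = len(x)

xbar, ybar = x.mean(), y.mean()
b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
a = ybar - b * xbar

rss = np.sum((y - a - b * x) ** 2)   # residual sum of squares
sigma2_mle = rss / n                 # biased maximum likelihood estimate
s2_unbiased = rss / (n - 2)          # unbiased estimate (multiply by n/(n-2))
print(sigma2_mle, s2_unbiased)
```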

  33. Estimators in Linear Regression: b = Sxy / Sxx, a = ȳ − bx̄, and s² = (n/(n − 2)) σ̂² = (1/(n − 2)) Σ(yi − a − bxi)².

  34. The major computation is the residual sum of squares: Σ(yi − a − bxi)² = Syy − Sxy²/Sxx.

  35. Computing Formulae: Sxx = Σxi² − (Σxi)²/n, Sxy = Σxiyi − (Σxi)(Σyi)/n, and Syy = Σyi² − (Σyi)²/n. A numerical check appears in the sketch below.
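
As a quick numerical check on illustrative data, the shortcut computing formulas agree with the definitional sums of squared and cross deviations.

```python
# Sketch: the computing formulas give the same Sxx, Sxy, Syy as the
# definitional sums of squared / cross deviations.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
n = len(x)

# Definitional forms
Sxx_def = np.sum((x - x.mean()) ** 2)
Sxy_def = np.sum((x - x.mean()) * (y - y.mean()))
Syy_def = np.sum((y - y.mean()) ** 2)

# Computing formulas
Sxx_cf = np.sum(x ** 2) - np.sum(x) ** 2 / n
Sxy_cf = np.sum(x * y) - np.sum(x) * np.sum(y) / n
Syy_cf = np.sum(y ** 2) - np.sum(y) ** 2 / n

print(np.allclose([Sxx_def, Sxy_def, Syy_def], [Sxx_cf, Sxy_cf, Syy_cf]))  # True
```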

  36. Application of Statistical Theory to Simple Linear Regression. We will now use statistical theory to prove optimal properties of the estimators. Recall that the joint density of y1, y2, …, yn is: f(y1, …, yn) = (2πσ²)^(−n/2) exp(−Σ(yi − α − βxi)² / (2σ²)).

  37. Also, writing out the exponent shows that this joint density belongs to the exponential family with complete sufficient statistics Σyi, Σxiyi and Σyi², and a, b and s² can each be written as functions of these (the xi being fixed constants). Thus all three estimators are functions of the set of complete sufficient statistics. If they are also unbiased, then they are Uniform Minimum Variance Unbiased (UMVU) estimators (using the Lehmann-Scheffé theorem).

  38. We have already shown that s² is an unbiased estimator of σ². We need only show that b and a are unbiased estimators of β and α.

  39. Thus E[b] = β and E[a] = α, so b and a are unbiased estimators of β and α.

  40. Also Thus

  41. The General Linear Model

  42. Consider the random variable Y with 1. E[Y] = β1X1 + β2X2 + ... + βpXp (alternatively E[Y] = β0 + β1X1 + ... + βpXp, intercept included) and 2. var(Y) = σ² • where β1, β2, ..., βp are unknown parameters • and X1, X2, ..., Xp are nonrandom variables. • Assume further that Y is normally distributed.

  43. Thus the density of Y is: f(Y | β1, β2, ..., βp, σ²) = f(Y | μ, σ²) = (2πσ²)^(−1/2) exp(−(Y − μ)² / (2σ²)), where μ = β1X1 + β2X2 + ... + βpXp.

  44. Now suppose that n independent observations of Y, (y1, y2, ..., yn), are made corresponding to n sets of values of (X1, X2, ..., Xp): (x11, x12, ..., x1p), (x21, x22, ..., x2p), ..., (xn1, xn2, ..., xnp). Then the joint density of y = (y1, y2, ..., yn) is: f(y1, y2, ..., yn | β1, β2, ..., βp, σ²) = (2πσ²)^(−n/2) exp(−Σi (yi − β1xi1 − ... − βpxip)² / (2σ²)).

  45. Thus the joint density is a member of the exponential family of distributions, and (Σi yixi1, Σi yixi2, ..., Σi yixip, Σi yi²) is a minimal complete set of sufficient statistics.

  46. Matrix-vector formulation of the General Linear Model.
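
The transcript ends here. As a hedged sketch of the matrix-vector formulation (not taken from the remaining slides), the model can be written y = Xβ + ε with E[ε] = 0 and var(ε) = σ²I, and the least squares / maximum likelihood estimate solves the normal equations XᵀXβ̂ = Xᵀy. The design matrix, parameter values and noise level below are invented for illustration.

```python
# Sketch: the general linear model in matrix-vector form, y = X @ beta + error,
# and the estimate beta_hat solving the normal equations X^T X beta_hat = X^T y.
import numpy as np

rng = np.random.default_rng(0)

n, p = 20, 3
X = rng.normal(size=(n, p))             # design matrix (nonrandom in the model;
                                        # generated here only for illustration)
beta_true = np.array([1.0, -2.0, 0.5])  # illustrative parameter values
sigma = 0.3
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# lstsq solves the least squares problem; it is numerically safer than forming
# the inverse of X^T X explicitly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```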
