
Statistical Data Analysis: Lecture 10


Presentation Transcript


  1. Statistical Data Analysis: Lecture 10
     1  Probability, Bayes’ theorem, random variables, pdfs
     2  Functions of r.v.s, expectation values, error propagation
     3  Catalogue of pdfs
     4  The Monte Carlo method
     5  Statistical tests: general concepts
     6  Test statistics, multivariate methods
     7  Significance tests
     8  Parameter estimation, maximum likelihood
     9  More maximum likelihood
     10 Method of least squares
     11 Interval estimation, setting limits
     12 Nuisance parameters, systematic uncertainties
     13 Examples of Bayesian approach
     14 tba

  2. The method of least squares
     Suppose we measure N values y_1, ..., y_N, assumed to be independent Gaussian r.v.s with means E[y_i] = λ(x_i; θ). Assume known values of the control variable x_1, ..., x_N and known variances V[y_i] = σ_i². We want to estimate θ, i.e., fit the curve to the data points. The likelihood function is

         L(θ) = ∏_{i=1..N} (1 / √(2π σ_i²)) exp[ −(y_i − λ(x_i; θ))² / (2σ_i²) ].
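A minimal Python sketch of this setup, assuming a hypothetical straight-line model λ(x; θ) = θ_0 + θ_1 x and invented data points:

```python
import numpy as np

# Hypothetical data: control variable x_i, measurements y_i, known sigma_i.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.7, 2.1, 2.9, 4.2, 4.8])
sigma = np.array([0.5, 0.3, 0.4, 0.5, 0.3])

def lam(x, theta):
    """Assumed fit function lambda(x; theta): here a straight line."""
    return theta[0] + theta[1] * x

def likelihood(theta):
    """Product of Gaussian pdfs, one per measurement y_i."""
    z = (y - lam(x, theta)) / sigma
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
    return np.prod(norm * np.exp(-0.5 * z**2))

print(likelihood([1.0, 0.8]))   # likelihood at one trial value of theta
```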

  3. The method of least squares (2)
     The log-likelihood function is therefore

         ln L(θ) = −(1/2) ∑_i (y_i − λ(x_i; θ))² / σ_i²  +  terms not depending on θ,

     so maximizing the likelihood is equivalent to minimizing

         χ²(θ) = ∑_i (y_i − λ(x_i; θ))² / σ_i².

     The minimum defines the least squares (LS) estimator θ̂. Very often measurement errors are ~Gaussian, and so ML and LS are essentially the same. Often the χ² is minimized numerically (e.g. with the program MINUIT).
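The slide refers to the program MINUIT; as a stand-in, here is a small sketch that minimizes the same kind of χ² numerically with scipy.optimize.minimize, again assuming a straight-line model and invented data:

```python
import numpy as np
from scipy.optimize import minimize

# Invented toy data with known Gaussian errors.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.7, 2.1, 2.9, 4.2, 4.8])
sigma = np.array([0.5, 0.3, 0.4, 0.5, 0.3])

def chi2(theta):
    """chi^2(theta) = sum_i (y_i - lambda(x_i; theta))^2 / sigma_i^2 for a line."""
    resid = y - (theta[0] + theta[1] * x)
    return np.sum((resid / sigma) ** 2)

# Numerical minimization; result.x is the LS estimate, result.fun is chi^2_min.
result = minimize(chi2, x0=[0.0, 1.0])
print("theta_hat =", result.x, " chi2_min =", result.fun)
```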

  4. LS with correlated measurements
     If the y_i follow a multivariate Gaussian with covariance matrix V, then maximizing the likelihood is equivalent to minimizing

         χ²(θ) = ∑_{i,j} (y_i − λ(x_i; θ)) (V⁻¹)_{ij} (y_j − λ(x_j; θ)).
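A short sketch of this generalized χ², assuming a hypothetical straight-line model and an invented covariance matrix V:

```python
import numpy as np

# Invented correlated measurements of lambda(x; theta) = theta0 + theta1*x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.9, 3.1, 3.8])
V = np.array([[0.10, 0.04, 0.00],
              [0.04, 0.10, 0.04],
              [0.00, 0.04, 0.10]])   # covariance matrix of the y_i
Vinv = np.linalg.inv(V)

def chi2(theta):
    """Generalized chi^2: (y - lambda)^T V^{-1} (y - lambda)."""
    r = y - (theta[0] + theta[1] * x)
    return r @ Vinv @ r

print(chi2([1.0, 1.0]))   # chi^2 at one trial value of theta
```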

  5. Example of least squares fit
     Fit a polynomial of order p:

         λ(x; θ_0, ..., θ_p) = ∑_{n=0..p} θ_n xⁿ.
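One possible way to carry out such a weighted polynomial fit in practice is numpy.polyfit, which takes weights 1/σ_i; the data values below are invented:

```python
import numpy as np

# Invented data with known Gaussian errors.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.7, 2.1, 2.9, 4.2, 4.8])
sigma = np.array([0.5, 0.3, 0.4, 0.5, 0.3])

# Weighted LS fit of a polynomial of order p; cov='unscaled' treats the
# given sigmas as absolute errors rather than rescaling them by chi^2/n_d.
p = 1
coeffs, cov = np.polyfit(x, y, deg=p, w=1.0 / sigma, cov='unscaled')
print("fitted coefficients (highest power first):", coeffs)
print("covariance matrix of the estimators:\n", cov)
```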

  6. Variance of LS estimators
     In most cases of interest we obtain the variance in a manner similar to ML. E.g. for data ~ Gaussian we have χ²(θ) = −2 ln L(θ) + constant, and so the variances follow from the same relations as in ML, with −2 ln L replaced by χ². For the graphical method we take the values of θ where

         χ²(θ) = χ²_min + 1.
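A rough sketch of the graphical method for an assumed one-parameter model λ(x; θ) = θx with invented data: scan χ²(θ) and read off where it rises by one unit above its minimum:

```python
import numpy as np

# Invented data for a hypothetical one-parameter model lambda(x; theta) = theta * x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
sigma = np.full(5, 0.3)

def chi2(theta):
    return np.sum(((y - theta * x) / sigma) ** 2)

# Scan chi^2(theta) and keep the values of theta where it stays within
# chi^2_min + 1; the edges of that region give the standard deviation.
thetas = np.linspace(0.8, 1.2, 4001)
values = np.array([chi2(t) for t in thetas])
imin = values.argmin()
inside = thetas[values <= values[imin] + 1.0]
print("theta_hat ~", thetas[imin])
print("approx. 68% interval:", inside[0], "to", inside[-1])
```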

  7. Two-parameter LS fit

  8. Goodness-of-fit with least squares
     The value of χ² at its minimum is a measure of the level of agreement between the data and the fitted curve. It can therefore be employed as a goodness-of-fit statistic to test the hypothesized functional form λ(x; θ). We can show that if the hypothesis is correct, then the statistic t = χ²_min follows the chi-square pdf f(t; n_d), where the number of degrees of freedom is

         n_d = number of data points − number of fitted parameters.
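A toy Monte Carlo check of this statement, under the assumption of a true straight-line model with invented parameters: the average of χ²_min over many simulated data sets should come out close to n_d = N − 2:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sigma = 0.3
true = 0.5 + 0.9 * x   # invented 'true' straight line

chi2_min = []
for _ in range(5000):
    # Simulate Gaussian measurements around the true curve and fit a line.
    y = rng.normal(true, sigma)
    coeffs = np.polyfit(x, y, deg=1, w=np.full(5, 1.0 / sigma))
    resid = (y - np.polyval(coeffs, x)) / sigma
    chi2_min.append(np.sum(resid**2))

# Expectation value of the chi-square pdf equals the degrees of freedom.
print("mean chi2_min =", np.mean(chi2_min), " expected n_d =", len(x) - 2)
```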

  9. Goodness-of-fit with least squares (2)
     The chi-square pdf has an expectation value equal to the number of degrees of freedom, so if χ²_min ≈ n_d the fit is ‘good’. More generally, find the p-value:

         p = ∫_{χ²_min}^{∞} f(t; n_d) dt.

     This is the probability of obtaining a χ²_min as high as the one we got, or higher, if the hypothesis is correct. E.g. for the previous example one finds a much larger p-value for the 1st order polynomial (line) than for the 0th order polynomial (horizontal line).
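A minimal sketch of the p-value computation with scipy.stats.chi2.sf; the numbers are purely illustrative and not those of the lecture example:

```python
from scipy.stats import chi2

# Illustrative (invented) fit result.
chi2_min = 8.2   # value of chi^2 at its minimum
n_d = 5          # number of data points minus number of fitted parameters

# p-value: probability of chi^2 >= chi2_min if the hypothesis is correct.
p_value = chi2.sf(chi2_min, df=n_d)
print("p-value =", p_value)
```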

  10. Goodness-of-fit vs. statistical errors

  11. Goodness-of-fit vs. stat. errors (2)

  12. LS with binned data

  13. LS with binned data (2)

  14. LS with binned data — normalization

  15. LS normalization example

  16. Using LS to combine measurements

  17. Combining correlated measurements with LS

  18. Example: averaging two correlated measurements

  19. Negative weights in LS average

  20. Wrapping up lecture 10
     Considering ML with Gaussian data led to the method of least squares. Several caveats apply when the data are not (quite) Gaussian, e.g., histogram-based data. Goodness-of-fit with LS is “easy” (but do not confuse a good fit with small statistical errors). LS can also be used for averaging measurements.
     Next lecture: interval estimation
