
Introduction to Applied Statistics

Introduction to Applied Statistics. Xiaobo Sheng. Overview. CH 1 Introduction CH 2-3 Concepts, Descriptive Statistics of one variable CH 6-8 Probability, A few common probability distributions and models CH 9-13 Statistical Inference CH 15 Linear Regression.


Presentation Transcript


  1. Introduction to Applied Statistics Xiaobo Sheng

  2. Overview • CH 1 Introduction • CH 2-3 Concepts, Descriptive Statistics of one variable • CH 6-8 Probability, A few common probability distributions and models • CH 9-13 Statistical Inference • CH 15 Linear Regression

  3. Introduction • What is statistics? • A collection of numerical information • Or the branch of mathematics dealing with theory and techniques of collecting, organizing, and interpreting numerical information. (We will focus on the first definition)

  4. Why do we need statistics? • Pepsi vs. Coca-Cola • Horse racing • Casino games

  5. How do we deal with statistics? • Input: Data set (a collection of information) • Process: Data analysis (making sense of a data set) • Output: Statistical inference (drawing conclusions about a population based on a sample from that population)

  6. A few basic definitions to know • Population: the group or collection of interest to us. It is usually very large and messy. • Sample: a subset of the population, reasonably small and capable of being analyzed using statistical tools. We use the observations in the sample to learn about the population. • Examples: income of teachers, average age, etc.

  7. Descriptive statistic: a number used to summarize information in a set of data values. The appropriate statistic varies from problem to problem. Variable: a particular piece of information. Two types: • Quantitative variable: has numerical values that are measurements • Categorical variable: has values that cannot be interpreted as numbers.

  8. Mean: average = (sum of the values) / (number of values). Median (50th percentile): divides an ordered list of values in half. Quartiles: divide an ordered list of values into 4 groups of equal or approximately equal size.

  9. 1st quartile (25th percentile): at least three-fourths of the values are greater than or equal to the first quartile. 3rd quartile (75th percentile): at least three-fourths of the values are less than or equal to the third quartile. (Page 49)

  10. Range: difference between the largest and smallest values of a data set. Interquartile range: difference between the 3rd and 1st quartiles.
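As a rough illustration of the summary statistics on slides 8 to 10, the sketch below computes the mean, median, quartiles, range, and interquartile range with Python's standard `statistics` module. The data values are made up for the example, and the `inclusive` quartile method is only one of several conventions, so the quartiles may differ slightly from the textbook's rule.

```python
import statistics

# Hypothetical data set (not from the slides): ten exam scores, already sorted.
values = [52, 60, 61, 65, 70, 72, 75, 80, 88, 97]

mean = statistics.mean(values)        # sum of the values / number of values
median = statistics.median(values)    # 50th percentile

# quantiles(n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

data_range = max(values) - min(values)  # largest value minus smallest value
iqr = q3 - q1                           # interquartile range

print(mean, median, q1, q3, data_range, iqr)
```

Other quartile conventions (for example `method="exclusive"`) can give different values of Q1 and Q3 on small data sets.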

  11. Standard deviation: used to measure the variation of values about the mean. σ: population standard deviation. s: sample standard deviation. (Page 82)
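The difference between σ and s can be sketched with Python's `statistics` module: `pstdev` divides the sum of squared deviations by n, while `stdev` divides by n - 1. The data values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import statistics

# Hypothetical data values (not from the slides); their mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation (sigma): divides by n.
sigma = statistics.pstdev(values)

# Sample standard deviation (s): divides by n - 1, so it is slightly larger.
s = statistics.stdev(values)

print(f"sigma = {sigma:.3f}, s = {s:.3f}")
```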

  12. Lists, Tables, and Plots • Data list: a listing of the values of a variable in a data set.

  13. Table

  14. Table: The values in a table are usually ordered or sorted by some criterion. If they are not, we can use Excel to sort them.

  15. Plots • Dot Plot

  16. Frequency Table

  17. Histogram

  18. Distribution • A description of how the values of the variable are positioned along an axis or number line. • Symmetric • Skewed to the left (negatively skewed): there is a concentration of relatively large values, with some scatter over a range of smaller values. • Skewed to the right (positively skewed): there is a concentration of relatively small values, with some scatter over a range of larger values.

  19. Peak A major concentration of values.

  20. Unimodal distribution has one major peak • Bimodal has two major peaks • Multimodal has several major peaks

  21. Box plot

  22. Box graph

  23. CH 4 • Scatterplot: a two-dimensional graphical display of two quantitative variables.

  24. Transformation of a variable: a mathematical manipulation of each value of the variable. • Logarithmic transformation (a common one) • Square root transformation • Power transformation

  25. Logarithmic transformation: take the logarithm of each value of the variable.
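A minimal sketch of a logarithmic transformation, assuming base-10 logarithms (the slides do not say which base is used) and hypothetical values that span several orders of magnitude:

```python
import math

# Hypothetical strictly positive values (not from the slides).
values = [1, 10, 100, 1000, 10000]

# Apply the transformation to each value of the variable.
# The logarithm is only defined for values greater than zero.
log_values = [math.log10(v) for v in values]

print(log_values)
```

After the transformation the values are evenly spaced, which is why a logarithm is often tried on variables with a long right tail.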

  26. Further analysis of relationships between variables: see Ch. 15. Homework

  27. Ch 15 Correlation, Regression • Study the relationship between quantitative variables • Linear correlation coefficient

  28. Mathematical Notation (1) Another form (2)
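The formulas on this slide were not preserved in the transcript. For orientation only, two standard and equivalent ways of writing Pearson's correlation coefficient are given below; they may not match the slide's own forms (1) and (2). The notation is an assumption: $n$ paired observations $(x_i, y_i)$, sample means $\bar{x}$ and $\bar{y}$, and sample standard deviations $s_x$ and $s_y$.

```latex
% Form based on sums of deviations from the means
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

% Equivalent form based on standardized values
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```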

  29. Formal Definition • Correlation coefficient (Pearson’s correlation coefficient): a measure of linear association between two quantitative variables. • r has no units and takes values from -1 to 1.
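The definition can be made concrete with a small sketch that computes r from sums of deviations about the means. The function name `pearson_r` and the paired observations are hypothetical, chosen for illustration rather than taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

Because the units of x and y cancel in the ratio, r is unitless, and points lying exactly on an increasing (or decreasing) straight line give r = 1 (or r = -1).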

  30. A correlation coefficient near 0 suggests there is little or no linear association between those two variables

  31. Example

  32. What exactly does the correlation coefficient measure? It measures the extent of clustering of plotted points about a straight line. A correlation coefficient that is large in absolute value suggests a strong linear association between the two variables. A correlation coefficient that is near zero suggests little linear association between the two variables.

  33. Can the correlation coefficient be misleading? • Yes. We should always plot two quantitative variables to get a visual feel for their relationship. Then we can use the correlation coefficient to supplement the plot.

  34. r is 0.66. By itself, this correlation coefficient might suggest a linear association between the two variables, but the figure itself suggests a curved relationship. A stronger linear relationship exists between life expectancy and the logarithm of per capita gross national product (r = 0.84).
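The effect described on this slide can be imitated with synthetic numbers. The sketch below is not the data behind the slide, so it will not reproduce r = 0.66 or r = 0.84; the hypothetical `life` values are constructed to be exactly linear in the base-10 logarithm of the hypothetical `gnp` values, which shows how a curved relationship lowers r until the variable is transformed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic, made-up values: per capita GNP and a response that is
# exactly linear in log10(GNP). These are NOT real measurements.
gnp = [200, 500, 1000, 5000, 10000, 30000]
life = [20 + 13 * math.log10(g) for g in gnp]

r_raw = pearson_r(gnp, life)                            # curved relationship
r_log = pearson_r([math.log10(g) for g in gnp], life)   # straight line

print(f"r (raw) = {r_raw:.2f}, r (log) = {r_log:.2f}")
```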

  35. Outlier • An observation that is far from the other observations.

  36. Simple Linear Regression • Method of least squares

  37. Example

  38. Scatterplot

  39. Calculation table

  40. Scatterplot with least-squares line

  41. The intercept has no physical meaning here.

  42. Definition of Linear Regression • Simple linear regression refers to fitting a straight-line model by the method of least squares and then assessing the model. Applications: • Finding the relationship between two quantitative variables • Predicting future values.
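A minimal sketch of the least-squares fit, using the standard closed-form slope and intercept for a straight line y = intercept + slope * x. The function name `least_squares_line` and the data are hypothetical; the example data and calculation table from the slides were not preserved in the transcript.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of products of deviations / sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)

# Use the fitted line to predict y at a new x value.
predicted = intercept + slope * 6
print(f"y = {intercept:.2f} + {slope:.2f} x, prediction at x = 6: {predicted:.2f}")
```

Predictions far outside the observed range of x should be treated with caution, since the straight-line model is only checked against the data at hand.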
