STA 291 Summer 2010 Lecture 1 Dustin Lueker
Topics
• Statistical terminology
• Descriptive statistics
• Probability and distribution functions
• Inferential statistics
  • Estimation (confidence intervals)
  • Hypothesis testing
• Simple linear regression and correlation
Why study Statistics?
• Research in all fields is becoming more quantitative
  • Research journals
• Most graduates will need to be familiar with basic statistical methodology and terminology
  • Newspapers, advertising, surveys, etc.
  • Many statements contain statistical arguments
• Computers make complex statistical methods easier to use
Lies, Damn Lies, and Statistics
• Many times statistics are used in an incorrect and misleading manner
  • Purposely misused
    • Companies/people wanting to further their agenda
    • Cooking the data
      • Completely making up data
    • Massaging the numbers
      • Altering values to get desired result
  • Accidentally misused
    • Using inappropriate methods
    • Vital to understand a method before using it
What is Statistics?
• Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
• Applicable to a wide variety of academic disciplines
  • Physical sciences
  • Social sciences
  • Humanities
• Statistics are used for making informed decisions
  • Business
  • Government
General Statistical Methodology
Basic Terminology
• Population
  • Total set of all subjects of interest
  • Entire group of people, animals, products, etc. about which we want information
• Elementary Unit
  • Any individual member of the population
• Sample
  • Subset of the population from which the study actually collects information
  • Used to draw conclusions about the whole population
Basic Terminology
• Variable
  • A characteristic of a unit that can vary among subjects in the population/sample
  • Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
• Parameter
  • Numerical characteristic of the population
  • Calculated using the whole population
• Statistic
  • Numerical characteristic of the sample
  • Calculated using the sample
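The difference between a parameter and a statistic can be sketched in a few lines of Python. The population below is simulated and purely hypothetical (10,000 ages between 18 and 30), and the sample size of 200 is chosen only for illustration.

```python
import random
import statistics

random.seed(291)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL population: ages (in years) of 10,000 units.
population = [random.randint(18, 30) for _ in range(10_000)]

# Parameter: numerical characteristic computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of 200 units drawn without replacement.
sample = random.sample(population, 200)

# Statistic: numerical characteristic computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, because the whole population cannot be measured; the statistic is what the study actually observes.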
Data Collection and Sampling Theory
• Why take a sample? Why not take a census? Why not measure all of the units in the population?
  • Accuracy
    • May not be able to find every unit in the population
  • Time
    • Speed of response from units
  • Money
  • Infinite Population
  • Destructive Sampling or Testing
Example
• University Health Services at UK conducts a survey about alcohol abuse among students
  • 200 of the students are sampled and asked to complete a questionnaire
  • One question is “Have you regretted something you did while drinking?”
• What is the population?
• What is the sample?
‘Flavors’ of Statistics
• Descriptive Statistics
  • Summarizing the information in a collection of data
• Inferential Statistics
  • Using information from a sample to make conclusions/predictions about the population
  • Ex: using a sample statistic to estimate a population parameter
Example
• The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
• It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level
  • Are these numbers statistics or parameters?
• The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%
  • Is this an example of descriptive or inferential statistics?
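As a rough illustration of how an interval such as 25.5% to 27.5% can arise from a sample proportion of 26.5%, here is a minimal sketch of the usual large-sample interval for a proportion. It assumes a simple random sample and a multiplier of 1.96 (roughly 95% confidence); the actual survey uses a more complex design, and the number of "FH" households is not given in the slides, so `n_fh` below is a hypothetical value.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).

    Assumes a simple random sample that is large enough for the
    normal approximation; z = 1.96 corresponds to about 95% confidence.
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

p_hat = 0.265  # sample proportion of "FH" families below the poverty level (from the slide)
n_fh = 7500    # HYPOTHETICAL number of "FH" households sampled; not stated in the source

low, high = proportion_interval(p_hat, n_fh)
print(f"estimated range: {low:.1%} to {high:.1%}")
```

The sample proportion on its own is a descriptive summary; the interval is a statement about all "FH" families in the population, which is what makes it inferential.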
Scales of Measurement
• Quantitative or Numerical
  • Variables with numerical values associated with them
• Qualitative or Categorical
  • Variables without numerical values associated with them
Qualitative Variables
• Ordinal
  • Disease status, company rating, grade in STA 291
  • Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
  • One unit can have more of a certain property than another unit
• Nominal
  • Gender, nationality, hair color, state of residence
  • Nominal variables have a scale of unordered categories
  • It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
Quantitative Variables
• Quantitative
  • Age, income, height
  • Quantitative variables are measured numerically; that is, for each subject a number is observed
  • The scale for quantitative variables is called an interval scale
Example
• A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following
  • Nominal (Qualitative): Requires assistance from staff?
    • Yes
    • No
  • Ordinal (Qualitative): Plaque score
    • No visible plaque
    • Small amounts of plaque
    • Moderate amounts of plaque
    • Abundant plaque
  • Interval (Quantitative): Number of teeth
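One way to see the difference between the three scales is to encode the variables from this example in Python. This is only a sketch: the class names and the integer codes 0 to 3 for the plaque score are assumptions made for illustration, and the tooth count is a made-up value.

```python
from enum import Enum, IntEnum

class PlaqueScore(IntEnum):
    """Ordinal: the categories have a meaningful order (codes are an assumption)."""
    NO_VISIBLE = 0
    SMALL = 1
    MODERATE = 2
    ABUNDANT = 3

class NeedsAssistance(Enum):
    """Nominal: the categories are labels with no order."""
    YES = "yes"
    NO = "no"

# Ordinal categories can be ranked against each other.
more_plaque = PlaqueScore.ABUNDANT > PlaqueScore.SMALL
print(more_plaque)

# Nominal categories cannot: plain Enum members do not support ">" or "<".
try:
    NeedsAssistance.YES > NeedsAssistance.NO
    nominal_is_ordered = True
except TypeError:
    nominal_is_ordered = False
print(nominal_is_ordered)

# Interval (quantitative): an observed number; differences are meaningful.
number_of_teeth = 24  # hypothetical value
```

The ordinal codes only record order. The gap between "small" and "moderate" is not necessarily the same as the gap between "moderate" and "abundant", which is why arithmetic on such codes needs caution.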
Example
• A birth registry database collects the following information on newborns
  • Birth weight: in grams
  • Infant’s Condition:
    • Excellent
    • Good
    • Fair
    • Poor
  • Number of prenatal visits
  • Ethnic background:
    • African-American
    • Caucasian
    • Hispanic
    • Native American
    • Other
• What are the appropriate scales?
  • Quantitative (Interval)
  • Qualitative (Ordinal, Nominal)
Importance of Different Types of Data
• Statistical methods vary for quantitative and qualitative variables
• Methods for quantitative data cannot be used to analyze qualitative data
• Quantitative variables can be treated in a less quantitative manner
  • Height: measured in cm/in
    • Interval (Quantitative)
    • Can be treated as Qualitative
      • Ordinal:
        • Short
        • Average
        • Tall
      • Nominal:
        • <60in or >72in
        • 60in-72in
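The height example can be sketched as two small Python functions that turn a measured height into a category. The nominal cut-offs (60in and 72in) come from the slide; reusing the same cut-offs for Short/Average/Tall is an assumption, since the slide gives no boundaries for the ordinal version. The function names are hypothetical.

```python
def height_ordinal(inches):
    """Map a height in inches to an ordered category.

    Cut-offs of 60in and 72in are ASSUMED here; the slide only
    names the categories Short, Average, and Tall.
    """
    if inches < 60:
        return "Short"
    if inches <= 72:
        return "Average"
    return "Tall"

def height_nominal(inches):
    """Map a height in inches to one of the two unordered groups on the slide."""
    if 60 <= inches <= 72:
        return "60in-72in"
    return "<60in or >72in"

for h in (58.0, 66.5, 75.0):
    print(h, height_ordinal(h), height_nominal(h))
```

Each step discards information: the ordinal version forgets how tall a "Tall" person is, and the nominal version no longer distinguishes short from tall at all. The conversion cannot be undone, which is why the more detailed measurement is preferred.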
Other Notes on Variable Types
• Try to measure variables in as much detail as possible
  • Quantitative
• More detailed data can be analyzed in further depth
• Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
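The GPA caution can be made concrete with a short sketch. The values A = 4.0 and B = 3.0 appear earlier in the lecture; the points for C, D, and E below, and the list of grades, are assumptions for illustration only.

```python
import statistics

# Grade points: A and B are from the lecture; C, D, E are ASSUMED.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}
POINTS_TO_GRADE = {points: grade for grade, points in GRADE_POINTS.items()}

grades = ["A", "B", "B", "C", "A"]  # hypothetical grades for one student
points = [GRADE_POINTS[g] for g in grades]

# Treating the ordinal variable as quantitative: average the points.
gpa = statistics.mean(points)

# Treating it as ordinal only: report the middle category instead.
median_grade = POINTS_TO_GRADE[statistics.median_low(points)]

print(f"GPA: {gpa:.2f}, median grade: {median_grade}")
```

Averaging assumes that the step from A to B is worth exactly as much as the step from B to C. The median grade relies only on the order of the categories, so it does not need that assumption.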
Discrete Variables
• A variable is discrete if it can take on a finite number of values
  • Gender
  • Nationality
  • Hair color
  • Disease status
  • Grade in STA 291
  • Favorite MLB team
• Qualitative variables are discrete
Continuous Variables
• Continuous variables can take an infinite continuum of possible real number values
  • Time spent studying for STA 291 per day
    • 43 minutes
    • 2 minutes
    • 27.487 minutes
    • 27.48682 minutes
  • Can be subdivided into more precise values
    • Therefore continuous
Examples
• Number of children in a family
• Distance a car travels on a tank of gas
• % grade on an exam
Discrete or Continuous
• Quantitative variables can be discrete or continuous
• Age, income, height?
  • Depends on the scale
  • Age is potentially continuous, but usually measured in years (discrete)
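The point about age can be sketched as follows: the same person has a continuous age (time elapsed since birth) and a discrete age (completed years). The dates and function names are hypothetical, and dividing by 365.25 is only an approximation of the length of a year.

```python
from datetime import date

def age_continuous(birth, today):
    """Age as a real number of years (approximate: 365.25 days per year)."""
    return (today - birth).days / 365.25

def age_discrete(birth, today):
    """Age in completed years, as it is usually recorded."""
    years = today.year - birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years

birth = date(1990, 3, 15)   # hypothetical birth date
today = date(2010, 6, 8)    # hypothetical survey date

print(f"continuous: {age_continuous(birth, today):.4f} years")
print(f"discrete:   {age_discrete(birth, today)} years")
```

The underlying quantity is the same; only the measurement scale decides whether the recorded variable is discrete or continuous.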