CHAPTER 2

2.1 - Basic Definitions and Properties: Population Characteristics = “Parameters”; Sample Characteristics = “Statistics”; Random Variables (Numerical vs. Categorical). 2.2, 2.3 - Exploratory Data Analysis: Graphical Displays; Descriptive Statistics.


Presentation Transcript


  1. CHAPTER 2
     2.1 - Basic Definitions and Properties
     • Population Characteristics = “Parameters”
     • Sample Characteristics = “Statistics”
     • Random Variables (Numerical vs. Categorical)
     2.2, 2.3 - Exploratory Data Analysis
     • Graphical Displays
     • Descriptive Statistics: Measures of Center (mode, median, mean); Measures of Spread (range, variance, standard deviation)

  2. POPULATION: composed of “units” (people, rocks, toasters, ...).
     Important fact: to make certain calculations simpler, we assume that populations are “arbitrarily large” (or indeed, infinite).
     What do we want to know about this population?
     “Random Variable” X = any numerical value that can be assigned to each unit of a population. “Random” refers to the notion that this value is unknown until actually observed (usually as part of an outcome of an experiment to test a specific hypothesis). [Contrast this with the idea of a “nonrandom” variable with no empirical error, e.g., X = # cards in a deck = 52.]
     There are two general types: Quantitative and Qualitative.
     • Quantitative [measurement]: length, mass, temperature, pulse rate, # puppies, shoe size (e.g., 10, 10½, 11)

  3. Quantitative [measurement] variables (length, mass, temperature, pulse rate, # puppies, shoe size) come in two kinds:
     • CONTINUOUS (can take their values at any point in a continuous interval)
     • DISCRETE (only take their values in disconnected jumps)

  4. Qualitative [categorical]
     • ORDINAL (ranked): video game levels (1, 2, 3, ...), income level (1 = low, 2 = mid, 3 = high)
     • NOMINAL: zip code, ID #, color (Red, Green, Blue, coded 1, 2, 3)
     IMPORTANT CASE: Binary (or Dichotomous), coded X = 1 (“Success”) or X = 0 (“Failure”)
     • Gender (Male / Female)
     • “Pregnant?” (Yes / No)
     • Coin toss (Heads / Tails)
     • Treatment (Drug / Placebo)

  5. Qualitative [categorical], another way: define X using “indicator variables” I1, I2, I3, one per category, each equal to 1 if the unit falls in that category and 0 otherwise. Note that I1 + I2 + I3 = 1.
     Example: Excel file of patient blood types, with one 0/1 column per type (O, A, B, AB). Note that each patient row sums to 1, i.e., O + A + B + AB = 1.
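
The indicator-variable idea on this slide can be sketched in a few lines of Python. This is only an illustration: the slide's Excel file is not available, so the patient list below is made up, and the names `BLOOD_TYPES`, `indicators`, `patients`, `row_sums`, and `proportions` are hypothetical.

```python
# Hypothetical data: the slide's Excel file is not available,
# so these six patients are invented for illustration only.
BLOOD_TYPES = ["O", "A", "B", "AB"]


def indicators(blood_type):
    """Return one 0/1 indicator variable per blood-type category."""
    if blood_type not in BLOOD_TYPES:
        raise ValueError(f"unknown blood type: {blood_type!r}")
    return {t: int(t == blood_type) for t in BLOOD_TYPES}


patients = ["A", "O", "O", "AB", "B", "A"]
rows = [indicators(p) for p in patients]

# Each patient falls in exactly one category, so each row sums to 1.
row_sums = [sum(r.values()) for r in rows]

# Averaging an indicator column gives the sample proportion of that type.
proportions = {t: sum(r[t] for r in rows) / len(rows) for t in BLOOD_TYPES}

print(row_sums)
print(proportions)
```

One reason this coding is convenient: the mean of a 0/1 column is the proportion of units in that category, so categorical data can be summarized with the same arithmetic used for numerical data.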

  6. “Population Distribution of X” (somewhat idealized)
     Population mean μ (“mu”) and population “standard deviation” σ (“sigma”) are examples of parameters: nonrandom “population characteristics” whose exact values cannot be directly measured, but can (hopefully) be estimated from known “sample characteristics”, i.e., statistics.
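
To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a toy finite population of simulated ages (normal with mean 40 and standard deviation 12, both arbitrary choices) so that μ and σ can be computed for comparison. In real problems the population is treated as arbitrarily large and these values are unknown. All variable names are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Toy finite "population" of ages (an assumption for illustration only).
population = [random.gauss(40, 12) for _ in range(20_000)]

# Parameters: fixed characteristics of the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population formula (divides by N)

# Statistics: computed from one random sample, used to estimate the parameters.
sample = random.sample(population, 50)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample formula (divides by n - 1)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Rerunning with a different seed changes `x_bar` and `s`, while `mu` and `sigma` describe the population itself and do not depend on which sample was drawn.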

  7. “Population Distribution of X” (somewhat idealized). Random variable X (Example: X = Age).
     How do we infer information about the population variable X?
     SAMPLE of size n: x1, x2, x3, x4, x5, x6, ..., xn, where x1 = value of X for 1st individual, x2 = value of X for 2nd individual, etc.

  8. “Population Distribution of X” (somewhat idealized). Random variable X (Example: X = Age).
     “Parameter Estimation” / “Statistical Inference”: from a SAMPLE of size n (x1, x2, ..., xn), compute the sample mean, an example of a statistic:
     $\bar{x} = \dfrac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + \cdots + x_n}{n}$
     There are many potential random samples of a fixed size n, each with its own estimate of μ. It will eventually become important to understand the structure of their variability.
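
The last point, that each random sample of size n yields its own estimate of μ, can be explored with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with μ = 40 and σ = 12 (arbitrary values for X = Age), and `draw_sample`, `sample_means`, `center`, and `spread` are hypothetical names.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def draw_sample(n):
    """Draw n ages from an assumed population (normal, mu = 40, sigma = 12)."""
    return [random.gauss(40, 12) for _ in range(n)]


n = 25

# Repeat the sampling many times; each sample yields its own sample mean.
sample_means = [statistics.fmean(draw_sample(n)) for _ in range(2000)]

center = statistics.fmean(sample_means)  # where the estimates cluster
spread = statistics.stdev(sample_means)  # how much they vary sample to sample

print(f"average of sample means: {center:.2f}")
print(f"std. dev. of sample means: {spread:.2f}")
```

Under these assumptions the sample means cluster around μ, and their spread should be close to σ / √n = 12 / 5 = 2.4, noticeably smaller than σ itself.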

  9. Statistics are numerical values that are culled from a random sample of measurements taken from a specific population, in an effort to “summarize” its overall distribution, and estimate certain parameters (i.e., numerical characteristics) of that population.
     • Statistics, as a discipline, consists of a collection of formal testing procedures, designed to infer a conclusion regarding a specific hypothesis about the population, based on the sample data.
     • Statistics is sometimes referred to as the “search for sources of random variation” in a system. How much of a signal is genuinely significant information to be detected, and how much is random “noise”?
     • The “classical scientific method” provides a general framework for conducting formal statistical analysis.
