
Econ 3790: Business and Economics Statistics

Instructor: Yogesh Uppal. Email: yuppal@ysu.edu. Lecture 1 covers the goals of the course, data and statistics, and tabular and graphical methods for summarizing data.


Presentation Transcript


  1. Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal Email: yuppal@ysu.edu

  2. Lecture 1 — Schedule • Goals of the course • Data and statistics • Tabular methods for summarizing data • Graphical methods for summarizing data

  3. Why use Statistics? • To make sense of large amounts of data: • What were the demographics of Youngstown in 2000? • Have U.S. wages increased since 1975? • To test hypotheses: • Is the demand curve downward sloping? • Are GDP and the saving rate positively correlated? • To make predictions: • What might happen to savings behavior after a large tax cut?

  4. Data: Basic Definitions • Data: a set of measurements • Dataset: all data collected for one study • Element, or unit: an entity on which data are collected • Variable: a property or attribute of each unit • Observation: the values of all variables for one unit

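  5. Data: Basic Definitions • Example data set: each row is one observation, the company is the element name, and the remaining columns are the variables

     Company (element name) | Stock Exchange | Annual Sales ($M) | Earnings per Share ($)
     Dataram                | AMEX           | 73.10             | 0.86
     EnergySouth            | OTC            | 74.00             | 1.67
     Keystone               | NYSE           | 365.70            | 0.86
     LandCare               | NYSE           | 111.40            | 0.33
     Psychemedics           | AMEX           | 17.60             | 0.13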
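The definitions on slides 4 and 5 can be sketched in code. This is a minimal illustration, assuming the values on slide 5 pair with the company names in the order listed; the key names (`exchange`, `sales_musd`, `eps_usd`) are hypothetical labels chosen for this sketch.

```python
# Each dict is one observation: the values of all variables for one element.
# Key names are hypothetical; values are taken from slide 5.
dataset = [
    {"company": "Dataram", "exchange": "AMEX", "sales_musd": 73.10, "eps_usd": 0.86},
    {"company": "EnergySouth", "exchange": "OTC", "sales_musd": 74.00, "eps_usd": 1.67},
    {"company": "Keystone", "exchange": "NYSE", "sales_musd": 365.70, "eps_usd": 0.86},
    {"company": "LandCare", "exchange": "NYSE", "sales_musd": 111.40, "eps_usd": 0.33},
    {"company": "Psychemedics", "exchange": "AMEX", "sales_musd": 17.60, "eps_usd": 0.13},
]

# Elements are the entities the data describe (here, companies).
elements = [row["company"] for row in dataset]

# Variables are the attributes recorded for every element.
variables = [key for key in dataset[0] if key != "company"]

print(f"{len(elements)} elements, {len(variables)} variables: {variables}")
```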

  6. Data: Scales of Measurement • Four scales of measurement: • Nominal, ordinal, interval, and ratio scales • Scale determines which methods of summarization and analysis are appropriate for any given variable

  7. Data: Scales of Measurement • Characteristic • Nominal, like a label or name for a characteristic • e.g., color: red, green, blue • race: black, Hispanic, white, Asian • binary: (male, female), (yes, no), (0, 1) • Ordinal, still a characteristic, but having a natural order • e.g., how was service?: poor, average, good

  8. Data: Scales of Measurement • Numeric • Interval scale • Numeric data showing the properties of ordinal data • e.g., SAT scores, Fahrenheit temperature • Ratio scale • Ordered, numeric data with real zero • e.g., income, distance, price, quantity • http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/node5.html

  9. Data: Other Classifications • Qualitative, or categorical: measures a quality • Quantitative: numeric values that indicate how much or how many • Cross-sectional: data collected at one point in time • Time series: data collected over several time periods • Panel or longitudinal: combination of cross-sectional and time series

  10. Data: Summary of Definitions • Qualitative data: numerical or nonnumerical; nominal or ordinal scale • Quantitative data: numerical; interval or ratio scale

  11. Statistical Inference: Definitions • Population: the set of all elements of interest in a study • Sample: a subset of the population • Statistical Inference: the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population

  12. Statistical Inference: Process 1. Population consists of all tune-ups. Average cost of parts is unknown. 2. A sample of 50 engine tune-ups is examined. 3. The sample data provide a sample average parts cost of $79 per tune-up. 4. The sample average is used to estimate the population average.
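The four steps on slide 12 can be sketched as a small simulation. The population below is synthetic (normally distributed costs with hypothetical parameters), since the lecture gives only the sample size of 50 and the sample average; in practice the population average would be unknown, and it is computed here only to show how close the estimate lands.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Step 1: a hypothetical population of parts costs for all tune-ups.
population = [random.gauss(80, 15) for _ in range(10_000)]

# Step 2: examine a sample of 50 tune-ups.
sample = random.sample(population, 50)

# Step 3: compute the sample average parts cost.
sample_mean = statistics.mean(sample)

# Step 4: use the sample average as the estimate of the population average.
population_mean = statistics.mean(population)  # unknown in a real study
print(f"estimate from sample: {sample_mean:.2f}")
print(f"population average:   {population_mean:.2f}")
```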

  13. Descriptive Statistics: Definition • Descriptive statistics are the tabular, graphical, and numerical methods used to summarize data

  14. Descriptive Statistics: Common Methods • Some common methods: • Tabular • Frequency table (for one variable) • Crosstabulation, or crosstab (for more than one variable) • Graphical • Bar graph (for categorical variables) • Histogram (for interval- or ratio-scaled variables) • Scatterplot (for two variables) • Numerical • Mean (arithmetic average)

  15. Summarizing Qualitative Data • Frequency distribution • Relative frequency distribution • Bar graph • Pie chart • Objective is to provide insights about the data that cannot be quickly obtained by looking at the original data

  16. Distribution Tables • Frequency distribution is a tabular summary of the data showing the frequency (or number) of items in each of several non-overlapping classes • Relative frequency distribution looks the same, but contains proportion of items in each class
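A frequency and relative frequency distribution for a qualitative variable could be built as follows. This is a minimal sketch; the list of majors is hypothetical and is not the class data from the lecture.

```python
from collections import Counter

# Hypothetical responses to "What's your major?"
majors = [
    "Economics", "Finance", "Accounting", "Economics",
    "Marketing", "Finance", "Economics", "Accounting",
]

# Frequency distribution: number of items in each non-overlapping class.
frequency = Counter(majors)

# Relative frequency distribution: proportion of items in each class.
total = sum(frequency.values())
relative_frequency = {major: count / total for major, count in frequency.items()}

for major, count in frequency.most_common():
    print(f"{major:<12}{count:>3}{relative_frequency[major]:>8.3f}")
```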

  17. Example 1: What’s your major?

  18. Summarizing Quantitative Data • Frequency Distribution • Relative Frequency Distribution • Dot Plot • Histogram • Cumulative Distributions

  19. Example 1: Go Penguins

  20. Example 1: Go Penguins

  21. Example 2: Rental Market in Youngstown • Suppose you were moving to Youngstown, and you wanted to get an idea of what the rental market for an apartment (having more than 1 room) is like • I have the following sample of rental prices

  22. Example: Rental Market in Youngstown • Sample of 28 rental listings from craigslist:

  23. Frequency Distribution • To deal with large datasets • Divide the data into classes • Select a width for the classes

  24. Frequency Distribution (Cont’d) • Guidelines for Selecting Number of Classes • Use between 5 and 20 classes • Datasets with a larger number of elements usually require a larger number of classes • Smaller datasets usually require fewer classes

  25. Frequency Distribution • Guidelines for Selecting Width of Classes • Use classes of equal width • Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes

  26. Frequency Distribution • For our rental data, if we choose six classes: • Class Width = (750-330)/6 = 70
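The class-width calculation and the resulting binning can be sketched as below. Only the smallest value (330), the largest value (750), and the choice of six classes come from the slide; the rent list is hypothetical because the 28 listings are not reproduced in this transcript. Placing the largest value in the last class is a choice made for this sketch, not something the lecture specifies.

```python
low, high, num_classes = 330, 750, 6

# Approximate class width = (largest value - smallest value) / number of classes
width = (high - low) / num_classes  # (750 - 330) / 6 = 70

# Hypothetical rents, not the lecture's sample.
rents = [330, 425, 450, 480, 540, 560, 575, 600, 620, 700, 750]

counts = [0] * num_classes
for rent in rents:
    # Classes are [lower, lower + width); the maximum goes into the last class.
    index = min(int((rent - low) // width), num_classes - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    lower = low + i * width
    print(f"{lower:.0f} up to {lower + width:.0f}: {count}")
```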

  27. Relative Frequency • To calculate a relative frequency, divide the class frequency by the total number of observations

  28. Relative Frequency • Insights gained from Relative Frequency Distribution: • 32% of rents are between $539 and $609 • Only 7% of rents are above $680

  29. Histogram Histogram of Youngstown Rental Prices

  30. Describing a Histogram • Symmetric • Left tail is the mirror image of the right tail • Example: heights and weights of people • [Histogram; vertical axis: relative frequency, 0 to .35]

  31. Describing a Histogram • Moderately Left or Negatively Skewed • A longer tail to the left • Example: exam scores • [Histogram; vertical axis: relative frequency, 0 to .35]

  32. Describing a Histogram • Moderately Right or Positively Skewed • A longer tail to the right • Example: hourly wages • [Histogram; vertical axis: relative frequency, 0 to .35]

  33. Describing a Histogram • Highly Right or Positively Skewed • A very long tail to the right • Example: executive salaries • [Histogram; vertical axis: relative frequency, 0 to .35]

  34. Cumulative Distributions • Cumulative frequency distribution: • shows the number of items with values less than or equal to a particular value (or the upper limit of each class when the data are divided into classes) • Cumulative relative frequency distribution: • shows the proportion of items with values less than or equal to a particular value (or the upper limit of each class when the data are divided into classes) • Usually only used with quantitative data!
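Both cumulative distributions follow from the class frequencies by a running sum. A minimal sketch, using hypothetical class frequencies rather than the lecture's data:

```python
from itertools import accumulate

# Hypothetical class frequencies, ordered from the lowest class to the highest.
class_frequencies = [1, 2, 1, 4, 1, 2]
total = sum(class_frequencies)

# Cumulative frequency: items at or below the upper limit of each class.
cumulative_frequency = list(accumulate(class_frequencies))

# Cumulative relative frequency: the same running count as a proportion.
cumulative_relative = [count / total for count in cumulative_frequency]

for freq, cum, rel in zip(class_frequencies, cumulative_frequency, cumulative_relative):
    print(f"{freq:>3}{cum:>5}{rel:>8.3f}")
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1.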

  35. Example 1: Go Penguins (Cont’d)

  36. Cumulative Distributions • Youngstown Rental Prices

  37. Crosstabulations and Scatter Diagrams • So far, we have focused on methods that are used to summarize data for one variable at a time • Often, we are really interested in the relationship between two variables • Crosstabs and scatter diagrams are two methods for summarizing data for two (or more) variables simultaneously

  38. Crosstabs • A crosstab is a tabular summary of data for two variables • Crosstabs can be used with any combination of qualitative and quantitative variables • The left and top margins define the classes for the two variables

  39. Example: Data on MLB Teams • Data from the 2002 Major League Baseball season • Two variables: • Number of wins • Average stadium attendance

  40. Crosstab • [Crosstab table; annotations mark the frequency distribution for the wins variable and the frequency distribution for the attendance variable]

  41. Crosstabs: Row or Column Percentages • Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables
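Row and column percentages can be computed from a crosstab as sketched below. The counts and class labels are hypothetical, since the MLB table itself is not reproduced in this transcript.

```python
# Hypothetical crosstab: rows are classes of wins, columns are classes of attendance.
crosstab = {
    "fewer than 81 wins": {"low attendance": 9, "high attendance": 6},
    "81 or more wins": {"low attendance": 5, "high attendance": 10},
}

# Row percentages: each cell as a share of its row total.
row_pct = {
    row: {col: 100 * n / sum(cells.values()) for col, n in cells.items()}
    for row, cells in crosstab.items()
}

# Column totals, needed for column percentages.
col_totals = {}
for cells in crosstab.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n

# Column percentages: each cell as a share of its column total.
col_pct = {
    row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
    for row, cells in crosstab.items()
}

print(row_pct)
print(col_pct)
```

Row percentages sum to 100 across each row, and column percentages sum to 100 down each column.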

  42. Crosstab: Row Percentages

  43. Crosstab: Column Percentages

  44. Crosstab: Simpson’s Paradox • Data in two or more crosstabulations are often aggregated to produce a summary crosstab • We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstab • Simpson’s Paradox: • In some cases, the conclusions based upon an aggregated crosstab can be completely reversed if we look at the unaggregated data
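The reversal can be checked numerically. The sketch below uses illustrative counts that are not from the lecture: option A has the higher success rate within each group, yet option B has the higher rate once the groups are aggregated, because the two options are spread very unevenly across the groups.

```python
# Illustrative (successes, trials) counts; group and option names are hypothetical.
groups = {
    "group 1": {"A": (81, 87), "B": (234, 270)},
    "group 2": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials


# Unaggregated view: compare A and B inside each group.
a_better_in_every_group = all(
    rate(*cells["A"]) > rate(*cells["B"]) for cells in groups.values()
)

# Aggregated view: add up successes and trials across groups.
totals = {
    option: (
        sum(cells[option][0] for cells in groups.values()),
        sum(cells[option][1] for cells in groups.values()),
    )
    for option in ("A", "B")
}
b_better_in_aggregate = rate(*totals["B"]) > rate(*totals["A"])

print(a_better_in_every_group, b_better_in_aggregate)
```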

  45. Crosstab: Simpson’s Paradox • [Crosstab table; annotations mark the frequency distribution for the wins variable and the frequency distribution for the attendance variable]

  46. Scatter Diagram and Trendline • A scatter diagram, or scatter plot, is a graphical presentation of the relationship between two quantitative variables • One variable is shown on the horizontal axis and the other is shown on the vertical axis • The general pattern of the plotted points suggests the overall relationship between the variables • A trendline is an approximation of the relationship
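One common way to draw a trendline is an ordinary least-squares straight line. The slide does not say which method is intended, so the choice of least squares is an assumption of this sketch, and the data points are hypothetical.

```python
# Hypothetical paired observations (x on the horizontal axis, y on the vertical axis).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx

# The fitted line passes through the point of means.
intercept = mean_y - slope * mean_x

print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```

A positive slope corresponds to a positive relationship, a negative slope to a negative relationship, and a slope near zero to no apparent linear relationship.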

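  47. Scatter Diagram • A Positive Relationship • [Scatter plot: y on the vertical axis, x on the horizontal axis]

  48. Scatter Diagram • A Negative Relationship • [Scatter plot: y on the vertical axis, x on the horizontal axis]

  49. Scatter Diagram • No Apparent Relationship • [Scatter plot: y on the vertical axis, x on the horizontal axis]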

  50. Example: MLB Team Wins and Attendance
