The Three Rules of Data Analysis

xena-gross

Presentation Transcript


  1. The Three Rules of Data Analysis. The three rules of data analysis are easy to remember: (1) Make a picture: it may reveal things that are not obvious in the raw data, and these will be things to think about. (2) Make a picture: important features of and patterns in the data will show up. (3) Make a picture: the best way to tell others about your data is with a well-chosen picture.

  2. Making Piles. We can “pile” the data by counting the number of data values in each category of interest. We can organize these counts into a frequency table, which records the totals and the category names. A relative frequency table is similar, but gives the percentages (instead of counts) for each category.
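The "piling" described above can be sketched in a few lines of Python. This is a minimal illustration, not the presentation's own method: the list `ticket_classes` is made-up data, and the helper names `frequency_table` and `relative_frequency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical raw data: one ticket-class label per person (not from the slides).
ticket_classes = ["First", "Third", "Crew", "Third", "Second", "Crew", "Third", "Crew"]

def frequency_table(values):
    """Count how many data values fall in each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Express each category's count as a percentage of the total."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}

freq = frequency_table(ticket_classes)
rel_freq = relative_frequency_table(ticket_classes)

# Print category, count, and percentage side by side.
for category in sorted(freq):
    print(f"{category:<8}{freq[category]:>4}{rel_freq[category]:>8.1f}%")
```

The two tables carry the same information; the relative version only rescales the counts so that they sum to 100%.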

  3. What Do Frequency Tables Tell Us? Frequency tables and relative frequency tables describe the distribution of a categorical variable by naming the possible categories and telling how frequently each occurs. The following are the frequency and relative frequency tables for the Titanic data: [tables not reproduced in the transcript]

  4. What’s Wrong With This Picture? You might think that a good way to show the Titanic data is with this display: [figure not reproduced in the transcript]

  5. The Area Principle. The ship display violates the area principle: the area occupied by a part of the graph should correspond to the magnitude of the value it represents. Thus, a better display is: [figure not reproduced in the transcript]
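One way to see the area principle at work is a plain-text bar chart in which every bar has the same height, so each bar's area is proportional to its length. The sketch below assumes the class counts commonly reported for the Titanic data set (the slide's own table is not in the transcript), and `bar_lengths` is a hypothetical helper.

```python
# Class counts as commonly reported for the Titanic data set; treat them as
# an assumption here, because the slide's own table is not in the transcript.
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

def bar_lengths(counts, max_width=50):
    """Scale each count to a bar length proportional to the count."""
    largest = max(counts.values())
    return {name: round(max_width * n / largest) for name, n in counts.items()}

lengths = bar_lengths(counts)

# Every bar is one text row high, so area is proportional to length.
for name, n in counts.items():
    print(f"{name:<7}{'#' * lengths[name]} {n}")
```

Because all bars share one scale and one height, a category with twice the count takes up roughly twice the ink, which is what the area principle asks for.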

  6. A Slice of the Pie. When you are interested in parts of the whole, a pie chart might be your display of choice. Pie charts show the whole group of cases as a circle. They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.
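The proportionality behind a pie chart can be made concrete by computing each slice's angle. This is a small sketch under assumptions: the counts are the class totals commonly reported for the Titanic data set, and `pie_slices` is a hypothetical helper.

```python
# Assumed class counts (commonly reported for the Titanic data set).
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

def pie_slices(counts):
    """Return each category's slice angle in degrees, proportional to its share."""
    total = sum(counts.values())
    return {name: 360 * n / total for name, n in counts.items()}

slices = pie_slices(counts)

for name, angle in slices.items():
    print(f"{name:<7}{angle:6.1f} degrees")
```

The angles always add up to a full circle, which is why a pie chart only makes sense when the categories together form the whole group.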

  7. Contingency Tables. A contingency table allows us to look at two categorical variables together. Example: we can examine the class of ticket and whether a person survived the Titanic: [table not reproduced in the transcript] The totals in the margins of the table give us the marginal distribution of the respective variables.
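A contingency table and its margins can be sketched with nested dictionaries. The cell counts below are the ones commonly reported for the Titanic data set and are an assumption here, since the slide's table is missing from the transcript; the variable names are illustrative only.

```python
# Assumed cell counts: survival status (rows) by ticket class (columns).
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

# Row totals: the marginal distribution of survival status.
row_totals = {status: sum(row.values()) for status, row in table.items()}

# Column totals: the marginal distribution of ticket class.
classes = list(next(iter(table.values())))
col_totals = {c: sum(table[status][c] for status in table) for c in classes}

grand_total = sum(row_totals.values())

print("Survival margin:", row_totals)
print("Class margin:   ", col_totals)
print("Total:          ", grand_total)
```

Each margin is just the frequency table of one variable with the other variable ignored.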

  8. Conditional Distributions. A conditional distribution is the distribution of one variable for only those individuals or cases that satisfy some condition on another variable. In a contingency table, variables are independent when the distribution of one variable is the same for all categories of the other.
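The definitions above translate directly into code: restrict to one row of the contingency table, then convert that row to proportions. The sketch below uses assumed Titanic counts, and `looks_independent` is a hypothetical helper whose `tolerance` is an arbitrary illustrative threshold, not a statistical test.

```python
# Assumed cell counts: survival status (rows) by ticket class (columns).
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

def conditional_distribution(row):
    """Proportions of each category within a single row of the table."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

def looks_independent(table, tolerance=0.01):
    """True if every row has (nearly) the same conditional distribution."""
    dists = [conditional_distribution(row) for row in table.values()]
    first = dists[0]
    return all(abs(d[k] - first[k]) <= tolerance for d in dists[1:] for k in first)

class_given_alive = conditional_distribution(table["Alive"])
class_given_dead = conditional_distribution(table["Dead"])

print(f"First class among survivors:     {class_given_alive['First']:.1%}")
print(f"First class among non-survivors: {class_given_dead['First']:.1%}")
print("Looks independent:", looks_independent(table))
```

With real data the conditional distributions are rarely identical, so the comparison needs some allowance for small differences; the fixed tolerance here only stands in for that judgment.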

  9. Conditional Distributions (cont.). Consider the following two pie charts from the text: [figures not reproduced in the transcript] These pie charts show the ticket class of the passengers conditional on survival status. We can see differences in the distributions, so ticket class and survival are not independent.

  10. Segmented Bar Charts. A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles. Here is the segmented bar chart for ticket class by survival status: [figure not reproduced in the transcript]
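A segmented bar chart can be approximated in plain text by giving each category a segment whose length is proportional to its share of the bar. The sketch below assumes the commonly reported Titanic counts; `segmented_bar` and the one-character `symbols` are hypothetical choices made for illustration.

```python
# Assumed cell counts: survival status (rows) by ticket class (columns).
table = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

# One character per ticket class, used to draw its segment.
symbols = {"First": "1", "Second": "2", "Third": "3", "Crew": "C"}

def segmented_bar(row, symbols, width=50):
    """Build one bar whose segments are proportional to each category's share."""
    total = sum(row.values())
    bar = ""
    for category, count in row.items():
        # Segment lengths are rounded, so the bar may be a character off `width`.
        bar += symbols[category] * round(width * count / total)
    return bar

alive_bar = segmented_bar(table["Alive"], symbols)
dead_bar = segmented_bar(table["Dead"], symbols)

print(f"Alive {alive_bar}")
print(f"Dead  {dead_bar}")
```

Both bars are scaled to about the same length, so the segments compare conditional distributions rather than raw counts.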

  11. What Can Go Wrong? Don’t violate the area principle. Keep it honest: make sure your display shows what it says it shows. Don’t confuse similar-sounding percentages: pay particular attention to the wording of the context. Be sure to use enough individuals! Don’t overstate your case: don’t claim something the data can’t support.

  12. What Can Go Wrong? (cont.) Don’t use unfair or silly averages: this could lead to Simpson’s paradox, so be careful when you average one variable across different levels of a second variable.
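A small worked example shows how averaging across levels can mislead. The on-time records below are hypothetical numbers chosen to produce the reversal; the pilots, the day/night levels, and the helper names are illustrative only.

```python
# Hypothetical records: (on-time flights, total flights) for each pilot and time of day.
records = {
    "Moe":  {"day": (90, 100), "night": (10, 20)},
    "Jill": {"day": (19, 20),  "night": (75, 100)},
}

def rate(on_time, flights):
    """On-time proportion for one pilot at one level (day or night)."""
    return on_time / flights

def overall_rate(by_level):
    """On-time proportion after pooling all levels together."""
    on_time = sum(o for o, _ in by_level.values())
    flights = sum(f for _, f in by_level.values())
    return on_time / flights

for pilot, by_level in records.items():
    day = rate(*by_level["day"])
    night = rate(*by_level["night"])
    print(f"{pilot:<5} day {day:.0%}  night {night:.0%}  overall {overall_rate(by_level):.0%}")
```

In this made-up data Jill has the better rate both by day and by night, yet Moe has the better pooled rate, because most of Moe's flights are in the easier daytime level. Comparing within levels avoids the contradiction.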

  13. Key Concepts. Categorical variables can be summarized in frequency or relative frequency tables. Categorical variables can be displayed with bar charts and/or pie charts; just make sure to follow the area principle. A contingency table summarizes two variables at a time. From a contingency table we can find the marginal distribution for each variable or the conditional distribution of one variable given the other.

  14. Key Concepts (cont.). Two categorical variables are said to be independent if the conditional distribution of one variable is the same for each category of the other. Beware of Simpson’s paradox: when averages are taken across different groups, they can appear to be contradictory.
