Statistics For Data Science | Statistics Using R Programming Language | Hypothesis Testing | Edureka

( ** Data Science Certification Using R: https://www.edureka.co/data-science ** )
This Edureka tutorial on "Statistics for Data Science" covers the basic concepts of statistics, an applied branch of mathematics that attempts to make sense of observations in the real world. Statistics is generally regarded as one of the most crucial aspects of data science.

Introduction to statistics
Basic Terminology
Categories in Statistics
Descriptive Statistics
Reasons for moving to R
Descriptive Statistics in RStudio
Inferential Statistics
Inferential Statistics using RStudio

Check out our Data Science Tutorial blog series: http://bit.ly/data-science-blogs

Check out our complete Youtube playlist here: http://bit.ly/data-science-playlist

EdurekaIN

Presentation Transcript


  1. Agenda: Introduction to Statistics; Terminology; Categories in Statistics; Descriptive & Inferential Statistics; Statistics in R; Descriptive Statistics in R; Inferential Statistics in R

  9. Introduction to Statistics

  10. Introduction to Statistics: Statistics is a branch of mathematics dealing with data collection and organization, analysis, interpretation and presentation. Copyright © 2018, edureka and/or its affiliates. All rights reserved. www.edureka.co/masters-program/business-intelligence-certification

  12. Introduction to Statistics: (Diagram: Analyse Data → Build a Model → Infer Result.)

  13. Introduction to Statistics: Statistics is applied in many fields, including the Stock Market, Life Sciences, Education, Weather, Insurance and Retail.

  14. Terminology

  15. Basic Terminology: There are a few statistical terms one should be aware of when working with statistics: Population, Sample, Variable and Parameter.

  16. Basic Terminology: A Population is the set of sources from which data is collected.

  17. Basic Terminology: A Sample is a subset of the Population.
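As a minimal illustration of the population/sample relationship, here is a Python sketch (the deck itself works in R, where `sample()` plays the same role). The population of 100 numbered items and the seed value are hypothetical, chosen only so the example is reproducible.

```python
import random

# Hypothetical population: 100 numbered items (e.g. customer IDs)
population = list(range(1, 101))

random.seed(42)                          # fixed seed so the draw is reproducible
sample = random.sample(population, 10)   # 10 distinct items, drawn without replacement

print(sample)
print(set(sample) <= set(population))    # True: every sampled item belongs to the population
```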

  18. Basic Terminology: A Variable is any characteristic, number or quantity that can be measured or counted. A variable may also be called a data item. Examples: Gender, Age, Region, Language, Height, Time, Weight, Income, Degree, Blood Group, Ethnicity.

  19. Basic Terminology: A statistical Parameter (or population parameter) is a quantity that indexes a family of probability distributions in a statistical model, for example the population mean µ.

  20. Types of Analysis: An analysis can be done in one of two ways: Quantitative or Qualitative.

  21. Types of Analysis: Quantitative analysis, also known as statistical analysis, is the science of collecting and interpreting data with numbers. Qualitative analysis, also known as non-statistical analysis, mostly deals with non-numerical data such as text, media, etc.

  22. Categories in Statistics

  23. Categories in Statistics: There are two major categories in statistics. Descriptive statistics uses the data to describe the population through numerical calculations, graphs or tables. Inferential statistics makes inferences and predictions about a population based on a sample of data taken from the population in question.

  24. Descriptive Statistics: This method focuses on the main characteristics of the data and provides a summary of the data, either numerical or graphical.

  25. Descriptive Statistics: Typical summaries include the Maximum, Average and Minimum of the data.

  26. Inferential Statistics: This method generalizes from a sample to a larger dataset and applies probability to draw a conclusion. It allows us to infer population parameters from sample data, based on a statistical model. (Flowchart labels: Start, Statistical Model, Process Step, Decision, Choice I, Choice II, Answer.)

  27. Inferential Statistics: (Illustration labels: Tall, Average, Short.)

  28. Descriptive Statistics – Statistical Measures

  29. Descriptive Statistics – Use Case: Here is a sample dataset of cars containing the variables Cars, Miles per Gallon (mpg), Cylinder Type (cyl), Displacement (disp), Horse Power (hp) and Rear Axle Ratio (drat).

      Cars  mpg   cyl  disp  hp   drat
      A     21    6    160   110  3.9
      B     21    6    160   110  3.9
      C     22.8  4    108   93   3.85
      D     21.3  6    108   96   3
      E     23    4    150   90   4
      F     23    6    108   110  3.9
      G     23    4    160   110  3.9
      H     23    6    160   110  3.9

      Using descriptive analysis, you can analyse each of the variables in the dataset for mean, standard deviation, minimum and maximum.
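To reproduce these per-variable summaries outside R, here is a minimal Python sketch using the standard `statistics` module. It assumes the dataset is exactly the eight rows in the table above; `cars` and `describe` are hypothetical names. In R the same numbers would typically come from `mean()`, `sd()`, `min()` and `max()`.

```python
import statistics

# Car dataset transcribed from the slide (cars A-H)
cars = {
    "mpg":  [21, 21, 22.8, 21.3, 23, 23, 23, 23],
    "cyl":  [6, 6, 4, 6, 4, 6, 4, 6],
    "disp": [160, 160, 108, 108, 150, 108, 160, 160],
    "hp":   [110, 110, 93, 96, 90, 110, 110, 110],
    "drat": [3.9, 3.9, 3.85, 3, 4, 3.9, 3.9, 3.9],
}


def describe(values):
    """Return mean, sample standard deviation, minimum and maximum."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator), like R's sd()
        "min": min(values),
        "max": max(values),
    }


for name, values in cars.items():
    summary = describe(values)
    print(name, {k: round(v, 3) for k, v in summary.items()})
```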

  30. Measures of the Centre: The three common measures of the centre are the Mean, Median and Mode.

  31. Descriptive Statistics – Use Case: To find the average horsepower of the cars in the dataset, we add up all the hp values and divide by the number of cars. In this case, $\frac{110 + 110 + 93 + 96 + 90 + 110 + 110 + 110}{8} = \frac{829}{8} = 103.625$.

  32. Measures of the Centre: The Mean is the average of all the values in a sample.
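The hand calculation of the average horsepower can be checked with a few lines of Python (a sketch; the hp values are transcribed from the slide, and in R `mean()` on the same vector should give the same result).

```python
# Horsepower (hp) of cars A-H, transcribed from the slide
hp = [110, 110, 93, 96, 90, 110, 110, 110]

# Mean = sum of all values / number of values
mean_hp = sum(hp) / len(hp)
print(sum(hp), len(hp), mean_hp)  # 829 8 103.625
```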

  33. Descriptive Statistics – Use Case: To find the centre value of mpg among the cars, we arrange the mpg values in ascending order and choose the middle value: 21, 21, 21.3, 22.8, 23, 23, 23, 23. With an even number of entries, we take the average of the two middle values. In this case, $\frac{22.8 + 23}{2} = 22.9$.

  34. Measures of the Centre: The Median is the central value of the sorted sample set.
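A sketch of the median rule in Python: sort, then take the middle value, or the average of the two middle values when the count is even. The `median` function is written out by hand for clarity; Python's `statistics.median` and R's `median()` apply the same rule.

```python
def median(values):
    """Sort the values, then take the middle one (or the mean of the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # even count: average the two middle values


# Miles per gallon (mpg) of cars A-H, transcribed from the slide
mpg = [21, 21, 22.8, 21.3, 23, 23, 23, 23]
print(sorted(mpg))            # [21, 21, 21.3, 22.8, 23, 23, 23, 23]
print(round(median(mpg), 2))  # 22.9
```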

  35. Descriptive Statistics – Use Case: To find the most common cylinder type among the cars, we check which value is repeated the greatest number of times. Here cyl takes the values 6, 6, 4, 6, 4, 6, 4, 6, so 6 (five cars) occurs more often than 4 (three cars), and the mode is 6.

  36. Measures of the Centre: The Mode is the value that occurs most often in the sample set.
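Counting occurrences is all the mode needs. A minimal Python sketch using `collections.Counter`, with the cyl values transcribed from the slide. Note that R's built-in `mode()` reports an object's type (e.g. "numeric"), not the statistical mode, so R users usually tabulate the values instead.

```python
from collections import Counter
import statistics

# Cylinder counts (cyl) of cars A-H, transcribed from the slide
cyl = [6, 6, 4, 6, 4, 6, 4, 6]

counts = Counter(cyl)                            # value -> number of occurrences
mode_value, frequency = counts.most_common(1)[0]
print(mode_value, frequency)                     # 6 5

# multimode() returns every value tied for the highest count
print(statistics.multimode(cyl))                 # [6]
```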

  37. Measures of the Spread: The common measures of spread are the Range, Interquartile Range, Variance and Standard Deviation.

  38. Measures of Spread: The Range measures how spread apart the values in a dataset are: $\text{Range} = \max(x_i) - \min(x_i)$.
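Applied to the hp column from the car dataset, the range formula is a one-liner. This Python sketch assumes the hp values transcribed from the slide. In R, be aware that `range(x)` returns the pair (min, max) rather than their difference; `diff(range(x))` gives the single number.

```python
# Horsepower (hp) of cars A-H, transcribed from the slide
hp = [110, 110, 93, 96, 90, 110, 110, 110]

value_range = max(hp) - min(hp)   # Range = max(x_i) - min(x_i)
print(value_range)                # 20
```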

  39. Measures of Spread: The Interquartile Range (IQR) is a measure of variability based on dividing a dataset into quartiles. Example dataset: 1, 2, 3, 4, 5, 6, 7, 8.

  40. Measures of Spread: The quartiles Q1, Q2 and Q3 divide the sorted data 1, 2, 3, 4, 5, 6, 7, 8 into four equal parts.

  41. Measures of Spread: Q1 is the median of the lower half (1, 2, 3, 4): $Q_1 = \frac{2 + 3}{2} = 2.5$.

  42. Measures of Spread: Q2 is the median of the whole dataset: $Q_2 = \frac{4 + 5}{2} = 4.5$.

  43. Measures of Spread: Q3 is the median of the upper half (5, 6, 7, 8): $Q_3 = \frac{6 + 7}{2} = 6.5$.

  44. Measures of Spread: The Interquartile Range is the distance between the third and first quartiles: $\text{IQR} = Q_3 - Q_1 = 6.5 - 2.5 = 4$.
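The quartiles on these slides follow the "median of each half" convention. Below is a minimal Python sketch of that convention (`median` and `quartiles` are hypothetical helper names). Other conventions give slightly different quartiles for the same data; for example, R's default `quantile()` method gives Q1 = 2.75 and Q3 = 6.25 for 1 to 8, so it is worth checking which definition a tool uses.

```python
def median(values):
    """Middle value of sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-each-half convention.

    Assumes at least two values. For an odd count, the overall median
    is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return median(lower_half), median(ordered), median(upper_half)


data = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3)  # 2.5 4.5 6.5
print(iqr)         # 4.0
```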

  45. Measures of Spread: Variance describes how much a random variable differs from its expected value. It is computed from the squares of the deviations.

  46. Measures of Spread: Deviation is the difference between each element and the mean: $\text{Deviation} = x_i - \mu$.

  47. Measures of Spread: Population Variance is the average of the squared deviations: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$.
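A direct Python translation of the deviation and population-variance formulas, as a sketch. The data reuses the 1 to 8 values from the quartile slides purely for illustration; `population_variance` is a hypothetical helper (Python's `statistics.pvariance` computes the same quantity).

```python
def population_variance(values):
    """sigma^2 = (1/N) * sum((x_i - mu)^2), treating values as the whole population."""
    n = len(values)
    mu = sum(values) / n                      # population mean
    deviations = [x - mu for x in values]     # x_i - mu for each element
    return sum(d ** 2 for d in deviations) / n


# Illustrative data (reusing the 1-8 values from the quartile slides)
data = [1, 2, 3, 4, 5, 6, 7, 8]
print(population_variance(data))  # 5.25
```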

  48. Measures of Spread: Sample Variance is the sum of squared differences from the sample mean $\bar{x}$, divided by $n - 1$ instead of $n$: $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$.
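The only change from the population formula is the $n - 1$ denominator. A minimal Python sketch on the same illustrative 1 to 8 data; R's `var()` and Python's `statistics.variance` both use this sample (n − 1) version.

```python
def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1); needs at least two values."""
    n = len(values)
    x_bar = sum(values) / n                   # sample mean
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(sample_variance(data))  # 6.0  (42 / 7, versus 42 / 8 = 5.25 with the population formula)
```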

  49. Measures of Spread: Standard Deviation measures the dispersion of a set of data from its mean; it is the square root of the variance: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$.
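Taking the square root of each variance gives the corresponding standard deviation. This Python sketch shows both versions on the illustrative 1 to 8 data, since the slide formula is the population one while R's `sd()` reports the sample (n − 1) version; the helper names are hypothetical.

```python
import math


def population_sd(values):
    """sigma = sqrt((1/N) * sum((x_i - mu)^2))."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """s = sqrt(sum((x_i - x_bar)^2) / (n - 1)), the version R's sd() reports."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(round(population_sd(data), 4))  # 2.2913
print(round(sample_sd(data), 4))      # 2.4495
```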
