
The Basics of Statistics for Data Science By Statisticians

Want to learn data science but don't know how to start from scratch? This presentation covers the basics of statistics for data science. Learn these basics to build a solid foundation for data science.


Presentation Transcript


  1. statanalytica.com The Basics of Statistics for Data Science By Statisticians STAT ANALYTICA

  2. INSIDE THE GUIDE: TOPICS AND HIGHLIGHTS. Overview; Introduction to Statistics; Terminologies in Statistics; Types of Analysis; Data Types; Measures of Central Tendency; Measures of Variability; Measurements of Relationships between Variables; Probability Distribution Functions; Continuous Data Distributions; Discrete Data Distributions; Moments; Probability; Accuracy; Conclusion

  3. OVERVIEW: Data science is booming in today's industry and is one of the most popular fields these days. Many statistics students want to learn data science because statistics is the building block of machine learning algorithms. But most students don't know how much statistics they need before starting data science. To address this, we share the statistics concepts that are crucial for getting started with data science.

  4. INTRODUCTION TO STATISTICS: Statistics is one of the most crucial subjects for students. Its methods help solve complex real-life problems, and it is used almost everywhere. Data scientists and data analysts use it to spot meaningful trends in the world and to derive meaningful insight from data. Statistics offers a variety of functions, principles, and algorithms that help analyze raw data, build a statistical model, and infer or predict results.

  5. TYPES OF ANALYSIS: Statistics has two types of analysis. QUANTITATIVE ANALYSIS: Also known as statistical analysis, it is the science (or art) of collecting and interpreting data with numbers and graphs. We also use it to identify patterns and trends. QUALITATIVE ANALYSIS: Also known as non-statistical analysis, it gives generic information and uses text, sound, and other forms of media.

  6. DATA TYPES: Statistics has two types of data. NUMERICAL: Numerical data is expressed with digits and is measurable. It has two major types: discrete and continuous. CATEGORICAL: Categorical data is qualitative data classified into categories. It has two major types: nominal (no order) and ordinal (ordered).

  7. MEASURES OF CENTRAL TENDENCY: MEAN: The mean is the average of the given dataset. MEDIAN: The median is the middle value of the ordered dataset. MODE: The mode is the most common value in the dataset. It is mainly relevant for discrete data.
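
As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample data, not taken from the presentation.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle of the sorted values; with an even
                                  # count, the average of the two middle values
mode = statistics.mode(data)      # most frequently occurring value

print("mean:", mean, "median:", median, "mode:", mode)
```

Here the mean (5) is pulled above the median (4) by the single large value 10, which is why the median is often preferred for skewed data.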

  8. MEASURES OF VARIABILITY: RANGE: The range is the difference between the maximum and minimum values in a dataset. VARIANCE (σ²): Variance measures how spread out the data is relative to the mean. STANDARD DEVIATION (σ): The standard deviation also measures how spread out the numbers in a dataset are. It is the square root of the variance.
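
The following sketch computes these measures with Python's standard `statistics` module on a hypothetical dataset. It assumes the data is a whole population (dividing by n); the sample variance, which divides by n - 1, is shown for comparison.

```python
import statistics

# Hypothetical dataset, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)            # maximum minus minimum
pop_variance = statistics.pvariance(data)     # mean squared distance from the mean
pop_std = statistics.pstdev(data)             # square root of the population variance
sample_variance = statistics.variance(data)   # divides by n - 1 instead of n

print(data_range, pop_variance, pop_std, sample_variance)
```

Which divisor to use depends on whether the data is treated as the full population or as a sample drawn from it.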

  9. MEASURES OF VARIABILITY: Z-SCORE: The z-score is the number of standard deviations a data point lies from the mean. R-SQUARED: R-squared is a statistical measure of fit that indicates how much of the variation in a dependent variable is explained by the independent variable(s). Its plain form is most useful for simple linear regression, because in ordinary least squares it never decreases when predictors are added. ADJUSTED R-SQUARED: A modified version of R-squared that has been adjusted for the number of predictors in the model. It increases if a new term improves the model more than would be expected by chance, and decreases otherwise.
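
A minimal sketch of these three quantities in plain Python follows. The function names and sample numbers are hypothetical, and the z-score here assumes the population standard deviation.

```python
import statistics

def z_score(x, values):
    """Number of (population) standard deviations that x lies from the mean."""
    return (x - statistics.mean(values)) / statistics.pstdev(values)

def r_squared(actual, predicted):
    """1 minus the ratio of residual variation to total variation."""
    mean_actual = statistics.mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical observations and model predictions.
actual = [1, 2, 3, 4, 5]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

z = z_score(9, [2, 4, 4, 4, 5, 5, 7, 9])
r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_samples=len(actual), n_predictors=1)
print(z, r2, adj_r2)
```

The adjusted value is never larger than the plain R-squared when there is at least one predictor, and the gap grows as predictors are added without a matching gain in fit.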

  10. MEASUREMENTS OF RELATIONSHIPS BETWEEN VARIABLES: COVARIANCE: Covariance measures how two variables vary together. If it is positive, the variables tend to move in the same direction; if it is negative, they tend to move in opposite directions; if it is zero, there is no linear relationship between them. CORRELATION: Correlation measures the strength of the relationship between two variables. It ranges from -1 to 1 and is the normalized version of covariance. As a rule of thumb, a correlation of +/- 0.7 or beyond represents a strong relationship, while correlations between -0.3 and 0.3 indicate little to no relationship.
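
To illustrate how correlation normalizes covariance, here is a small sketch in plain Python. The helper names and data are hypothetical, and the sample (n - 1) form of covariance is assumed.

```python
import math

def covariance(xs, ys):
    """Sample covariance: average product of deviations, dividing by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: ys rises with xs, ys_reversed falls with xs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
ys_reversed = [10, 8, 6, 4, 2]

print(covariance(xs, ys), correlation(xs, ys), correlation(xs, ys_reversed))
```

Covariance depends on the units of the variables, while correlation is unit-free, which is what makes thresholds such as 0.7 comparable across datasets.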

  11. PROBABILITY DISTRIBUTION FUNCTIONS: PROBABILITY MASS FUNCTION (PMF): A function for discrete data that gives the probability of a given value occurring. PROBABILITY DENSITY FUNCTION (PDF): A function for continuous data whose value at any point can be interpreted as the relative likelihood that the random variable takes a value close to that point. CUMULATIVE DISTRIBUTION FUNCTION (CDF): A function that gives the probability that the random variable is less than or equal to a certain value. For continuous data it is the integral of the PDF.
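
The sketch below contrasts the three functions using only the Python standard library. The fair die and the standard normal distribution are illustrative choices, and `integrate_pdf` is a hypothetical helper that approximates the integral with the trapezoid rule.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: the PMF of a fair six-sided die assigns 1/6 to each face.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
# The discrete CDF at 4 sums the PMF over all faces up to and including 4.
die_cdf_at_4 = sum(p for face, p in die_pmf.items() if face <= 4)

# Continuous case: PDF and CDF of the standard normal distribution.
std_normal = NormalDist(mu=0, sigma=1)
density_at_0 = std_normal.pdf(0)  # a relative likelihood, not a probability
cdf_at_1 = std_normal.cdf(1)      # P(X <= 1)

def integrate_pdf(dist, lower, upper, steps=10_000):
    """Trapezoid-rule approximation of the integral of dist.pdf over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    total += sum(dist.pdf(lower + i * width) for i in range(1, steps))
    return total * width

# Integrating the PDF from far in the left tail up to 1 approximates the CDF at 1.
approx_cdf_at_1 = integrate_pdf(std_normal, -8, 1)

print(die_cdf_at_4, density_at_0, cdf_at_1, approx_cdf_at_1)
```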

  12. CONTINUOUS DATA DISTRIBUTIONS: CONTINUOUS DISTRIBUTION: A continuous data distribution is a probability distribution over values that can fall anywhere within a range. NORMAL/GAUSSIAN DISTRIBUTION: The normal distribution is commonly referred to as the bell curve and is related to the central limit theorem. The standard normal distribution has a mean of 0 and a standard deviation of 1. T-DISTRIBUTION: The t-distribution is another probability distribution, used to estimate population parameters when the sample size is small.
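
The link between the normal distribution and the central limit theorem can be sketched with a small simulation. The sample size, number of samples, and seed below are arbitrary assumptions, and uniform draws from `random.random()` stand in for any non-normal source.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

SAMPLE_SIZE = 30   # observations per sample (arbitrary choice)
N_SAMPLES = 2000   # number of samples drawn (arbitrary choice)

# Each entry is the mean of SAMPLE_SIZE uniform draws on [0, 1).
sample_means = [
    statistics.mean([random.random() for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.mean(sample_means)
spread_of_means = statistics.pstdev(sample_means)

# Theory: uniform(0, 1) has mean 0.5 and standard deviation sqrt(1/12),
# so the sample means should cluster around 0.5 with spread sqrt(1/12) / sqrt(n).
expected_spread = math.sqrt(1 / 12) / math.sqrt(SAMPLE_SIZE)

# For a bell curve, roughly 68% of values lie within one standard deviation.
within_one_sd = sum(
    abs(m - 0.5) <= expected_spread for m in sample_means
) / N_SAMPLES

print(mean_of_means, spread_of_means, expected_spread, within_one_sd)
```

Although each individual draw is uniform, the means of the samples behave approximately like a normal distribution, which is the behavior the theorem describes.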

  13. CONTINUOUS DATA DISTRIBUTIONS: UNIFORM DISTRIBUTION: In this probability distribution, the density takes a single constant value within a certain range and is 0 outside that range. It is also known as an on-and-off distribution. POSITION DISTRIBUTION: It is quite similar to the normal distribution but adds one more factor: skewness. When the skewness is low, the distribution is spread relatively uniformly in all directions. When the skewness is high, the data spreads out unevenly in different directions.

  14. DISCRETE DATA DISTRIBUTIONS: POISSON DISTRIBUTION: One of the most common probability distributions, it expresses the probability of a given number of events occurring within a fixed time period. BINOMIAL DISTRIBUTION: The probability distribution of the number of successes in a sequence of n independent experiments, each with a Boolean-valued outcome (success with probability p, failure with probability 1-p).
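
Both distributions have closed-form probability mass functions, sketched here with Python's `math` module. The function names and the example parameters (a rate of 2 events per period, 10 fair coin flips) are hypothetical.

```python
import math

def poisson_pmf(k, rate):
    """P(exactly k events) when events occur at an average of `rate` per period."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(exactly k successes) in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of seeing no events in a period that averages 2 events.
p_no_events = poisson_pmf(0, rate=2)

# Probability of exactly 5 heads in 10 fair coin flips.
p_five_heads = binomial_pmf(5, n=10, p=0.5)

# A PMF sums to 1 over all possible outcomes.
binomial_total = sum(binomial_pmf(k, n=10, p=0.5) for k in range(11))

print(p_no_events, p_five_heads, binomial_total)
```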

  15. MOMENTS: Moments describe different aspects of the nature and shape of a distribution. They come in sequence: the mean is the first moment, the variance is the second, skewness is the third, and kurtosis is the fourth.
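
As a rough sketch, the four moments can be computed in plain Python. This assumes the population (divide by n) definitions, with skewness and kurtosis taken as the third and fourth central moments standardized by the standard deviation; the helper names and data are hypothetical.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

def describe_moments(values):
    """Return the mean, variance, skewness, and kurtosis of the values."""
    mean = sum(values) / len(values)
    variance = central_moment(values, 2)
    std = variance ** 0.5
    skewness = central_moment(values, 3) / std ** 3  # asymmetry around the mean
    kurtosis = central_moment(values, 4) / std ** 4  # weight of the tails
    return mean, variance, skewness, kurtosis

# Hypothetical symmetric dataset, so its skewness is zero.
mean, variance, skewness, kurtosis = describe_moments([1, 2, 3, 4, 5])
print(mean, variance, skewness, kurtosis)
```

Some libraries report excess kurtosis (the value above minus 3), so results should be compared with that convention in mind.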

  16. PROBABILITY: CONDITIONAL PROBABILITY: P(A|B) is the likelihood of an event A occurring given that an event B has already occurred. BAYES' THEOREM: Bayes' theorem is a widely used formula for determining conditional probability. It states that the probability of A given B is equal to the probability of B given A times the probability of A, over the probability of B: P(A|B) = P(B|A) P(A) / P(B).
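
A short worked example may help make the formula concrete. The function name and all of the numbers (a 1% prior, a 95% detection rate, a 5% false-alarm rate) are hypothetical, and P(B) is expanded with the law of total probability.

```python
def bayes_posterior(prior_a, prob_b_given_a, prob_b_given_not_a):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) summed over A and not-A."""
    prob_b = prob_b_given_a * prior_a + prob_b_given_not_a * (1 - prior_a)
    return prob_b_given_a * prior_a / prob_b

# Hypothetical screening test: A = "has the condition", B = "test is positive".
posterior = bayes_posterior(
    prior_a=0.01,             # 1% of people have the condition
    prob_b_given_a=0.95,      # the test flags 95% of true cases
    prob_b_given_not_a=0.05,  # the test wrongly flags 5% of healthy people
)
print(round(posterior, 4))
```

Even with a fairly reliable test, the posterior stays near 16% because the condition is rare, which is the kind of result Bayes' theorem is often used to expose.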

  17. ACCURACY: TRUE POSITIVE: The test detects the condition when the condition is present. FALSE NEGATIVE: The test does not detect the condition when the condition is present. TRUE NEGATIVE: The test does not detect the condition when the condition is not present. FALSE POSITIVE: The test detects the condition when the condition is absent. SENSITIVITY: Measures the ability of a test to detect the condition when the condition is present. Sensitivity = TP/(TP+FN)

  18. ACCURACY: SPECIFICITY: Measures the ability of a test to correctly exclude the condition when the condition is absent. Specificity = TN/(TN+FP) PREDICTIVE VALUE POSITIVE: Also called precision, it is the proportion of positives that correspond to the presence of the condition. PVP = TP/(TP+FP) PREDICTIVE VALUE NEGATIVE: The proportion of negatives that correspond to the absence of the condition. PVN = TN/(TN+FN)

  19. CONCLUSION: We have now gone through the basic concepts of statistics for data science. If you are about to start with data science, try to get a good command of all of these concepts. They will help you a lot as you learn and will make the data science concepts built on them easier to understand. So what are you waiting for? Grab the best statistics books and start learning these concepts.

  20. FOLLOW US ON SOCIAL MEDIA: Facebook: @statanalytica; Twitter: @statanalytica; Pinterest: @statanalytica

  21. CONTACT US: Website: https://statanalytica.com Email: Info@statanalytica.com
