
DATA 220 Mathematical Methods for Data Analysis September 17 Class Meeting

DATA 220 Mathematical Methods for Data Analysis September 17 Class Meeting. Department of Applied Data Science San Jose State University Fall 2019 Instructor: Ron Mak www.cs.sjsu.edu/~mak. Some Counting Principles.



Presentation Transcript


  1. DATA 220 Mathematical Methods for Data Analysis • September 17 Class Meeting • Department of Applied Data Science, San Jose State University • Fall 2019 • Instructor: Ron Mak, www.cs.sjsu.edu/~mak

  2. Some Counting Principles • To perform statistics, it is important to know how to count the size of a population or the size of a sample drawn from the population. • Example: Your video streaming service has these types of movies: You want to watch three movies tonight, one of each type. There are different combinations of movies you can watch. The number of ways of making a sequence of independent choices is the product of the number of choices at each step.

  3. Some Counting Principles, cont’d • Example: There are 26 letters in the alphabet. How many 1-letter sequences can you make? 2-letter sequences? etc. There are 26 one-letter sequences, 26 × 26 = 676 two-letter sequences, and so on. The number of sequences of k objects chosen from a collection of n objects is $n^k$.
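As a quick check of the rule for sequences with repetition, the sketch below compares the formula against a brute-force enumeration. The helper name `count_sequences` is an illustrative choice, not something from the slides.

```python
from itertools import product
from string import ascii_lowercase


def count_sequences(n: int, k: int) -> int:
    """Number of length-k sequences from n objects, repetition allowed."""
    return n ** k


# Brute force: enumerate every 2-letter sequence of the 26 lowercase letters.
two_letter = sum(1 for _ in product(ascii_lowercase, repeat=2))

print(count_sequences(26, 1))  # 26
print(count_sequences(26, 2))  # 676
print(two_letter)              # 676
```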

  4. Some Counting Principles, cont’d • Example: This time, there can be no repeated letters within a sequence. Within a sequence, each position has one fewer choice than the preceding position. The number of sequences of k objects chosen without repetition from a collection of n objects is $n(n-1)(n-2)\cdots(n-k+1)$, a product of k factors.

  5. Some Counting Principles, cont’d • Example: How many 3-digit numbers are there using the digits 1 through 9 that have no repeated digits? There are 9 choices for the first digit, 8 for the second, and 7 for the third: 9 × 8 × 7 = 504. • Example: How many of those 504 numbers are odd, i.e., how many have 1, 3, 5, 7, or 9 as their third digit? How many choices are there for the third digit? It depends on what we chose for the first two digits!

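  6. Some Counting Principles, cont’d • Solution #1: There are 4 even digits: 2, 4, 6, and 8. There are 5 odd digits: 1, 3, 5, 7, and 9. Split into cases by whether each of the first two digits is even (E) or odd (O): EE gives 4 × 3 × 5 = 60, EO gives 4 × 5 × 4 = 80, OE gives 5 × 4 × 4 = 80, and OO gives 5 × 4 × 3 = 60, for a total of 280 odd numbers.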

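  7. Some Counting Principles, cont’d • Solution #2: Instead of determining the numbers of digit choices from left to right, start with the third digit and work from right to left: 5 choices for the third digit, then 8 for the second, then 7 for the first, giving 5 × 8 × 7 = 280 odd numbers.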
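Both counts in this example are small enough to verify by enumeration. The following sketch lists every 3-digit number built from distinct digits 1 through 9 and counts the odd ones; the variable names are illustrative.

```python
from itertools import permutations

digits = range(1, 10)

# Every ordered choice of 3 distinct digits is one 3-digit number.
numbers = [100 * a + 10 * b + c for a, b, c in permutations(digits, 3)]
odd_numbers = [n for n in numbers if n % 2 == 1]

print(len(numbers))      # 504
print(len(odd_numbers))  # 280
```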

  8. Factorial Notation • The product $n(n-1)(n-2)\cdots 2 \cdot 1$ is n factorial, written n! • n! grows rapidly: for example, 5! = 120 and 10! = 3,628,800.
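To get a feel for how quickly n! grows, a short sketch can tabulate a few values with the standard library. The particular values of n are an arbitrary choice for illustration.

```python
from math import factorial

# A few sample values of n; the choice of n is arbitrary.
table = {n: factorial(n) for n in (1, 5, 10, 15, 20)}

for n, value in table.items():
    print(f"{n:>2}! = {value:,}")
```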

  9. Factorial Notation, cont’d • Factorial notation can simplify formulas. • Example: Out of 15 children, you must choose a team of 9 children. There are 15 choices for the first team member, 14 for the second, and so on down to 7 for the ninth. Therefore, the number of teams you can make is 15 × 14 × 13 × 12 × 11 × 10 × 9 × 8 × 7, and 6 children are left behind. We could have written this product all the way down to 1, i.e., as 15!, but then we must remove the final six factors by cancellation: $\frac{15!}{6!}$. Math shorthand only! We wouldn’t want to actually calculate this way. Just multiply the numbers from 15 down to 7 as before.

  10. Factorial Notation, cont’d • In the children’s teams example, n = 15 children total and k = 9 children chosen for the team. The number of sequences of k objects chosen without repetition from a collection of n objects is $n(n-1)(n-2)\cdots(n-k+1)$, which in factorial notation is $\frac{n!}{(n-k)!}$.
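The two ways of writing the count can be compared directly. The sketch below computes the children's teams example as a descending product, as a ratio of factorials, and with `math.perm` (available in Python 3.8 and later); the function name `sequences_without_repetition` is illustrative.

```python
from math import factorial, perm


def sequences_without_repetition(n: int, k: int) -> int:
    """Multiply the k factors n, n-1, ..., n-k+1."""
    result = 1
    for factor in range(n, n - k, -1):
        result *= factor
    return result


n, k = 15, 9
by_product = sequences_without_repetition(n, k)
by_factorials = factorial(n) // factorial(n - k)
by_library = perm(n, k)

print(by_product, by_factorials, by_library)  # all three agree: 1816214400
```

The descending product avoids building the full 15! only to cancel most of it, which matches the advice on the slide.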

  11. Count the Complement • More details about the streaming videos: • Total possible combinations of three movies, one from each type, is • Tonight, you again want to watch three movies, one of each type, but you do not want all three movies to have car chases. • With this restriction, how many movie combinations do you have?

  12. Count the Complement, cont’d • Choose three movies, one from each type. But not all three can have car chases. • Solution: • Start with the total number of choices without restrictions. • Count the disallowed triple features, in which all three movies have car chases. • The number of allowable choices is the total minus the number of disallowed triple features.
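The complement count can be sketched in a few lines. The movie counts below are hypothetical, since the table of movie types from the slides did not survive in this transcript; only the method (total minus disallowed) comes from the slide.

```python
from itertools import product
from math import prod

# HYPOTHETICAL catalog: (total movies, movies with car chases) per type.
catalog = {"comedy": (4, 1), "action": (5, 3), "drama": (3, 1)}

total = prod(n for n, _ in catalog.values())            # unrestricted choices
disallowed = prod(c for _, c in catalog.values())       # all three have chases
allowed = total - disallowed

# Brute-force check: mark each movie True if it has a car chase.
choices = [[i < chases for i in range(n)] for n, chases in catalog.values()]
brute_force = sum(1 for combo in product(*choices) if not all(combo))

print(total, disallowed, allowed, brute_force)  # 60 3 57 57
```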

  13. Count the Complement, cont’d • Example: You want to line up 15 children. However, Mary and John don’t get along, so you must keep them separate. How many allowable line-ups? • Solution: The total number of possible line-ups: 15! How many ways can we put Mary and John together in line? There are 14 adjacent pairs of positions in a line-up, and either Mary or John can be in the first position of the pair: 14 × 2 = 28 bad pairs. For each bad pair, there are 13! ways to line up the remaining children, for a total of 28 × 13! unallowable line-ups. Therefore, the number of allowable line-ups is 15! − 28 × 13! = 1,133,317,785,600.
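Enumerating all 15! line-ups is impractical, but the same reasoning can be checked by brute force on a smaller line. In this sketch the function names are illustrative, and children 0 and 1 stand in for Mary and John.

```python
from itertools import permutations
from math import factorial


def allowed_lineups(n: int) -> int:
    """Total line-ups minus those with the two children adjacent."""
    total = factorial(n)
    bad = (n - 1) * 2 * factorial(n - 2)  # adjacent slots x 2 orders x the rest
    return total - bad


def brute_force(n: int) -> int:
    """Count line-ups of n children where children 0 and 1 are not adjacent."""
    count = 0
    for line in permutations(range(n)):
        if abs(line.index(0) - line.index(1)) != 1:
            count += 1
    return count


print(allowed_lineups(5), brute_force(5))  # 72 72
print(allowed_lineups(15))                 # 1133317785600
```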

  14. Uncertainty • Data scientists draw conclusions (inferences) about an entire collection of data (the population) based on evidence from samples drawn from the population. • Use the methods of analytical statistics. • Because a sample is not necessarily identical to the population from which it was drawn, there is some uncertainty about the inferences. • For example, how accurately do the sample mean and sample standard deviation represent the corresponding population measures?

  15. Probability • Probability theory creates mathematical models to study chance or randomness. • Probability is the tool that enables us to make inferences. • Our newly enhanced ability to count will be extremely useful.

  16. Classical Interpretation of Probability • Originally arose from studying games of chance. • The probability that a flipped fair coin will land heads is 1/2. • The probability that a card drawn from a shuffled deck of 52 cards is an ace is 4/52.

  17. Classical Interpretation, cont’d • Each distinct result is an outcome. • An event is a collection of outcomes. • The probability of an event E is the ratio of the number of outcomes $N_e$ favorable to event E to the total number N of equally likely possible outcomes: $P(E) = \frac{N_e}{N}$. • Example: Let $N_e$ = 4 ace cards and N = 52 total cards. Then P(drawing an ace card) = 4/52.

  18. Relative Frequency Interpretation of Probability • An empirical approach that uses experiments to count the occurrences of event E. • Repeat an experiment a number of times. If event E occurs 30% of the time, then 0.3 can be a good approximation to the probability of event E. • If n is the number of trials of the experiment and event E occurs on $n_e$ of those trials, then $P(E) \approx \frac{n_e}{n}$. • The approximation improves with larger values of n.
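A small simulation illustrates the relative frequency idea using the ace example from the classical interpretation. This is a sketch only: the function name, the fixed seed, and the trial counts are illustrative assumptions, and each trial draws one card from a full deck.

```python
import random


def relative_frequency(n: int, seed: int = 0) -> float:
    """Estimate P(ace) as n_e / n over n independent single-card draws."""
    rng = random.Random(seed)
    deck = ["ace"] * 4 + ["other"] * 48
    hits = sum(1 for _ in range(n) if rng.choice(deck) == "ace")
    return hits / n


exact = 4 / 52
for trials in (100, 10_000, 200_000):
    estimate = relative_frequency(trials)
    print(f"n = {trials:>7}: estimate = {estimate:.4f}, exact = {exact:.4f}")
```

With more trials the estimate tends to settle near 4/52, although any single run can still be off by a small amount.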

  19. Some Basic Probability Laws • For any event A: $0 \le P(A) \le 1$. • The probability of event A occurring ranges from never (probability 0) to always (probability 1). • The complement of an event A is the event that A does not occur: $P(\text{not } A) = 1 - P(A)$. • If A and B are mutually exclusive events (both cannot occur simultaneously), then: $P(A \text{ or } B) = P(A) + P(B)$.

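  20. Some Basic Probability Laws, cont’d • Example: In an arbitrary line-up of 15 children, what is the probability of a bad line-up, where Mary and John are together? P(bad) = (28 × 13!) / 15! = 28 / (15 × 14) = 2/15. • Example: What is the probability of a good line-up? P(good) = 1 − P(bad) = 13/15. • Example: What is the probability that a die roll will result in either an even number or the number 7? The two events are mutually exclusive, and a die has no 7, so P(even or 7) = 3/6 + 0 = 1/2.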
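These three probabilities can be worked out exactly with Python's `fractions` module, which avoids floating-point rounding. The variable names are illustrative.

```python
from fractions import Fraction
from math import factorial

# Bad line-ups: 28 adjacent (position, order) pairs times 13! arrangements.
p_bad = Fraction(28 * factorial(13), factorial(15))
p_good = 1 - p_bad  # complement rule

# Mutually exclusive events on one die roll: an even face, or the face 7.
p_even = Fraction(3, 6)
p_seven = Fraction(0, 6)  # a standard die has no face 7
p_even_or_seven = p_even + p_seven  # addition rule

print(p_bad, p_good, p_even_or_seven)  # 2/15 13/15 1/2
```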

  21. Break • 15 minutes

  22. Random Variables • random variable: A numerical variable whose value depends on the outcome of a chance experiment. • Computers use algorithms to generate pseudo-random values. • True random values are often derived from measurements of natural phenomena.

  23. Discrete Random Variables • A discrete random variable has a set of values that is a collection of isolated points on the number line. • A common set of values for a discrete random variable is a subset of the integers. • In a dataset, we can treat a sequence of numerical values as discrete random values if the values don’t depend on each other. • Example: The ages of the passengers in the Titanic Survival dataset.

  24. Continuous Random Variables • A continuous random variable has a set of possible values in an entire interval of the number line. • The values are floating-point (real) numbers. • Example: You break a meter stick in two. The distance x from one end of the stick to where the break occurs is a continuous random value. x = 0.2 m is possible. So is x = 0.72 m. In fact, any value of x from 0 to 1 meter is possible. Therefore, x is a continuous random variable. What if you rounded x to the nearest millimeter?

  25. Probability Distributions of Random Variables • Let x be a discrete random variable associated with an experiment whose results are determined by chance. • The outcome that occurs when the experiment is performed determines which value of x is observed. • The total probability for all outcomes is 1. • The probability distribution of x describes how much of this total probability is placed on each possible x value.

  26. Graphs of Probability Distributions • We can graph a probability distribution. • The x axis shows the possible values of variable x. • The y axis shows the probability of each x value. • There are several common probability distributions, each with a different shape when graphed. • The more random values from a particular distribution are graphed, the closer the graph gets to the theoretical shape of the distribution.

  27. Graphs of Probability Distributions, cont’d • By graphing the random values of a variable, we can often determine its probability distribution. • A typical task of data analysis is to examine a sequence of values in a dataset and determine its probability distribution. • Knowing the probability distribution enables us to apply more advanced analytical procedures that are appropriate to that type of distribution.

  28. Graphs of Probability Distributions, cont’d • Example: We drew a histogram of the ages of the Titanic passengers. From the rough shape of the graph, we may infer that the ages have a normal distribution.

  29. Uniform Probability Distribution • Within the interval of possible values for this continuous random variable x, every interval of equal width has an equal probability of being observed. https://courses.lumenlearning.com/odessa-introstats1-1/chapter/the-uniform-distribution/ http://www.a-levelmathstutor.com/stat-discrete-rand-vars4.php

  30. Normal Probability Distribution • The distribution of this continuous random variable x forms a bell curve. https://statisticsbyjim.com/basics/normal-distribution/

  31. Exponential Distribution • The values of this continuous random variable are often concerned with the amount of time until some specific event occurs. • There are fewer large values and more small values. • Examples: • The amount of time until a customer finishes browsing and actually purchases something in your store. • The length of time of a phone call. • The amount of time you need to wait until the bus arrives.

  32. Exponential Distribution, cont’d • Example: The length of time, in minutes, that a postal clerk spends with a customer: https://courses.lumenlearning.com/introstats1/chapter/the-exponential-distribution/

  33. Exponential Distribution, cont’d • Example: The probability that a postal clerk will spend 4 to 5 minutes with a randomly selected customer: https://courses.lumenlearning.com/introstats1/chapter/the-exponential-distribution/
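A probability such as "between 4 and 5 minutes" is the area under the exponential curve over that interval, which is the difference of two cumulative probabilities. The sketch below assumes a mean service time of 4 minutes; that value is an assumption for illustration, because the figure with the slide's parameters is not preserved in this transcript.

```python
from math import exp


def exponential_cdf(x: float, mean: float) -> float:
    """P(X <= x) for an exponential random variable with the given mean."""
    return 1.0 - exp(-x / mean) if x > 0 else 0.0


def prob_between(a: float, b: float, mean: float) -> float:
    """P(a < X < b), the area under the density between a and b."""
    return exponential_cdf(b, mean) - exponential_cdf(a, mean)


MEAN_MINUTES = 4.0  # ASSUMED mean time per customer, for illustration only

p = prob_between(4, 5, MEAN_MINUTES)
print(f"P(4 < X < 5) = {p:.4f}")
```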

  34. Binomial Distribution • Properties of a binomial experiment: • It consists of a fixed number n of trials. • Each trial can result in only two mutually exclusive outcomes, labeled success (S) and failure (F). • The outcomes of different trials are independent. • The probability p that a trial results in S is the same for each trial.

  35. Binomial Distribution, cont’d • This discrete binomial variable x is the number of successes observed when the experiment is performed. • Variable x has a binomial distribution. • It’s related to a normal distribution for a continuous random variable. • Example: You will perform 20 trials of a binomial experiment, where the probability of success of each trial is p = 0.25. What is the probability that you’ll have x successes from the 20 trials? http://www.real-statistics.com/binomial-and-related-distributions/binomial-distribution/
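The question on this slide can be answered with the standard binomial formula, $P(x) = \binom{n}{x} p^x (1-p)^{n-x}$. The sketch below evaluates it for n = 20 and p = 0.25 using `math.comb` (Python 3.8 and later); the function and variable names are illustrative.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 20, 0.25
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
most_likely = max(range(n + 1), key=pmf.__getitem__)

for x in range(0, 11):
    print(f"P(x = {x:>2}) = {pmf[x]:.4f}")
print("most likely number of successes:", most_likely)  # 5
```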

  36. Poisson Distribution • The value of a discrete random variable x with a Poisson distribution is the number of events that occur over an interval of time or space. • Example: The number of cars arriving at a toll booth during a given 5-minute period of time. The event is the arrival of the car. The time is the 5 minutes. • Example: The number of plastic particles in a liter of water sampled from the ocean. The event is the discovery of a plastic particle. The space is the liter of sampled water. • The amount of time between Poisson events has an exponential distribution.
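The link between the two distributions can be illustrated by simulation: generate exponentially distributed gaps between events and count how many events land in one interval. This is a sketch under assumptions; the rate of 3 events per interval, the seed, and the function name are all hypothetical choices.

```python
import random


def count_events(rate: float, rng: random.Random) -> int:
    """Count events in one unit interval when gaps are exponential."""
    t = rng.expovariate(rate)  # time of the first event
    count = 0
    while t < 1.0:
        count += 1
        t += rng.expovariate(rate)  # add the gap to the next event
    return count


RATE = 3.0  # ASSUMED average number of events per interval
rng = random.Random(1)
counts = [count_events(RATE, rng) for _ in range(20_000)]
mean_count = sum(counts) / len(counts)

print(f"average events per interval: {mean_count:.3f} (rate = {RATE})")
```

The average count per interval should come out close to the assumed rate, which is the mean of the corresponding Poisson distribution.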

  37. Poisson Distribution, cont’d • The graph increasingly resembles a normal curve as the number of values of x increases. https://www.sciencedirect.com/topics/social-sciences/poisson-distribution

  38. Animated Graphs • Up until now, the graphs we’ve drawn with the Python packages were static (nothing moves). • Often it can be insightful to draw dynamic (animated) graphs. • Update the graph to include new data values. • The matplotlib.animation.FuncAnimation class repeatedly calls a function you’ve written. • It pauses a given duration in milliseconds between calls to your function.

  39. Demo: Animated Graph of Die Throws • If you roll a die, which face comes up is a random variable. • It’s a uniformly distributed random variable that can have face values 1, 2, 3, 4, 5, or 6. • Each value has the probability 1/6. • The more times you roll the die, the closer the relative frequencies of the face values get to being equal. • Animate the frequency (probability) distribution bar chart as the number of rolls increases.

  40. Demo: Animated Graph of Die Throws • Function show_bar_chart (written for the Titanic dataset analysis) draws a bar chart of the frequency of each of the six face values. • Function update_frame generates new random face values each time it’s called. Then it calls function show_bar_chart. • matplotlib.animation.FuncAnimation calls function update_frame for each animation frame, which occurs every 25 milliseconds (or any duration you specify).

  41. Demo: Animated Graph of Die Throws, cont’d • Run the program on the command line of a terminal window (not in a Jupyter notebook):
      ipython -i RollDieDynamic.py 100 10
  • This program requires two arguments on the command line: the number of frames (e.g., 100) and the number of die rolls per frame (e.g., 10). • The following Python statements obtain the values of the first and second command-line parameters:
      import sys
      number_of_frames = int(sys.argv[1])
      rolls_per_frame  = int(sys.argv[2])
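The demo program itself is not reproduced in this transcript, so the following is only a minimal sketch of how such an animation might be structured, not the course's RollDieDynamic.py. The names `add_rolls` and `animate` are hypothetical, and the bar chart is drawn inline rather than through the `show_bar_chart` function mentioned in the slides. It assumes at least one roll per frame.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]


def add_rolls(counts, rolls, rng):
    """Roll the die `rolls` times and update the per-face counts in place."""
    for _ in range(rolls):
        counts[rng.choice(FACES)] += 1
    return counts


def animate(number_of_frames, rolls_per_frame, interval_ms=25):
    """Show an animated bar chart of relative frequencies (hypothetical helper)."""
    # Imported here so the counting logic above runs without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    rng = random.Random()
    counts = {face: 0 for face in FACES}
    fig, ax = plt.subplots()

    def update_frame(frame_number):
        # Called once per frame: add new rolls, then redraw the bars.
        add_rolls(counts, rolls_per_frame, rng)
        total = sum(counts.values())
        ax.clear()
        ax.bar(FACES, [counts[face] / total for face in FACES])
        ax.set_ylim(0, 0.5)
        ax.set_xlabel("Face value")
        ax.set_ylabel("Relative frequency")
        ax.set_title(f"{total} rolls")

    # Keep a reference so the animation is not garbage-collected.
    animation = FuncAnimation(fig, update_frame, frames=number_of_frames,
                              interval=interval_ms, repeat=False)
    plt.show()
    return animation
```

Calling `animate(100, 10)` from a script run in a terminal would correspond to the command-line arguments 100 and 10 described above.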

  42. Lab Assignment #4: Animated Graphs • In four Python programs, draw animated bar charts of the following probability distributions: • normal • exponential • binomial • Poisson • Each frame should update the graph with more new random values. • You choose the parameters (range, mean, standard deviation, etc.) for each graph. • You can draw frequency distribution graphs instead. • Use the dynamic die program as a model.

  43. Lab Assignment #4, cont’d • Each animated graph should display: • A running count of how many random values are graphed. • The parameters used (mean, standard deviation, etc.) to generate the random values. • Labeled x and y axes. • A label at the top of each bar that indicates the bar’s value. • Go online to find which Python functions will generate random values for each distribution.

  44. Lab Assignment #4, cont’d • You can write Python programs instead of creating Jupyter notebooks. • Seaborn has difficulty displaying animation inside of notebooks. • Try the Eclipse Python plugin! • Submit to Canvas: Assignment #4 • Due Monday, September 23 at 11:59 PM
