
Probability&Statistics - based models

Probability & Statistics-based models. Raina Robeva, Sweet Briar College. August 1, 2007, MathFest 2007, San Jose, CA. Topics: Introduction; Quantitative Traits (Limit Theorems); Luria-Delbruck Experiments.


Presentation Transcript


  1. Probability & Statistics-Based Models. Raina Robeva, Sweet Briar College. August 1, 2007. MathFest 2007, San Jose, CA.

  2. Probability & Statistics-Based Models • Introduction • Quantitative Traits (Limit Theorems) • Luria-Delbruck Experiments • Evaluating risks from time series data

  3. Elementary Probability Random Variables - Probability Space Histograms

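  4. Elementary Probability. Set of all outcomes: $\Omega$. Examples: 1) Flipping a coin: $\Omega = \{H, T\}$ 2) Rolling a die: $\Omega = \{1, 2, 3, 4, 5, 6\}$ 3) Rolling two dice: $\Omega = \{(i, j) : 1 \le i, j \le 6\}$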

  5. Elementary Probability. Elementary Events – the elements of $\Omega$. Events – the subsets of $\Omega$. Definition of Probability: for equally likely outcomes, $P(E) = |E| / |\Omega|$. How do we find probabilities? We Count!

  6. Chromosomes and Genes. Chromosomes are large DNA molecules found in the cell's nucleus. Genes are found on chromosomes and code for a specific trait. Each gene has a specified place on the chromosome called a locus. The possible alternative forms of the genes are called alleles. The human Chromosome 11 contains 28 genes. The first 5 genes from the tip of the short arm form a cluster of genes that encode components of hemoglobin.

  7. Problem. One gene, two types of alleles: a (recessive) and A (dominant). $\Omega$ – all possible sequences of length 2 comprised of a and A. k = number of dominant alleles (0, 1, or 2). If E = “exactly k dominant alleles”, find P(E).
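
The counting argument for this problem can be sketched in a few lines of Python. This is only an illustration: it assumes the two alleles are equally likely at each position (an assumption the deck states later for the polygenic case), and the function name `allele_count_distribution` is made up for this example.

```python
from fractions import Fraction
from itertools import product


def allele_count_distribution(positions=2):
    """Enumerate every sequence of 'a'/'A' and tally P(exactly k dominant alleles).

    Assumes each sequence is equally likely (hypothetical helper, not from the slides).
    """
    outcomes = list(product("aA", repeat=positions))  # the sample space
    weight = Fraction(1, len(outcomes))               # probability of one sequence
    distribution = {}
    for sequence in outcomes:
        k = sequence.count("A")                       # number of dominant alleles
        distribution[k] = distribution.get(k, Fraction(0)) + weight
    return distribution


if __name__ == "__main__":
    for k, prob in sorted(allele_count_distribution(2).items()):
        print(f"P(exactly {k} dominant alleles) = {prob}")
```

For one gene (two positions) the four sequences aa, aA, Aa, AA give probabilities 1/4, 1/2, 1/4 for k = 0, 1, 2.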

  8. Problem (cont.) Gregor Mendel – experiments with peas. Round: dominant. Wrinkled: recessive. Parental Generation (cross). First Filial Generation (F1): only round peas. Second Filial Generation (F2): 3:1 ratio of round vs. wrinkled. Phenotypic Ratios 1:3 (1:2:1)

  9. Quantitative Traits (1909). Herman Nilsson-Ehle. Parental Generation (cross). First Filial Generation: all of intermediate color. Second Filial Generation: two new shades appear. Phenotypic Ratios 1 : 4 : 6 : 4 : 1, 1 : 6 : 15 : 20 : 15 : 6 : 1, …

  10. Quantitative Traits – Examples

  11. Polygenic Hypothesis. n genes, two types of alleles: a and A. N = 2n – total positions. k = number of dominant alleles (0, 1, 2, …, N). If E = “exactly k dominant alleles”, find P(E).

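  12. Polygenic Hypothesis – set of outcomes. $\Omega$ – all possible sequences of length 8 comprised of a and A (figure: four genes, labeled 1 to 4). In general, $\Omega$ is the set of all sequences of length N = 2n, so $|\Omega| = 2^N$.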

  13. Polygenic Hypothesis. N = 2n – total positions. Alleles a and A are equally likely. k = number of dominant alleles (0, 1, 2, …, N). If E = “exactly k dominant alleles”, find P(E).

  14. Example: Nilsson-Ehle (1909). Two genes (n = 2), N = 2n = 4 number of alleles. X – number of a alleles in the N loci. With equally likely alleles, $P(X = k) = \binom{4}{k} \left(\tfrac{1}{2}\right)^4$, k = 0, 1, …, 4.
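
The phenotypic ratios quoted for Nilsson-Ehle's crosses can be checked numerically. The sketch below assumes the binomial counting formula P(X = k) = C(N, k) p^k (1 - p)^(N - k) with N = 2n positions; the function names are illustrative, not taken from the presentation.

```python
from math import comb


def phenotypic_ratios(n_genes):
    """Number of sequences with k = 0..N copies of one allele, N = 2 * n_genes."""
    total_positions = 2 * n_genes
    return [comb(total_positions, k) for k in range(total_positions + 1)]


def allele_count_pmf(n_genes, p=0.5):
    """P(X = k) for k = 0..N, assuming independent positions with allele frequency p."""
    total_positions = 2 * n_genes
    return [
        comb(total_positions, k) * p**k * (1 - p) ** (total_positions - k)
        for k in range(total_positions + 1)
    ]


if __name__ == "__main__":
    print(phenotypic_ratios(2))   # two genes
    print(phenotypic_ratios(3))   # three genes
    print(allele_count_pmf(2))    # probabilities for two genes, p = 0.5
```

With two genes the counts are 1 : 4 : 6 : 4 : 1, and with three genes 1 : 6 : 15 : 20 : 15 : 6 : 1, matching the ratios on the Quantitative Traits slide.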

  15. Random Variables. Discrete – X takes integer values • X is “known” when we know P(X = k) for all possible k. Continuous – X can be any value from an interval • X is “known” when we know: • the distribution function F(x) = P(X < x); • the probability density function f(x) = d/dx [F(x)]

  16. Common Discrete Random Variables • Bernoulli: X takes values k = 0, 1 • P(X = 1) = p; P(X = 0) = 1 - p • Binomial: X takes values k = 0, 1, 2, …, N • Poisson: X takes values k = 0, 1, 2, 3, … • Parameters • Bernoulli (p) • Bin(N, p) • Po($\lambda$) [Figure: binomial histograms for N = 20, p = 0.7; N = 20, p = 0.2; N = 20, p = 0.5]

  17. Common Continuous Random Variables • Exponential: X takes values $x \ge 0$ • Gaussian (Normal): X takes values $-\infty < x < \infty$

  18. Bell-Shaped Distribution of Quantitative Traits • Traits are controlled not by one but by several different genes. The genes are independent and contribute cumulatively to the expression of the characteristic (Polygenic Hypothesis) • Distribution of the trait is Binomial (2n, p), where n is the number of genes and p is the frequency of the non-contributing allele in the population. • Distribution is approximately Gaussian. • Further “smoothing” by environmental factors

  19. Why the “bell-shaped” distribution of quantitative traits? Central Limit Theorem (De Moivre, 1667-1754; Laplace, 1749-1827): When Np is large and N(1-p) is large, then Binomial (N, p) ~ Normal (Np, Np(1-p)). [Figure: histograms for N = 8, p = 0.2; N = 20, p = 0.5; N = 50, p = 0.7]
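
A quick numerical check of the De Moivre-Laplace approximation is sketched below. It assumes the normal curve has mean Np and variance Np(1-p), and it measures closeness by the largest pointwise gap between the binomial probabilities and the normal density; that gap measure and the helper names are choices made for this example only.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mean, variance):
    """Density of a Normal(mean, variance) random variable at x."""
    return exp(-((x - mean) ** 2) / (2 * variance)) / sqrt(2 * pi * variance)


def max_abs_gap(n, p):
    """Largest |binomial pmf - normal density| over k = 0..n."""
    mean, variance = n * p, n * p * (1 - p)
    return max(
        abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, variance))
        for k in range(n + 1)
    )


if __name__ == "__main__":
    for n, p in [(8, 0.2), (20, 0.5), (50, 0.7)]:
        print(f"N = {n:2d}, p = {p}: max gap = {max_abs_gap(n, p):.4f}")
```

The gap is visibly larger for N = 8, p = 0.2, where Np is small, than for the two larger cases.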

  20. Aggregate Characteristics • Mean Value • Standard Deviation • Moments of order m

  21. Examples • Binomial (N, p) • Poisson ($\lambda$) • Gaussian ($\mu$, $\sigma$)

  22. Poisson Distribution Arises When… • Events of low intensity occurring in time (time axis from 0 to t) • X(t) – the number of events that have occurred in [0, t] • Average number of events per unit time = $\lambda$ • X(t) has a Poisson distribution with parameter $\lambda t$

  23. Poisson Distribution Arises When… • Events of low intensity occurring independently of one another • X – the number of events that have occurred in a unit surface/volume over time t • Average number of events per unit surface/volume per unit time = $\lambda$ • X has a Poisson distribution with parameter $\lambda t$
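
One way to see why rare, independent events lead to a Poisson count is to split the observation window into n small slots, each holding an event with probability (rate x time)/n, and let n grow. The sketch below compares the resulting Binomial(n, rate/n) probabilities with the Poisson ones; the slot construction and all names are assumptions made for illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(k, rate):
    """P(X = k) for a Poisson random variable with the given mean."""
    return exp(-rate) * rate**k / factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero when k > n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_gap(rate, slots, k_max=20):
    """Largest |Binomial(slots, rate/slots) - Poisson(rate)| over k = 0..k_max."""
    return max(
        abs(binomial_pmf(k, slots, rate / slots) - poisson_pmf(k, rate))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    for slots in (10, 100, 10_000):
        print(f"{slots:6d} slots: max gap = {max_gap(2.0, slots):.6f}")
```

As the number of slots grows (so each slot's event probability shrinks), the gap to the Poisson probabilities shrinks toward zero.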

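  24. The Law of Large Numbers (1713). If X is a random variable with $E(X) = \mu$ and $X_1, X_2, \dots, X_n$ are independent observations of X, then $\frac{X_1 + X_2 + \dots + X_n}{n} \to \mu$ as $n \to \infty$ or, equivalently, $\frac{X_1 + X_2 + \dots + X_n}{n} - \mu \to 0$.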

  25. Example – Ordinary Coin Toss Game 1. Toss a coin 2. If Heads, win $1 3. If Tails, win nothing 4. Let Xi be your win for game i 5. Average payback to you: (X1 + X2 + … + Xn)/n 6. By the Law of Large Numbers, this average approaches E(X) = 0.5 as n grows. Simulation Example
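
A simulation of the coin toss game might look like the following sketch. It assumes a fair coin and uses Python's standard `random` module; the function name and the seed value are arbitrary choices for this example.

```python
import random


def average_payback(num_games, seed=None):
    """Play the $1-on-Heads game num_games times and return the average win."""
    rng = random.Random(seed)      # seeded generator so runs are repeatable
    total = 0
    for _ in range(num_games):
        if rng.random() < 0.5:     # Heads: win $1; Tails: win nothing
            total += 1
    return total / num_games


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(f"{n:7d} games: average payback = {average_payback(n, seed=2007):.4f}")
```

For small numbers of games the average can be far from 0.5; it settles near 0.5 as the number of games grows, which is what the Law of Large Numbers predicts.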

  26. Example – St. Petersburg Game 1. Toss a coin 2. If Heads, win $2 3. If Tails, keep tossing until it falls Heads 4. If first Heads on N-th toss, win $2^N: H $2, TH $4, TTH $8, TTTH $16, etc. 5. With probability 1/2^N we win $2^N 6. Average payback to you: the expected win is the sum over N of 2^N x 1/2^N = 1 + 1 + 1 + …, which is infinite
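
The St. Petersburg game can be simulated in the same style. The sketch below assumes a fair coin; `truncated_expectation` caps the game at a maximum number of tosses purely to show that every possible toss count contributes exactly one dollar to the expected win. All names are illustrative.

```python
import random


def play_once(rng):
    """Toss until the first Heads; the payoff is 2**(number of tosses)."""
    tosses = 1
    while rng.random() < 0.5:      # Tails: keep tossing
        tosses += 1
    return 2**tosses


def truncated_expectation(max_tosses):
    """Expected win if only the first max_tosses outcomes are counted."""
    return sum((2**n) * (0.5**n) for n in range(1, max_tosses + 1))


def running_average(num_games, seed=None):
    """Average payback after each game in a single simulated run."""
    rng = random.Random(seed)
    total = 0
    averages = []
    for game in range(1, num_games + 1):
        total += play_once(rng)
        averages.append(total / game)
    return averages


if __name__ == "__main__":
    run = running_average(100_000, seed=2007)
    for n in (10, 1_000, 100_000):
        print(f"after {n:6d} games: average payback = {run[n - 1]:.2f}")
    print("truncated expectations:", [truncated_expectation(m) for m in (5, 10, 20)])
```

Unlike the ordinary coin toss game, the running average does not settle: occasional very large wins keep pushing it upward, consistent with an expected win that has no finite value.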

  27. St. Petersburg Game – a sample run

  28. Random Processes (Temporal Stochastic Models). Random Process: X(t) – random variable that changes in time • When t = 0, 1, 2, … – Discrete Random Process • When t changes continuously – Continuous Random Process • In addition, since for any value of t, X(t) can be a discrete or continuous random variable, there are four possibilities for the process {X(t), t}. • {X(t), t} is defined through its probability distribution. • For example, if X(t) can take values x = 0, 1, 2, …, then $p_x(t) = P(X(t) = x)$ is the probability distribution of X.

  29. Single Population Immigration-Death Process • X(t) = population size at time t • I = rate of immigration • a = per capita death rate • Deterministic Model: $dX/dt = I - aX$ • Stochastic Model (Kolmogorov – Chapman DE): $X(t + \Delta t) = x$ can happen when: • X(t) = x and no change over $\Delta t$. (Event A) • X(t) = x + 1 and one death over $\Delta t$. (Event B) • X(t) = x - 1 and one immigration over $\Delta t$. (Event C) • More than a unit change over $\Delta t$. (Event D)

  30. Kolmogorov – Chapman Equations. $p_x(t + \Delta t) = P(B) + P(C) + P(A) + P(D)$. Subtract $p_x(t)$, divide by $\Delta t$, and let $\Delta t \to 0$. Demo
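
The limiting argument on this slide can be written out as follows. This is a reconstruction under stated assumptions, not the slide's own formulas: it writes $p_x(t) = P(X(t) = x)$, takes the probability of one immigration in a short interval $\Delta t$ to be $I\,\Delta t$ and of one death to be $a x\,\Delta t$ when the population size is x, and lumps multiple changes into an $o(\Delta t)$ term.

```latex
\begin{align*}
p_x(t+\Delta t)
  &= \underbrace{p_{x+1}(t)\,a(x+1)\,\Delta t}_{P(B)}
   + \underbrace{p_{x-1}(t)\,I\,\Delta t}_{P(C)}
   + \underbrace{p_x(t)\,\bigl(1 - I\,\Delta t - a x\,\Delta t\bigr)}_{P(A)}
   + \underbrace{o(\Delta t)}_{P(D)} \\[4pt]
\frac{p_x(t+\Delta t) - p_x(t)}{\Delta t}
  &= a(x+1)\,p_{x+1}(t) + I\,p_{x-1}(t) - (I + a x)\,p_x(t)
   + \frac{o(\Delta t)}{\Delta t} \\[4pt]
\frac{d p_x}{dt}
  &= I\,p_{x-1}(t) + a(x+1)\,p_{x+1}(t) - (I + a x)\,p_x(t),
  \qquad x = 0, 1, 2, \dots, \quad p_{-1}(t) \equiv 0 .
\end{align*}
```

The last line is one differential equation per population size x, so the model is an infinite system of coupled linear equations for the probabilities.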

  31. How are the Stochastic and Deterministic Models Related? • Define the mean $M(t) = \sum_n n\,p_n(t)$ • Multiply the K-C equation by n and sum over n. The mean value of the stochastic process X satisfies the deterministic equation $dM/dt = I - aM$.
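
The link between the two models can be checked numerically. The sketch below integrates a truncated version of the Kolmogorov-Chapman system with a simple Euler scheme and compares the resulting mean with the solution of dM/dt = I - aM. The form of the equations, the truncation at `x_max`, the step size, and all function names are assumptions made for this illustration.

```python
from math import exp


def kc_mean(I, a, x0, t_end, dt=0.001, x_max=60):
    """Mean of X(t_end) from Euler-integrating dp_x/dt on states 0..x_max.

    Assumed equations: dp_x/dt = I*p_{x-1} + a*(x+1)*p_{x+1} - (I + a*x)*p_x.
    """
    p = [0.0] * (x_max + 1)
    p[x0] = 1.0                                   # start with exactly x0 individuals
    for _ in range(int(round(t_end / dt))):
        new_p = p[:]
        for x in range(x_max + 1):
            from_below = I * p[x - 1] if x > 0 else 0.0
            from_above = a * (x + 1) * p[x + 1] if x < x_max else 0.0
            # No immigration out of the top state, so total probability is conserved.
            leave_rate = (I if x < x_max else 0.0) + a * x
            new_p[x] = p[x] + dt * (from_below + from_above - leave_rate * p[x])
        p = new_p
    return sum(x * px for x, px in enumerate(p))


def deterministic_mean(I, a, x0, t):
    """Solution of dM/dt = I - a*M with M(0) = x0."""
    return I / a + (x0 - I / a) * exp(-a * t)


if __name__ == "__main__":
    I, a, x0, t = 2.0, 0.5, 0, 3.0
    print("mean from K-C equations :", round(kc_mean(I, a, x0, t), 4))
    print("deterministic solution  :", round(deterministic_mean(I, a, x0, t), 4))
```

With these illustrative parameters the two values agree to within the discretization error, which is the relationship the slide describes.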

  32. Luria-Delbruck Experiments

  33. When do mutations occur? Lamarckian Model - mutations evolve only in response to an environmental cue. Darwinian Model - mutations are equally likely to occur at any moment in time.

  34. Luria-Delbruck Experiments (1943). Large number of bacterial cultures, starting each one from a small number of cells. Plate the cultures on nutrient agar plates on which a large amount of a virus has been plated first. Incubate. (Figure label: Control.) Luria SE & Delbruck M. Mutations of Bacteria from Virus Sensitivity to Virus Resistance. Genetics 28:491 (1943).

  35. Hypotheses. Hypothesis 1 (Acquired Immunity): A small number of bacteria mutated to acquire resistance only after they are exposed to the virus. Survival confers immunity not only to the individual but also to its offspring, and the colonies grow. Hypothesis 2 (Mutation): Mutations occur randomly, but the probability that a bacterium mutates from sensitive to resistant is small. This mutation is completely independent from the presence of the virus. When the bacteria are added to the plates, the mutants are already resistant to the virus. Only these mutants proliferate into colonies on the plate.

  36. Count the Number of Colonies

  37. Two opposing hypotheses. Hypothesis 1 (Acquired Immunity, Directed Mutation): A small number of bacteria mutated to acquire resistance only after they are exposed to the virus. Survival confers immunity not only to the individual but also to its offspring, and the colonies grow. [Figure: culture exposed to killer virus]

  38. Two opposing hypotheses. Hypothesis 2 (Mutation + Selection): Mutations occur randomly, but the probability that a bacterium mutates from sensitive to resistant is small. This mutation is completely independent from the presence of the virus. When the bacteria are added to the plates, the mutants are already resistant to the virus. Only these mutants proliferate into colonies on the plate. [Figure: culture exposed to killer virus]

  39. What is the Distribution of the Mutant Cells at the time of plating? • Under the Directed Mutation Hypothesis: Poisson • Under the Mutation + Selection Hypothesis: Non-Poisson (Luria-Delbruck Distribution) [Figures: cultures exposed to killer virus under each hypothesis]
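
The contrast between the two hypotheses can be explored with a small simulation. Everything in the sketch below is an assumption made for illustration: each culture starts from one cell and doubles synchronously, random mutations happen only at division with probability p per daughter cell, directed mutation makes each plated cell resistant independently with probability q, and the variance-to-mean ratio is used as a rough test (it is near 1 for a Poisson count).

```python
import random
from statistics import mean, variance


def mutants_random_mutation(generations, p, rng):
    """Mutation + selection: mutants arise during growth and breed true."""
    sensitive, mutants = 1, 0
    for _ in range(generations):
        mutants *= 2                                   # existing mutants divide
        daughters = 2 * sensitive                      # sensitive cells divide
        new_mutants = sum(rng.random() < p for _ in range(daughters))
        sensitive = daughters - new_mutants
        mutants += new_mutants
    return mutants


def mutants_directed(generations, q, rng):
    """Directed mutation: resistance is acquired only at plating."""
    cells = 2**generations
    return sum(rng.random() < q for _ in range(cells))


def variance_to_mean(counts):
    """Sample variance divided by sample mean."""
    return variance(counts) / mean(counts)


def compare(num_cultures=300, generations=10, p=0.002, seed=1943):
    """Return (ratio under directed mutation, ratio under mutation + selection)."""
    rng = random.Random(seed)
    q = generations * p        # chosen so both models have similar mean counts
    directed = [mutants_directed(generations, q, rng) for _ in range(num_cultures)]
    selected = [mutants_random_mutation(generations, p, rng) for _ in range(num_cultures)]
    return variance_to_mean(directed), variance_to_mean(selected)


if __name__ == "__main__":
    directed_ratio, selected_ratio = compare()
    print(f"directed mutation    : variance/mean = {directed_ratio:.2f}")
    print(f"mutation + selection : variance/mean = {selected_ratio:.2f}")
```

Under directed mutation the ratio stays close to 1, while under mutation + selection rare early mutations produce "jackpot" cultures and a much larger ratio, which is the large variation in mutant counts the experiments look for.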

  40. Large variation in the number of mutants

  41. What is the average number of resistant cells under continuous mutation? • Assume that mutation can only occur at the time of division • Assume that each cell can mutate with a constant probability p. Table columns: Generation (i) | Average number of mutant cells in generation i | Expected number of mutants at the end from this generation
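
The table on this slide can be filled in with a short calculation. The sketch below assumes a culture that starts from one cell and doubles n times, that mutants are rare enough for generation i to contain about 2^i cells, and that every mutant's descendants are mutants; under those assumptions each generation contributes the same expected number of final mutants. The entries are computed from these assumptions, not copied from the original table.

```python
def expected_mutants(p, n):
    """Rows (i, new mutants in generation i, their descendants at the end) and the total."""
    rows = []
    for i in range(1, n + 1):
        new_mutants = p * 2**i                        # average mutations arising in generation i
        descendants = new_mutants * 2 ** (n - i)      # each mutant doubles n - i more times
        rows.append((i, new_mutants, descendants))
    total = sum(row[2] for row in rows)
    return rows, total


if __name__ == "__main__":
    rows, total = expected_mutants(p=0.002, n=10)
    print("generation  new mutants  mutants at end")
    for i, new_mutants, descendants in rows:
        print(f"{i:10d}  {new_mutants:11.3f}  {descendants:14.3f}")
    print("total expected mutants:", round(total, 3))
```

Every row ends with the same value, p x 2^n, so the total is n x p x 2^n: early generations produce few mutants with many descendants, late generations produce many mutants with few.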

  42. Biological ESTEEM Mutation.xls AcqIm.xls

  43. Lea and Coulson (1949). Theorem. Let $X_t$ denote the number of mutant cells in the culture at time t. If p is the probability for a single cell to mutate and $m = p\,2^n$, then the probability generating function of the distribution defined by $p_k = P(X_t = k)$ has the form [formula not preserved in the transcript]. Lea, D.E. and Coulson, C.A. (1949) The distribution of the number of mutants in bacterial populations. J. Genetics 49, 264-285

  44. More recent work on the Luria-Delbruck distribution

  45. Evaluating risk from time series data • Glucose Variability and Risk Assessment in Diabetes • Heart Rate Variability and the Risk for Neonatal Sepsis

  46. Blood Glucose Fluctuation Characteristics Quantified from Self-Monitoring Data. Sixteen million people in the United States have Diabetes Mellitus. In both human and economic terms, diabetes is one of the nation's most costly diseases. Diabetes is the leading cause of kidney failure, blindness in adults, and amputations. It is a major risk factor for heart disease, stroke, and birth defects. Diabetes shortens average life expectancy by up to 15 years, and costs our nation in excess of $100 billion annually in health-related expenditures, more than any other single chronic disease. Diabetes spares no group, affecting young and old, all races and ethnic groups, the rich and the poor.

  47. Definitions • Type 1 Diabetes, also referred to as Insulin Dependent Diabetes Mellitus (IDDM), is the type of diabetes in which the pancreas produces no insulin or extremely small amounts; • Type 2 Diabetes is the type of diabetes in which the body doesn’t use its insulin effectively or doesn’t produce enough insulin; • Insulin is a hormone secreted by the pancreas that regulates metabolism of glucose; • Blood Glucose (BG) is the concentration of glucose in the bloodstream; • The BG levels are measured in mg/dl (USA) and in mmol/L (most elsewhere); • The two scales are directly related by: 18 mg/dl = 1 mmol/L (1 mM).

  48. [Figure: blood glucose fluctuations driven by food, insulin, and counter-regulation. Zones from top to bottom: Hyperglycemia; Target Blood Glucose Range: 70-180 mg/dl (DCCT, 1993); Hypoglycemia; Severe Hypoglycemia]
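
The unit relation from the definitions (18 mg/dl = 1 mmol/L) and the 70-180 mg/dl target range can be combined in a small helper. The sketch below is illustrative only: the function names and the three zone labels are choices made for this example, and severe hypoglycemia is left out because the deck defines it by symptoms rather than by a BG threshold.

```python
MGDL_PER_MMOLL = 18.0   # 18 mg/dl = 1 mmol/L, as stated in the definitions


def mgdl_to_mmoll(bg_mgdl):
    """Convert a blood glucose reading from mg/dl to mmol/L."""
    return bg_mgdl / MGDL_PER_MMOLL


def mmoll_to_mgdl(bg_mmoll):
    """Convert a blood glucose reading from mmol/L to mg/dl."""
    return bg_mmoll * MGDL_PER_MMOLL


def classify_bg(bg_mgdl, low=70.0, high=180.0):
    """Place a reading (mg/dl) relative to the 70-180 mg/dl target range."""
    if bg_mgdl < low:
        return "hypoglycemia"
    if bg_mgdl > high:
        return "hyperglycemia"
    return "target"


if __name__ == "__main__":
    for reading in (54, 70, 126, 180, 250):
        print(f"{reading:3d} mg/dl = {mgdl_to_mmoll(reading):5.2f} mmol/L -> {classify_bg(reading)}")
```

In mmol/L the same target range is roughly 3.9 to 10.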

  49. Severe Hypoglycemia • Defined as a low BG resulting in stupor, seizure, or unconsciousness that precludes self-treatment (The Diabetes Control and Complications Trial Research Group, 1997). Four percent of the deaths among individuals with IDDM are attributed to SH (DCCT Study Group, 1991). • Although most severe hypoglycemic episodes are not fatal, there remain numerous negative sequelae leading to compromised occupational and scholastic functioning, social embarrassment, poor judgment, serious accidents, and possible permanent cognitive dysfunction (Gold AE et al., 1993; Deary et al., 1993; Lincoln et al., 1996). • Fear of severe hypoglycemia is identified as the major barrier to improved metabolic control (Cryer et al., 1994).

  50. BG Fluctuations: T1DM
