This presentation explores the use of Maximum Likelihood Estimation (MLE) to construct statistical profiles from observations of maximum-length motifs. Specifically, a |Σ|×n matrix A is created, representing estimated probabilities for the occurrence of each letter in each position. Using 23 sample motif instances from the cyclic AMP receptor transcription factor, it demonstrates how the MLE estimate improves as the number of observations grows. The methodology illustrates MLE's application in real-world scenarios such as coin toss experiments and probability estimation in DNA sequences.
Statistical Estimation Vasileios Hatzivassiloglou University of Texas at Dallas
Instance profiles • Given k observations of maximum length n, construct a |Σ|×n matrix A (profile) where entry Aij is the estimated probability that the ith letter occurs in position j • One way to estimate Aij is to count the number of times the ith letter occurs at position j (cij); then Aij = cij / k • This is maximum likelihood estimation (MLE) • The estimate becomes better as k increases
Example data • 23 sample motif instances for the cyclic AMP receptor transcription factor (positions 3-9) TTGTGGC TTTTGAT AAGTGTC ATTTGCA CTGTGAG ATGCAAA GTGTTAA ATTTGAA TTGTGAT ATTTATT ACGTGAT ATGTGAG TTGTGAG CTGTAAC CTGTGAA TTGTGAC GCCTGAC TTGTGAT TTGTGAT GTGTGAA CTGTGAC ATGAGAC TTGTGAG
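The counting estimate above can be sketched in code; this is a minimal illustration using the 23 motif instances from the slide, with Aij computed as cij / k:

```python
# MLE profile from the 23 sample motif instances (positions 3-9).
# Each entry A[letter][j] = c_ij / k: the fraction of instances
# that have that letter at position j.
motifs = [
    "TTGTGGC", "TTTTGAT", "AAGTGTC", "ATTTGCA", "CTGTGAG", "ATGCAAA",
    "GTGTTAA", "ATTTGAA", "TTGTGAT", "ATTTATT", "ACGTGAT", "ATGTGAG",
    "TTGTGAG", "CTGTAAC", "CTGTGAA", "TTGTGAC", "GCCTGAC", "TTGTGAT",
    "TTGTGAT", "GTGTGAA", "CTGTGAC", "ATGAGAC", "TTGTGAG",
]

def profile(instances, alphabet="ACGT"):
    k, n = len(instances), len(instances[0])
    A = {a: [0.0] * n for a in alphabet}
    for seq in instances:
        for j, letter in enumerate(seq):
            A[letter][j] += 1.0 / k      # accumulate c_ij / k
    return A

A = profile(motifs)
```

Each column of A sums to 1, since every instance contributes exactly one letter per position.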
Probability of a motif • Suppose that we consider M as a candidate motif consensus • How do we find the best M given the observations in A? • Assuming independence of positions, P(M | A) = A(M1, 1) × A(M2, 2) × ... × A(Mn, n)
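The position-independence product can be sketched as follows; the 3-position profile here is a hypothetical toy example, just to keep the snippet self-contained:

```python
import math

# Score a candidate consensus M against a profile A under the
# positional-independence assumption: P(M | A) = prod_j A[M_j][j].
# This toy 3-position profile is hypothetical, for illustration only.
A = {
    "A": [0.1, 0.7, 0.2],
    "C": [0.1, 0.1, 0.1],
    "G": [0.2, 0.1, 0.6],
    "T": [0.6, 0.1, 0.1],
}

def motif_probability(M, A):
    # Multiply the profile entry for M's letter at every position.
    return math.prod(A[letter][j] for j, letter in enumerate(M))

p = motif_probability("TAG", A)   # 0.6 * 0.7 * 0.6
```

The best consensus under this model simply takes the most probable letter in each column independently.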
Maximum likelihood estimation • General method for estimating unknown parameters when we have • a sample of values that depend on these parameters • a formula specifying the probability of obtaining these values given the parameters
MLE example: three coins • Suppose we have three coins with probability of heads ⅓, ½, and ⅔ • One of them is used to generate a series of 20 tosses and we observe 11 heads • θ = the heads probability of the coin used in the experiment • Binomial distribution for the number of heads
Binomial distribution • Count of one of two possible outcomes in a series of independent events • The probabilities of the two outcomes are constant across events • An example of iid events (independent, identically distributed)
Binomial probability mass • If the probability of one outcome (let’s call it A) is p and there are n events • The probability of the other outcome is 1-p • The probability of obtaining a particular sequence of outcomes with m A’s is p^m (1-p)^(n-m) • There are C(n, m) = n! / (m! (n-m)!) sequences with the same number m of outcomes A • Overall, P(m) = C(n, m) p^m (1-p)^(n-m)
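The probability mass formula above can be checked directly; a short sketch:

```python
import math

# Binomial probability mass: P(m) = C(n, m) * p**m * (1-p)**(n-m),
# i.e. per-sequence probability times the number of sequences with m A's.
def binom_pmf(m, n, p):
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

# Sanity check: the pmf over all m = 0..n sums to 1.
total = sum(binom_pmf(m, 10, 0.3) for m in range(11))
```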
MLE example: three coins • Compute P(11 heads in 20 tosses | θ) = C(20, 11) θ^11 (1-θ)^9 for each candidate θ ∈ {⅓, ½, ⅔}; θ = ½ gives the largest value • Result: Choose θ = ½
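The three-coin selection can be carried out in a few lines; a sketch following the slide's numbers (11 heads in 20 tosses):

```python
import math

# Three-coin MLE: pick the theta among {1/3, 1/2, 2/3} that maximizes
# the binomial likelihood P(11 heads in 20 tosses | theta).
def likelihood(theta, m=11, n=20):
    return math.comb(n, m) * theta**m * (1 - theta)**(n - m)

candidates = [1 / 3, 1 / 2, 2 / 3]
theta_hat = max(candidates, key=likelihood)   # MLE over the three coins
```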
MLE example: unknown coin • θ can take any value between 0 and 1 • m heads in n tosses, so the likelihood is L(θ) = C(n, m) θ^m (1-θ)^(n-m) • Maximize by setting the derivative to zero: dL/dθ = 0 • Solutions: θ = 0, θ = 1, and θ = m/n
MLE for binomial • Of the three solutions, θ = 0 and θ = 1 result in P(X1, X2, ..., Xn | θ) = 0, i.e., local minima • On the other hand, for 0 < θ < 1, P(X1, X2, ..., Xn | θ) > 0, so θ = m/n must be a local maximum • Therefore the MLE estimate is θ = m/n
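The analytic result θ = m/n can also be confirmed numerically; a sketch using a simple grid search over [0, 1]:

```python
import math

# Numerical check: L(theta) = C(n,m) theta**m (1-theta)**(n-m)
# peaks at theta = m/n (here 11/20 = 0.55).
def likelihood(theta, m=11, n=20):
    return math.comb(n, m) * theta**m * (1 - theta)**(n - m)

grid = [i / 1000 for i in range(1001)]       # theta values 0.000 .. 1.000
best = max(grid, key=likelihood)             # grid search for the maximum
```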
Properties of estimators • The estimation error for a given sample is e = x̂ - x, where x̂ is the estimate and x is the unknown true value • An estimator is a random variable • because it depends on the sample • The mean square error E[(x̂ - x)²] represents the overall quality of the estimation across all samples
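The mean square error can be computed exactly for a small case by enumerating every possible sample; a sketch for the MLE estimate p̂ = m/n of a Bernoulli parameter:

```python
from itertools import product

# Exact mean square error of the MLE estimate p_hat = m/n for a
# Bernoulli(p) parameter, enumerating all 2**n samples of size n.
def mse(p, n):
    total = 0.0
    for sample in product([0, 1], repeat=n):
        prob = 1.0
        for x in sample:
            prob *= p if x == 1 else 1 - p   # probability of this sample
        p_hat = sum(sample) / n              # MLE estimate from the sample
        total += prob * (p_hat - p) ** 2     # squared estimation error
    return total

err = mse(0.3, 4)   # matches the known value p*(1-p)/n
```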
Expected values • Recall that the expected value of a discrete random variable X is defined as E[X] = Σi xi P(X = xi) • The expected value of a function f(X) of a random variable is E[f(X)] = Σi f(xi) P(X = xi) • For continuous distributions, replace the sum with an integral
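Both definitions can be applied directly to a small discrete distribution; a sketch for a fair six-sided die:

```python
# Expected values for a fair six-sided die, straight from the definitions
# E[X] = sum_i x_i P(X = x_i) and E[f(X)] = sum_i f(x_i) P(X = x_i).
outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6                                    # uniform probability

ex = sum(x * p for x in outcomes)            # E[X] = 3.5
ex2 = sum(x**2 * p for x in outcomes)        # E[X^2] with f(x) = x^2
```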
Bias in estimation • An estimator is unbiased if E[x̂] = x • MLE is not necessarily unbiased • Example: standard deviation • It is the most commonly used measure of dispersion in a data set • For a random variable X, it is defined as σ = √(E[(X - E[X])²])
Estimators of standard deviation • MLE estimator s = √((1/n) Σi (xi - x̄)²), where x̄ = (1/n) Σi xi; this estimator is biased • “Almost unbiased” estimator s' = √((1/(n-1)) Σi (xi - x̄)²) (s'² is an unbiased estimator of σ²)
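The bias of the two variance estimators can be verified exactly by enumerating all samples from a known distribution; a sketch using a fair coin (σ² = 0.25):

```python
from itertools import product

# Exact check: dividing the sum of squared deviations by n gives a
# biased variance estimator, dividing by n-1 gives an unbiased one.
# X is a fair coin (values 0/1), so sigma^2 = 0.25.
def expected_estimate(n, divisor):
    total = 0.0
    for sample in product([0, 1], repeat=n):   # each sample has prob 2**-n
        mean = sum(sample) / n
        est = sum((x - mean) ** 2 for x in sample) / divisor
        total += est / 2**n                    # weight by sample probability
    return total

e_mle = expected_estimate(3, 3)        # divide by n:   E = sigma^2 * (n-1)/n
e_unbiased = expected_estimate(3, 2)   # divide by n-1: E = sigma^2
```

The divide-by-n estimator underestimates σ² by the factor (n-1)/n, which is exactly the bias the slide refers to; note that even with the n-1 divisor, s' itself (the square root) remains slightly biased for σ.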