Learning Mixtures of Structured Distributions over Discrete Domains Xiaorui Sun, Columbia University Joint work with Siu-On Chan (UC Berkeley), Ilias Diakonikolas (U Edinburgh), Rocco Servedio (Columbia University)
Density Estimation • PAC-type learning model • Set C of possible target distributions over [n] = {1, …, n} • Learner • Knows the set C but does not know the target distribution p ∈ C • Independently draws a few samples from p • Outputs (a succinct description of) a distribution h which is ε-close to p • Total variation distance d_TV(p, h) = ½ Σᵢ |p(i) − h(i)| is the standard measure in statistics
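A minimal sketch (Python, not part of the talk) of the total variation distance used as the success measure; p and q are length-n probability vectors:

```python
import numpy as np

def total_variation(p, q):
    """d_TV(p, q) = (1/2) * sum_i |p(i) - q(i)| for distributions over [n]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Example: two distributions over [4].
print(total_variation([0.5, 0.5, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]))  # 0.5
```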
Learn a structured distribution • If C = {all distributions over [n]}, Ω(n/ε²) samples are required • Much better sample complexity is possible for structured distributions • Poisson binomial distributions [DDS12a] • Õ(1/ε³) samples • Monotone / k-modal [Bir87, DDS12b] • O(log(n)/ε³) samples / Õ(k·log(n)/ε³) samples
This work: learn mixtures of structured distributions • What does it mean to learn a mixture of distributions? • A set C of distributions over [n] • Target distribution p is a mixture of k distributions from C • i.e., p = w₁p₁ + … + w_k p_k, such that each pⱼ ∈ C, wⱼ ≥ 0, and Σⱼ wⱼ = 1 • Our result: learn mixtures for several classes of structured distributions • Sample complexity close to optimal • Efficient running time
Our results: learning mixtures of log-concave distributions • Log-concave distribution p over [n]: • p(i)² ≥ p(i−1)·p(i+1) for 1 < i < n • [Figure: a log-concave distribution plotted over 1…n]
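As a sanity check, a small sketch (Python, not from the talk) of this condition; note that log-concavity over [n] also presumes the support is a contiguous interval:

```python
import numpy as np
from math import comb

def is_log_concave(p, tol=1e-12):
    """Check p(i)^2 >= p(i-1) * p(i+1) at every interior point of [n].

    Assumes the support of p is a contiguous interval, the usual
    convention for log-concave distributions.
    """
    p = np.asarray(p, dtype=float)
    return bool(np.all(p[1:-1] ** 2 >= p[:-2] * p[2:] - tol))

# A Binomial(10, 1/2) pmf is log-concave; a bimodal vector is not.
binomial_pmf = np.array([comb(10, i) * 0.5**10 for i in range(11)])
print(is_log_concave(binomial_pmf))          # True
print(is_log_concave([0.4, 0.1, 0.4, 0.1]))  # False
```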
Our results: log-concave • Algorithm to learn a mixture of k log-concave distributions • Sample complexity: k·Õ(1/ε⁴), independent of n • Running time: k·Õ(log(n)/ε⁴) bit operations • Lower bound: Ω(k/ε^{5/2}) samples
Our results: mixture of unimodal • Unimodal distribution p over [n]: • there exists m ∈ [n] s.t. p is non-decreasing on {1, …, m} and non-increasing on {m, …, n} • [Figure: a unimodal distribution plotted over 1…n]
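A small sketch (Python, not from the talk) of this definition; equivalently, the increments of p may change sign at most once, from + to −:

```python
import numpy as np

def is_unimodal(p):
    """True iff there is a mode m with p non-decreasing on [1, m]
    and non-increasing on [m, n]."""
    d = np.diff(np.asarray(p, dtype=float))
    # After dropping zero differences, the sign pattern must be
    # some +1's followed by some -1's (at most one sign change, + -> -).
    signs = np.sign(d[d != 0])
    changes = np.flatnonzero(np.diff(signs) != 0)
    if len(changes) == 0:
        return True              # monotone (or constant)
    return len(changes) == 1 and signs[changes[0]] > 0

print(is_unimodal([0.1, 0.3, 0.4, 0.2]))  # True
print(is_unimodal([0.3, 0.1, 0.4, 0.2]))  # False (two modes)
```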
Our results: mixture of unimodal • A mixture of 2 unimodal distributions may have 2 modes (e.g., an even mixture of one distribution peaked near 1 and one peaked near n); in general a mixture of k unimodal distributions may have k modes • Algorithm to learn a mixture of k unimodal distributions • Sample complexity: O(k·log(n)/ε⁴) samples • Running time: Õ(k·log(n)/ε⁴) bit operations • Lower bound: Ω(k·log(n)/ε³) samples
Our results: mixture of MHR • Monotone hazard rate (MHR) distribution • Hazard rate of p at i: • H(i) = p(i) / Σ_{j ≥ i} p(j), defined where the tail sum is positive • MHR distribution: H(i) is a non-decreasing function over [n] • [Figure: an MHR distribution plotted over 1…n]
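A small sketch (Python, not from the talk) computing the hazard rate and checking monotonicity:

```python
import numpy as np

def hazard_rate(p):
    """H(i) = p(i) / (p(i) + p(i+1) + ... + p(n)); NaN where the tail is 0."""
    p = np.asarray(p, dtype=float)
    tails = np.cumsum(p[::-1])[::-1]      # tails[i] = sum_{j >= i} p(j)
    h = np.full_like(p, np.nan)
    np.divide(p, tails, out=h, where=tails > 0)
    return h

def is_mhr(p, tol=1e-12):
    """True iff the hazard rate is non-decreasing on the support of p."""
    h = hazard_rate(p)
    h = h[~np.isnan(h)]
    return bool(np.all(np.diff(h) >= -tol))

print(hazard_rate([0.5, 0.25, 0.25]))  # [0.5, 0.5, 1.0]
print(is_mhr([0.5, 0.25, 0.25]))       # True
```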
Our results: mixture of MHR • Algorithm to learn a mixture of k MHR distributions • Sample complexity: O(k·log(n/ε)/ε⁴) • Running time: Õ(k·log(n/ε)/ε⁴) bit operations • Lower bound: Ω(k·log(n)/ε³) samples
Comparison with parameter estimation • Parameter estimation [KMV10, MV10] • Learn a mixture p of k Gaussians • Independently draw a few samples from p • Estimate the parameters of each Gaussian component accurately • The number of samples inherently depends exponentially on k, even for a mixture of 1-dimensional normal distributions [MV10]
Comparison with parameter estimation • Parameter estimation needs at least exp(k) samples to learn a mixture of k binomial distributions • Similar to the lower bound in [MV10] • Density estimation can handle nonparametric distribution classes • E.g. log-concave, unimodal, MHR • Density estimation for a mixture of k binomial distributions over [n] needs only k·Õ(1/ε⁴) samples • Because a binomial distribution is log-concave
Outline • Learning algorithm based on decomposition • Structural results for log-concave, unimodal, MHR distributions
Flat decomposition • Key definition: distribution p is (ε, t)-flat if there exists a partition I = {I₁, …, I_t} of [n] into t intervals such that d_TV(p, p̄_I) ≤ ε • I is an (ε, t)-flat decomposition for p • p̄_I is obtained by "flattening" p within each interval: • p̄_I(i) = p(I_j)/|I_j| for i ∈ I_j
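A minimal sketch (Python, not from the talk) of flattening, and of measuring how flat a distribution is with respect to a given partition; intervals are encoded by 0-indexed breakpoints:

```python
import numpy as np

def flatten(p, breakpoints):
    """Flatten p within each interval of the partition given by breakpoints.

    breakpoints = [0, b_1, ..., n]; interval I_j = [b_{j-1}, b_j), 0-indexed.
    Within I_j the flattened distribution is constant, equal to p(I_j)/|I_j|.
    """
    p = np.asarray(p, dtype=float)
    q = np.empty_like(p)
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        q[lo:hi] = p[lo:hi].sum() / (hi - lo)
    return q

def flatness_error(p, breakpoints):
    """d_TV(p, flattened p): p is (eps, t)-flat iff this is <= eps
    for some partition into t intervals."""
    q = flatten(p, breakpoints)
    return 0.5 * np.abs(np.asarray(p, dtype=float) - q).sum()

p = np.array([0.10, 0.12, 0.38, 0.40])
print(flatness_error(p, [0, 2, 4]))   # 0.02: p is nearly flat on each half
```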
Learn (ε, t)-flat distributions • Main general Thm: Let C = {all (ε, t)-flat distributions over [n]}. There is an algorithm which draws O(t/ε³) samples from p ∈ C, and outputs a hypothesis h such that d_TV(p, h) ≤ O(ε). • Linear running time with respect to the number of samples
Easier problem: known decomposition • Given • Samples from an (ε, t)-flat distribution p • An (ε, t)-flat decomposition I for p • Idea: estimate the probability mass of every interval in I • O(t/ε²) samples are enough
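A sketch (Python, not from the talk; function name and interface are mine) of this easier case: spread each interval's empirical mass uniformly over the interval:

```python
import numpy as np

def learn_known_decomposition(samples, breakpoints, n):
    """Given samples from p and an eps-flat decomposition (breakpoints),
    estimate the mass of each interval and spread it uniformly.

    With O(t / eps^2) samples the empirical interval masses are close to
    the true ones, so the output is O(eps)-close to p in d_TV.
    """
    samples = np.asarray(samples)
    m = len(samples)
    h = np.zeros(n)
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        mass = np.count_nonzero((samples >= lo) & (samples < hi)) / m
        h[lo:hi] = mass / (hi - lo)
    return h

rng = np.random.default_rng(0)
p = np.array([0.10, 0.12, 0.38, 0.40])
samples = rng.choice(4, size=10_000, p=p)
print(learn_known_decomposition(samples, [0, 2, 4], 4))
# ~ [0.11, 0.11, 0.39, 0.39]
```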
Real problem: unknown decomposition • Only given samples from an (ε, t)-flat distribution p • There exists some (ε, t)-flat decomposition I for p, but it is unknown • A useful fact [DDS+13]: if I is an (ε, t)-flat decomposition of p, and J is a "refinement" of I with t′ intervals, then J is a (2ε, t′)-flat decomposition of p • So if we know any refinement of I, that is good enough
Unknown flat decomposition (cont.) • Idea: partition [n] into Θ(t/ε) intervals J, each with probability mass at most ε/t • Achieved by sampling from p (empirical quantiles) • [Figure: a partition of 1…n into intervals of small mass]
Unknown flat decomposition (cont.) • There exists an (unknown) partition K • Refinement of both the hidden decomposition I and the sampled partition J • Θ(t/ε) intervals • By the refinement fact, K is a (2ε, Θ(t/ε))-flat decomposition for p • [Figure: the common refinement K of I and J over 1…n]
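A tiny sketch (Python, not from the talk) of the common refinement of two interval partitions, each given by a sorted breakpoint list starting at 0 and ending at n:

```python
def common_refinement(bps1, bps2):
    """Common refinement of two interval partitions of [n].

    If bps1 has t1 intervals and bps2 has t2, the refinement has at most
    t1 + t2 - 1 intervals, and refines both partitions.
    """
    return sorted(set(bps1) | set(bps2))

print(common_refinement([0, 4, 10], [0, 2, 5, 10]))  # [0, 2, 4, 5, 10]
```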
Unknown flat decomposition (cont.) • Compare the flattened distributions p̄_J and p̄_K • [Figure: p̄_J and p̄_K plotted over 1…n]
Unknown flat decomposition (cont.) • If the total probability mass of every interval of J is at most ε/t, then d_TV(p̄_J, p̄_K) ≤ O(ε): the two flattenings differ only on the ≤ t intervals of J that contain a breakpoint of I, and each such interval carries mass at most ε/t • So: partition [n] into Θ(t/ε) intervals each with probability mass at most ε/t, and output the flattened empirical distribution on that partition • O(t/ε³) samples are enough
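Putting the pieces together, a hedged end-to-end sketch (Python, not the paper's exact algorithm; learn_flat and its constants are illustrative choices of mine):

```python
import numpy as np

def learn_flat(samples, n, t, eps):
    """Learn an (eps, t)-flat distribution over {0, ..., n-1} from samples.

    1. Partition [n] into ~t/eps intervals, each with empirical mass
       roughly eps/t (heavy single points get their own interval).
    2. Output the empirical distribution flattened on that partition.
    The analysis needs O(t/eps^3) samples.
    """
    samples = np.asarray(samples)
    counts = np.bincount(samples, minlength=n) / len(samples)  # empirical pmf
    target = eps / (2 * t)                                     # mass cap per interval

    # Greedily cut a new interval whenever the running mass reaches the cap.
    breakpoints, mass = [0], 0.0
    for i in range(n):
        mass += counts[i]
        if mass >= target:
            breakpoints.append(i + 1)
            mass = 0.0
    if breakpoints[-1] != n:
        breakpoints.append(n)

    # Flatten the empirical distribution within each interval.
    h = np.empty(n)
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        h[lo:hi] = counts[lo:hi].sum() / (hi - lo)
    return h
```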
Learn (ε, t)-flat distributions • Main general Thm: Let C = {all (ε, t)-flat distributions over [n]}. There is an algorithm which draws O(t/ε³) samples from p ∈ C, and outputs a hypothesis h such that d_TV(p, h) ≤ O(ε)
Learn mixtures of distributions • Lem: A mixture of k (ε, t)-flat distributions has an (ε, kt)-flat decomposition (take the union of the components' breakpoints) • Tight for interesting distribution classes • Thm (learn mixture): Let p be a mixture of k (ε, t)-flat distributions. There is an algorithm which draws O(kt/ε³) samples, and outputs a hypothesis h s.t. d_TV(p, h) ≤ O(ε)
First application: learning mixtures of log-concave distributions • Recall definition: • p(i)² ≥ p(i−1)·p(i+1) for 1 < i < n • Lem: Every log-concave distribution is (ε, O(log(1/ε)/ε))-flat • Learn a mixture of k log-concave distributions with k·Õ(1/ε⁴) samples
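For instance, binomials are log-concave, so a two-component binomial mixture can be learned this way. A hedged demo reusing the learn_flat sketch above; the mixture, t, and eps are illustrative choices of mine, not the paper's constants:

```python
import numpy as np
from math import comb

# Build a mixture of two binomial (hence log-concave) distributions over [n].
n = 101
binom_pmf = lambda N, q: np.array([comb(N, i) * q**i * (1 - q)**(N - i)
                                   for i in range(n)])
p = 0.5 * binom_pmf(100, 0.2) + 0.5 * binom_pmf(100, 0.7)

rng = np.random.default_rng(1)
samples = rng.choice(n, size=200_000, p=p)
h = learn_flat(samples, n, t=40, eps=0.1)  # learn_flat from the sketch above
print(0.5 * np.abs(p - h).sum())           # a small d_TV, well below eps
```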
Second application: learning mixtures of unimodal distributions • Lem: Every unimodal distribution is (ε, O(log(n)/ε))-flat [Bir87, DDS+13] • Learn a mixture of k unimodal distributions with O(k·log(n)/ε⁴) samples
Third application: learning mixtures of MHR distributions • Monotone hazard rate distribution • Hazard rate of p at i: • H(i) = p(i) / Σ_{j ≥ i} p(j) • p is MHR if H(i) is a non-decreasing function over [n] • Lem: Every MHR distribution is (ε, O(log(n/ε)/ε))-flat • Learn a mixture of k MHR distributions with O(k·log(n/ε)/ε⁴) samples
Conclusion and further directions • Flat decomposition is a useful way to study mixtures of structured distributions • Extend to higher dimensions? • Efficient algorithms with optimal sample complexity?