
Learning and testing k-modal distributions


Presentation Transcript


  1. Learning and testing k-modal distributions Rocco A. Servedio (Columbia University) Joint work (in progress) with Costis Daskalakis (MIT) and Ilias Diakonikolas (UC Berkeley)

  2. What this talk is about Probability distributions over [N] = {1,2,…,N}. Monotone increasing distribution: p(i) ≤ p(i+1) for all i. (Whole talk: "increasing" means "non-decreasing".)

  3. k-modal distributions k-modal: k "peaks and valleys". Monotone distribution: 0-modal. (Figures: a unimodal distribution, another one, and a 3-modal distribution.)

  4. The learning problem Target distribution p is an unknown k-modal distribution over [N]. Algorithm gets samples from p. Goal: output a hypothesis h that's ε-close to p in total variation distance. Want an algorithm that uses few samples & is computationally efficient.

  5. The testing problem q is a known k-modal distribution over [N]. p is an unknown k-modal distribution over [N]. Algorithm gets samples from p. Goal: output "yes" w.h.p. if p = q, "no" w.h.p. if d_TV(p, q) ≥ ε.

  6. Please note Testing problem is not: given samples from an unknown distribution p, determine if p is k-modal versus ε-far from every k-modal distribution. This problem requires Ω(√N) samples, even for k = 0: it is hard to distinguish the uniform distribution over [N] from the uniform distribution over a random half-size subset of [N].

  7. Why study these questions? • k-modal distributions seem natural • would be nice if k-modal structure were exploitable by efficient learning / testing algorithms • post hoc justification: solutions exhibit interesting connections between testing and learning

  8. The general case: learning If we drop the k-modal assumption, the learning problem becomes: learn an arbitrary distribution over [N] to total variation distance ε. Θ(N/ε²) samples are necessary and sufficient.

  9. The general case: testing If we drop the k-modal assumption, the testing problem becomes: q is a known, arbitrary distribution over [N]; p is an unknown, arbitrary distribution over [N]; the algorithm gets samples from p. Goal: output "yes" if p = q, "no" if d_TV(p, q) ≥ ε. Roughly √N samples (up to poly(1/ε) factors) are necessary and sufficient [GR00, BFFKRW02, P08].

  10. This work: main learning result We give an algorithm that learns any k-modal distribution over [N] to accuracy ε. It is computationally efficient, and its sample complexity is close to optimal: any algorithm needs Ω(k·log(N/k)/ε³) samples.

  11. Main testing result We give a computationally efficient algorithm that solves the k-modal testing problem over [N] to accuracy ε, together with a lower bound on the number of samples any testing algorithm must use. The testing sample complexity is lower than the learning sample complexity: testing is easier than learning!

  12. Prior work k = 0, 1: [BKR04] gave a sample-efficient algorithm for the testing problem (with p and q both available only via sample access). k = 0, 1: [Birge87, Birge87a] gave an O(log(N)/ε³)-sample efficient algorithm for learning, and a matching lower bound. We'll use this learning algorithm as a black box in our results.

  13. Outline of rest of talk • Background: some tools • Learning k-modal distributions • Testing k-modal distributions

  14. First tool: Learning monotone distributions Theorem [B87]: There is an efficient algorithm that learns any monotone decreasing distribution over [N] to accuracy ε. It uses O(log(N)/ε³) samples and runs in time linear in its input size. [B87b] also gave a matching Ω(log(N)/ε³) lower bound for learning a monotone distribution.

  15. Second tool: Learning a CDF – the Dvoretzky-Kiefer-Wolfowitz inequality Theorem [DKW56]: Let p be any distribution over [N] with CDF F. Let F̂ be the empirical estimate of F obtained from m samples. Then with probability 1 − δ, max_x |F̂(x) − F(x)| ≤ ε once m = O(log(1/δ)/ε²). (Figure: true CDF vs. empirical CDF.) Note: Õ(1/ε²) samples suffice (by an easy Chernoff bound argument). Morally, this means you can partition [N] into O(1/ε) intervals, each of mass O(ε) under p, using Õ(1/ε²) samples.
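For concreteness, here is a minimal Python sketch (my own illustration, not from the talk) of the empirical CDF and of the "partition [N] into intervals of mass roughly ε" use of DKW; the names empirical_cdf and partition_by_mass and the toy data are mine.

import random

def empirical_cdf(samples, N):
    # F_hat(x) = fraction of samples that are <= x, for x = 1..N
    counts = [0] * (N + 1)
    for s in samples:
        counts[s] += 1
    m = len(samples)
    cdf = [0.0] * (N + 1)
    running = 0
    for x in range(1, N + 1):
        running += counts[x]
        cdf[x] = running / m
    return cdf

def partition_by_mass(cdf, eps):
    # Greedily cut {1,...,N} into consecutive intervals of empirical mass about eps.
    # By DKW, each interval's true mass is within the DKW error of its empirical mass.
    N = len(cdf) - 1
    intervals, start, base = [], 1, 0.0
    for x in range(1, N + 1):
        if cdf[x] - base >= eps:
            intervals.append((start, x))
            start, base = x + 1, cdf[x]
    if start <= N:
        intervals.append((start, N))
    return intervals

# toy usage on a synthetic right-skewed distribution over [N]
random.seed(0)
N, eps = 1000, 0.1
samples = [min(N, 1 + int(random.expovariate(1.0) * 50)) for _ in range(int(10 / eps ** 2))]
print(partition_by_mass(empirical_cdf(samples, N), eps))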

  16. Learning k-modal distributions

  17. The problem Learn an unknown k-modal distribution over [N].

  18. What should we shoot for? Easy lower bound: need Ω(k·log(N/k)/ε³) samples (an algorithm essentially has to solve Ω(k) separate monotone-distribution-learning problems, each over a domain of size about N/k, to accuracy ε). Want an algorithm that uses roughly this many samples and is computationally efficient.

  19. The problem, again Goal: learn an unknown k-modal distribution over [N]. We know how to efficiently learn an unknown monotone distribution… Would be easy if we knew the locations of the k peaks/valleys (marked with X's in the figure). Guessing them exactly: infeasible. Guessing them approximately: not too great either.

  20. A first approach Break up [N] into many intervals. Since p is k-modal, p is non-monotone on at most k of the intervals, so running the monotone distribution learner on each interval will usually give a good answer.

  21. First approach in more detail • Use [DKW] to divide [N] into intervals I_1, …, I_t and obtain estimates of their weights p(I_j). (Assumes no single point has large mass; heavier points are easy to detect and deal with separately.) • Run the monotone distribution learner on the conditional distribution of p over each I_j to get a hypothesis h_j. (Actually run it twice, once for increasing and once for decreasing, and do hypothesis testing to pick one as h_j.) • Combine the hypotheses in the obvious way: scale each h_j by the estimated weight of I_j and sum.

  22. Sketch of analysis • Use [DKW] to divide [N] into intervals I_1, …, I_t and obtain estimates of their weights: this step needs relatively few samples. • Run the monotone distribution learner on each interval: this dominates the sample cost, since the learner is invoked once per interval. • Combine the hypotheses in the obvious way (scale each per-interval hypothesis by its estimated weight and sum). • Total error: O(ε) from the (at most k) non-monotone intervals, plus O(ε) from the scaling factors, plus O(ε) from estimating the true interval weights with the empirical ones. (A code sketch of this approach follows below.)
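A structural sketch of this first approach in Python (again my own illustration, not the speakers' code): the monotone learner is passed in as a black box, and flat_learner is just a trivial stand-in so the example runs.

def learn_piecewise(samples, intervals, learn_monotone_on_interval):
    # First approach in code form: for each interval of the partition, estimate its
    # weight empirically, learn p restricted to the interval with a monotone-distribution
    # learner (a black box here), and recombine the scaled per-interval hypotheses.
    m = len(samples)
    h = {}
    for (a, b) in intervals:
        sub = [s for s in samples if a <= s <= b]
        if not sub:
            continue
        weight_hat = len(sub) / m                     # estimate of p([a, b])
        h_j = learn_monotone_on_interval(sub, a, b)   # conditional hypothesis over [a, b]
        for x, mass in h_j.items():
            h[x] = h.get(x, 0.0) + weight_hat * mass  # scaling factor = estimated weight
    return h

# A trivial stand-in for the monotone learner, only to make the sketch executable;
# the real algorithm would plug in the [Birge87] learner described on slide 14.
def flat_learner(sub_samples, a, b):
    return {x: 1.0 / (b - a + 1) for x in range(a, b + 1)}

h = learn_piecewise([1, 2, 2, 3, 7, 8, 9, 9], [(1, 5), (6, 10)], flat_learner)
print(round(sum(h.values()), 3))   # the combined hypothesis is a probability distribution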

  23. Improving the approach The extra cost came from running the monotone distribution learner on every single interval, far more invocations than the roughly k that should intuitively be needed. If we could somehow check (more cheaply than learning) whether an interval is monotone before running the learner, we could run the learner fewer times and save… …this is a property testing problem! More sophisticated algorithm: two new ingredients.

  24. First ingredient: testing k-modal distributions for monotonicity Consider the following property testing problem: Algorithm gets samples from unknown k-modal distribution p over [N]. Goal: output "yes" w.h.p. if p is monotone increasing, "no" w.h.p. if p is ε-far from monotone increasing. Note: the k-modal promise on p might save us from the Ω(√N) lower bound that holds without it (hard instances like the uniform-over-a-random-subset example from before).

  25. Efficiently testing k-modal distributions for monotonicity Algorithm gets samples from unknown k-modal distribution p over [N]. Goal: output "yes" w.h.p. if p is monotone increasing, "no" w.h.p. if p is ε-far from monotone increasing. Theorem: There is a tester for this problem whose sample complexity depends only on k and ε, not on N. We'll use this to identify sub-intervals of [N] where p is close to monotone… can we efficiently learn close-to-monotone distributions?

  26. Second ingredient: agnostically learning monotone distributions Consider the following "agnostic learning" problem: Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone. Goal: output a hypothesis distribution h such that d_TV(p, h) ≤ opt + ε. If opt = 0, this is the original "learn a monotone distribution" problem. Want to handle the general case as efficiently as the opt = 0 case.

  27. Agnostically learning monotone distributions Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone. Goal: output a hypothesis distribution h such that d_TV(p, h) ≤ opt + ε. Theorem: There is a computationally efficient learning algorithm for this problem that uses O(log(N)/ε³) samples.

  28. Semi-agnostically learning monotone distributions Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone. Goal: output a hypothesis distribution h such that d_TV(p, h) ≤ O(opt) + ε. Theorem: There is a computationally efficient learning algorithm for this semi-agnostic problem that uses O(log(N)/ε³) samples. The [Birge87] monotone distribution learner does the job. We will apply it with opt = O(ε), so the gap between opt and O(opt) doesn't matter.

  29. The learning algorithm: first phase • Use [DKW] to divide [N] into intervals I_1, …, I_t and obtain estimates of their weights. • Run both monotonicity testers (increasing and decreasing) on I_1, then on I_1 ∪ I_2, then on I_1 ∪ I_2 ∪ I_3, etc., until the first time both testers say "no", at some interval I_j. Mark I_j and continue the scan from I_{j+1}. This uses at most one pair of tester invocations per interval. (Alternative: use binary search to locate each "no" point, reducing the number of tester invocations.)

  30. The algorithm • Run the testers on growing unions of consecutive intervals as above, until the first time both say "no"; mark that interval and continue. • Each time an interval is marked: the block of unmarked intervals right before it is close-to-monotone; call this a superinterval. Also, (at least) one of the k peaks/valleys of p is "used up".

  31. The learning algorithm: second phase • After the first phase, [N] is partitioned into: superintervals, each close to monotone (there are at most k+1 of them), and "marked" intervals, each of small weight (at most k of them, since each uses up a peak/valley). • Rest of algorithm: run the semi-agnostic monotone distribution learner on each superinterval to get an accurate hypothesis for p restricted to that superinterval. • Output the final hypothesis: combine the per-superinterval hypotheses, scaled by their estimated weights. (A code sketch of both phases follows below.)
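Putting the two phases together, a hedged Python sketch of the overall control flow (my own rendering; test_mono_up, test_mono_down, and semi_agnostic_learn are assumed black boxes standing in for the tester and the semi-agnostic learner described on the previous slides).

def learn_kmodal(intervals, samples, test_mono_up, test_mono_down, semi_agnostic_learn):
    # Phase 1: scan the DKW intervals left to right, growing a candidate superinterval
    # until BOTH monotonicity testers say "no"; then mark the offending interval
    # (it uses up a peak/valley of p) and start a fresh superinterval.
    m = len(samples)
    superintervals, marked, current = [], [], []
    for (a, b) in intervals:
        lo = current[0][0] if current else a
        sub = [s for s in samples if lo <= s <= b]
        if test_mono_up(sub, lo, b) or test_mono_down(sub, lo, b):
            current.append((a, b))
        else:
            if current:
                superintervals.append((current[0][0], current[-1][1]))
            marked.append((a, b))
            current = []
    if current:
        superintervals.append((current[0][0], current[-1][1]))

    # Phase 2: semi-agnostically learn p on each (close-to-monotone) superinterval and
    # combine, weighting by empirical mass; the marked intervals are simply dropped.
    h = {}
    for (lo, hi) in superintervals:
        sub = [s for s in samples if lo <= s <= hi]
        if not sub:
            continue
        w_hat = len(sub) / m
        for x, mass in semi_agnostic_learn(sub, lo, hi).items():
            h[x] = h.get(x, 0.0) + w_hat * mass
    return h, superintervals, marked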

  32. Analysis of the algorithm • Sample complexity: the runs of the tester are cheap compared to runs of the learner, and the semi-agnostic monotone learner is now run only once per superinterval (at most k+1 times) rather than once per interval; this is where the savings over the first approach comes from. • Error rate: O(ε) error from the marked intervals, plus O(ε) total error from estimating the true weights with the empirical ones, plus O(ε) total error from the scaling factors.

  33. I owe you a tester Algorithm gets samples from unknown k-modal distribution p over [N]. Goal: output "yes" w.h.p. if p is monotone increasing, "no" w.h.p. if p is ε-far from monotone increasing. Theorem: There is a tester for this problem whose sample complexity depends only on k and ε, not on N.

  34. The testing algorithm • Algorithm: Run [DKW] with accuracy set as a suitable function of ε and k; let p̂ be the resulting empirical PDF. • If there exist a ≤ b < c such that the average value of p̂ over [a, b] exceeds the average value of p̂ over [b+1, c] by more than the allowed slack, then output "no"; otherwise output "yes". • Completeness: if p is monotone increasing, the test passes w.h.p.
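A hedged Python sketch of a tester in this style (my own rendering, not the authors' algorithm; the rejection threshold `slack` and the brute-force search over all triples are simplifications of mine).

def looks_monotone_increasing(samples, N, slack):
    # Build the empirical PDF p_hat (as DKW licenses), then search for a "violation":
    # an interval [a, b] whose average p_hat-value exceeds the average over the adjacent
    # interval [b+1, c] by more than `slack`. Reject ("no") if any violation is found.
    m = len(samples)
    p_hat = [0.0] * (N + 1)
    for s in samples:
        p_hat[s] += 1.0 / m
    pre = [0.0] * (N + 1)                      # prefix sums, so averages are O(1)
    for x in range(1, N + 1):
        pre[x] = pre[x - 1] + p_hat[x]
    def avg(a, b):
        return (pre[b] - pre[a - 1]) / (b - a + 1)
    # Brute-force search over triples for clarity; the real tester only needs to look
    # at the interval endpoints produced by the DKW partition, not all of [N]^3.
    for a in range(1, N + 1):
        for b in range(a, N):
            for c in range(b + 1, N + 1):
                if avg(a, b) > avg(b + 1, c) + slack:
                    return False               # "no": evidence against monotone increasing
    return True                                # "yes"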

  35. Soundness • Algorithm: Run [DKW] with accuracy as above; let p̂ be the resulting empirical PDF. If there exist a ≤ b < c such that the average of p̂ over [a, b] exceeds the average of p̂ over [b+1, c] by more than the allowed slack, then output "no"; otherwise output "yes". • Soundness lemma: If a k-modal distribution has no such violating triple (a, b, c), then it is close to monotone increasing. To prove the soundness lemma: show that under the lemma's hypothesis, each peak/valley can be "corrected" by "spending" only a small amount of variation distance, small enough that the k corrections together cost at most O(ε).

  36. Correcting a peak of p • Lemma: If a k-modal distribution has no such violating triple, then it is close to monotone increasing. • Consider a peak of p: draw a horizontal line at a height chosen so that (mass of the "hill" above the line) = (missing mass of the "valley" below the line). • Correct the peak by bulldozing the hill into the valley: replace the hill-and-valley stretch by the constant value at that height.
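A small sketch of the bulldozing step, under my own simplifying assumption that the window [a, c] consists of exactly one hill followed by one valley (index 0 of the list is unused padding; the function name bulldoze_peak is mine).

def bulldoze_peak(pdf, a, c):
    # Flatten a hill-then-valley stretch pdf[a..c]: the balancing height t (hill mass
    # above t equals missing valley mass below t) is exactly the average value on [a, c]
    # when the whole window is replaced by the constant t, so total mass is preserved.
    t = sum(pdf[a:c + 1]) / (c - a + 1)
    out = list(pdf)
    for x in range(a, c + 1):
        out[x] = t
    cost = sum(abs(out[x] - pdf[x]) for x in range(a, c + 1)) / 2.0   # variation distance spent
    return out, cost

# toy example over {1,...,9}: a peak around positions 3-4 followed by a valley at 5-6
pdf = [0.0, 0.02, 0.08, 0.20, 0.14, 0.02, 0.04, 0.12, 0.18, 0.20]
fixed, cost = bulldoze_peak(pdf, 3, 6)
print([round(v, 3) for v in fixed], round(cost, 3))   # fixed is monotone increasing; cost = 0.14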

  37. Why it works • Lemma: If a k-modal distribution has no such violating triple, then it is close to monotone increasing. • After a correction, the stretch around the former peak is constant, so that peak no longer violates monotonicity; and since no violating triple exists, the hill cannot hold much excess mass over the adjacent valley, so each correction moves only a small amount of mass. Summing the cost over the (at most k) peaks/valleys gives the lemma.

  38. Summary • Sample- and time-efficient algorithms for learning and testing k-modal distributions over [N]. • Upper bounds pretty close to lower bounds for these problems. • Testing is easier than learning. • Learning algorithms have a testing component.

  39. Future work • More efficient algorithms for restricted classes of k-modal distributions? • [DDS11]: any sum of n independent Bernoulli random variables is learnable using a number of samples independent of n; such a sum is a special type of unimodal distribution, a "Poisson Binomial Distribution".

  40. Thank you

  41. Key ingredient: oblivious decomposition Decompose [N] into intervals whose widths increase as powers of (1+ε), so there are about log(N)/ε of them. Call these the oblivious buckets.

  42. Flattening a monotone distribution using the oblivious decomposition Given a monotone decreasing distribution p, the flattened version of p spreads p's weight uniformly within each bucket of the oblivious decomposition (figure: true pdf vs. flattened version). Lemma [B87]: for any monotone decreasing distribution p, the flattened version is O(ε)-close to p in total variation distance.

  43. Learning monotone distributions using the oblivious decomposition [B87] Reduce learning monotone distributions over [N] to accuracy ε to learning arbitrary distributions over an O(log(N)/ε)-element domain to accuracy ε. Algorithm: • Draw samples from p. • Output hypothesis: the flattened empirical distribution, i.e. spread each bucket's empirical mass uniformly within that bucket. • View the flattened distribution as an arbitrary distribution over the O(log(N)/ε)-element set of buckets. Analysis: learning an arbitrary distribution over ℓ elements takes O(ℓ/ε²) samples, and flattening costs only O(ε) extra error, which gives the O(log(N)/ε³) bound.
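Combining the decomposition, the flattening, and the learning step above, a minimal Python sketch (my own code; the exact bucket widths and constants in [B87] may differ from the simple (1+eps)^j rule used here).

import math

def oblivious_buckets(N, eps):
    # Intervals of {1,...,N} whose widths grow like (1+eps)^j; about log(N)/eps of them.
    buckets, left, width = [], 1, 1.0
    while left <= N:
        right = min(N, left + int(math.floor(width)) - 1)
        buckets.append((left, right))
        left, width = right + 1, width * (1.0 + eps)
    return buckets

def birge_style_learner(samples, N, eps):
    # Estimate each bucket's mass empirically, then spread it uniformly within the bucket.
    # For a monotone decreasing p this is a good hypothesis because (i) the flattened
    # version of p is O(eps)-close to p, and (ii) only about log(N)/eps bucket masses
    # need to be estimated.
    m = len(samples)
    h = [0.0] * (N + 1)
    for (a, b) in oblivious_buckets(N, eps):
        mass = sum(1 for s in samples if a <= s <= b) / m
        for x in range(a, b + 1):
            h[x] = mass / (b - a + 1)
    return h    # h[1..N] is the hypothesis PDF (index 0 unused)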

  44. Testing monotone distributions using the oblivious decomposition Can use the learning algorithm to get an O(log(N)/ε³)-sample algorithm for the testing problem. But we can do better by using the oblivious decomposition directly: testing equality of monotone distributions over [N] to accuracy ε reduces to testing equality of arbitrary distributions over the O(log(N)/ε) buckets (q: known monotone distribution over [N] becomes a known distribution over the buckets; p: unknown monotone distribution over [N] becomes an unknown distribution over the buckets). Using [BFFKRW02], this gives a testing algorithm whose sample complexity grows roughly like the square root of the number of buckets, i.e. roughly √(log N) up to poly(1/ε) factors. Can also show a lower bound for any tester.
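A sketch of this reduction in the same style (my illustration; identity_tester is an assumed black box playing the role of the [BFFKRW02] tester over the small bucket domain, and is not implemented here).

import bisect
import math

def test_monotone_identity(samples_from_p, q_pdf, N, eps, identity_tester):
    # Reduce "is p = q?" for monotone p, q over [N] to identity testing over the
    # O(log(N)/eps) oblivious buckets: map each p-sample to its bucket index, and
    # collapse the known q onto the buckets the same way.
    buckets, left, width = [], 1, 1.0
    while left <= N:                                   # same buckets as in the sketch above
        right = min(N, left + int(math.floor(width)) - 1)
        buckets.append((left, right))
        left, width = right + 1, width * (1.0 + eps)
    starts = [a for (a, _) in buckets]
    reduced_samples = [bisect.bisect_right(starts, s) - 1 for s in samples_from_p]
    reduced_q = [sum(q_pdf[x] for x in range(a, b + 1)) for (a, b) in buckets]
    return identity_tester(reduced_samples, reduced_q, eps)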

  45. [BKR04] implicitly gave an O(log²(N)·log log(N)/ε⁵)-sample algorithm for learning a monotone distribution.
