Finding hot-lists

Finding hot-lists Given a data stream S, find the k most popular items Most popular product in retail sales data Hot-spots for intrusion detection, fraud detection, network congestion etc. Frequently used blocks in executed code

Finding hot-lists – negative result [AMS96]: Approximating the frequency of the most frequent item in a sequence requires Omega(n) memory bits. Proof using Razborov’s element disjointness in communication complexity.

Communication Complexity ALICE input A BOB input B Cooperatively compute function f(A,B) Minimize bits communicated Unbounded computational power Communication Complexity C(f) – bits exchanged by optimal protocol Π Protocols? 1-way versus 2-way deterministic versus randomized Cδ(f) – randomized complexity for error probability δ

Adaptive Sampling [GM98] - Sample elements from the input set - Frequently occurring elements will be sampled more often - Sampling probability determined at runtime, according to the allowed memory usage - Tradeoff between overhead and accuracy - Give an estimate of the sample’s accuracy

Concise Samples - Uniform random sampling - Maintain an <id, count> pair for each element - The sample size can be much larger than the memory size - For skewed input sets the gain is much larger - Sampling is not applied at every block - Vitter’s reservoir sampling

Concise Samples

Comparison of Hot List Algorithms 500K values in [1,500] Zipf parameter = 1.5 Footprint = 100 20

Finding hot-lists

Finding hot-lists

Presentation Transcript

Lists

Lists

Lists

Lists

Lists

Lists

Lists

LISTS

Lists

Lists

Lists

Powerful Tips In Finding Hot Topic

Finding the best Hot Dog Supplier

Email Lists | Buy Email Lists | Pioneer Lists

Finding the Perfect Hot Dog Supplier

Lists

Lists

Lists

Finding out about a Best Hot Water Dispenser