Some Interesting Problems
Rakesh Agrawal
IBM Almaden Research Center

Foundations
• What is data mining?
  • A collection of techniques?
  • A set of composable operations (à la relational algebra)?
• Hints:
  • Inductive databases (Mannila)
  • Relational calculus + statistical quantifiers (Imielinski)

Privacy Implications
• Can we build accurate data models while preserving the privacy of individual records?
• Hints:
  • Randomization (Agrawal & Srikant): replace x by x + y, where y is drawn from a known distribution
  • Anonymization (cryptography literature)

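The randomization hint can be illustrated with a small sketch. Each value x is released only as x + y, where y comes from a known distribution; because that distribution is public, some aggregate properties of the hidden values remain estimable even though individual records are masked. This sketch assumes uniform noise on [-spread, spread] and recovers only the mean and variance. The cited randomization work goes further and reconstructs the whole distribution of x, which this code does not attempt. The function names `randomize` and `estimate_mean_and_variance` are hypothetical.

```python
import random
import statistics


def randomize(values, spread, seed=None):
    """Release x + y for each x, with y drawn independently from Uniform(-spread, spread)."""
    rng = random.Random(seed)
    return [x + rng.uniform(-spread, spread) for x in values]


def estimate_mean_and_variance(perturbed, spread):
    """Estimate aggregate statistics of the hidden x values from the perturbed data.

    Since y is independent of x and its distribution is known:
      E[x]   = E[x + y] - E[y],    where E[y] = 0 for symmetric uniform noise
      Var[x] = Var[x + y] - Var[y], where Var[y] = spread**2 / 3
    """
    mean = statistics.fmean(perturbed)
    variance = statistics.pvariance(perturbed) - spread ** 2 / 3
    return mean, variance


if __name__ == "__main__":
    rng = random.Random(1)
    ages = [rng.gauss(40, 10) for _ in range(10_000)]  # synthetic "private" values
    noisy = randomize(ages, spread=20, seed=0)
    print(estimate_mean_and_variance(noisy, spread=20))
    print(statistics.fmean(ages), statistics.pvariance(ages))
```

Any single released value can lie up to `spread` away from the true one. The aggregate estimates still land close to the true mean and variance, because the noise averages out over many records.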
Web Mining: Beyond Click Streams
• Mining knowledge bases from the web
  • Completeness
  • Accuracy
  • Malicious spam
• Hints:
  • Brin's book experiment
  • etc.

Web Mining: Beyond hrefs
• What other social behaviors exist on the web, and how can we make use of them?
• Hints:
  • The viral marketing paper at this conference
  • etc.

Actionable Patterns
• Principled use of domain knowledge for:
  • Discarding uninteresting patterns
  • Improving performance
• Hints:
  • Papers in recent KDD conferences

Simultaneous Mining over Multiple Data Types
• Not just patterns within:
  • Relational tables
  • Time series
  • Textual documents
• But patterns across all of them

Some More Problems
• Online, incremental algorithms over data streams
  • When to retire past data
• Long sequential patterns
• Discovering richer patterns (trees and DAGs)
• Automatic, data-dependent selection of algorithm parameters

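The "when to retire past data" question can be made concrete with a small sketch of one common heuristic, exponential time decay. This is an assumed illustration, not an approach the slide proposes. Item counts are maintained incrementally per batch of a stream. Before each new batch, all existing weights are multiplied by `decay`, so an item last seen k batches ago contributes decay**k. Items whose weight falls below `prune_below` are dropped entirely. The class `DecayedCounter` and its parameters are hypothetical names.

```python
from collections import defaultdict


class DecayedCounter:
    """Incrementally maintained item weights in which older observations fade out."""

    def __init__(self, decay: float, prune_below: float = 1e-3):
        if not 0 < decay <= 1:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay              # 1.0 means never forget (plain counting)
        self.prune_below = prune_below  # weight at which an item is retired
        self.weights = defaultdict(float)

    def update(self, batch):
        """Age all existing weights by one step, then add the new batch at full weight."""
        for item in list(self.weights):  # iterate over a copy so deletion is safe
            self.weights[item] *= self.decay
            if self.weights[item] < self.prune_below:
                del self.weights[item]   # retire items whose influence has faded
        for item in batch:
            self.weights[item] += 1.0

    def frequent(self, threshold):
        """Items whose current decayed weight is at least `threshold`, sorted."""
        return sorted(i for i, w in self.weights.items() if w >= threshold)
```

For example, with `decay=0.5`, the batches `["a", "a", "b"]`, `["b"]`, `["c"]` leave the weights a = 0.5, b = 0.75, c = 1.0, so `frequent(0.7)` returns `["b", "c"]`. Choosing `decay` well is itself an instance of the data-dependent parameter selection problem in the same list.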
What Not to Work On?
• The field is too young!
  • Let every flower bloom!!!
• Too early to say we don't need new algorithms
  • Impressive results of the PVSM algorithm
• Emphasize evaluation and benchmarks
  • Interesting research issues

Applications Most Likely to Benefit from Data Mining
• Web applications (I think)
• Bioinformatics (I hope!)

Inhibitors
• Insufficient skill base (education)
• Usability

"The true delight is in the finding out, rather than in the knowing."
Isaac Asimov