Some Interesting Problems
Rakesh Agrawal
IBM Almaden Research Center
Foundations
• What is data mining?
  • A collection of techniques?
  • A set of composable operations (a la Relational Algebra)?
• Hints:
  • Inductive Databases (Mannila)
  • Relational Calculus + Statistical Quantifiers (Imielinski)
Privacy Implications
• Can we build accurate data models while preserving privacy of individual records?
• Hints:
  • Randomization (Agrawal & Srikant): Replace x by x+y where y is drawn from a known distribution
  • Anonymization (Crypto literature)
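The randomization hint above can be illustrated with a minimal sketch. It assumes uniform noise on [-spread, spread] and recovers only the mean and variance of the original values. This is a simplification for illustration, not the distribution-reconstruction procedure of Agrawal & Srikant. The function names, the synthetic `ages` data, and the choice of `spread` are all hypothetical.

```python
import random
import statistics


def randomize(values, spread, rng):
    """Replace each x by x + y, with y drawn uniformly from [-spread, spread]."""
    return [x + rng.uniform(-spread, spread) for x in values]


def estimate_mean_and_variance(perturbed, spread):
    """Estimate aggregate statistics of the original values from perturbed ones.

    The noise y is independent of x, has mean 0 and variance spread**2 / 3,
    so mean(x) ~= mean(x + y) and var(x) ~= var(x + y) - spread**2 / 3.
    """
    noise_variance = spread ** 2 / 3
    mean = statistics.fmean(perturbed)
    # Clamp at zero: sampling error can push the difference slightly negative.
    variance = max(statistics.pvariance(perturbed) - noise_variance, 0.0)
    return mean, variance


rng = random.Random(0)
ages = [rng.gauss(40, 10) for _ in range(20000)]  # synthetic "true" records (assumed)
released = randomize(ages, spread=30, rng=rng)    # only these values are shared
est_mean, est_var = estimate_mean_and_variance(released, spread=30)
```

Each released value may differ from its original by up to `spread`, yet the aggregate estimates stay close to the true ones because the noise distribution is known. A wider `spread` hides individual records better but needs more records for the same accuracy.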
Web Mining: Beyond Click Streams
• Mining knowledge bases from the web
  • Completeness
  • Accuracy
  • Malicious Spam
• Hints:
  • Brin’s Book experiment
  • etc. etc.
Web Mining: Beyond hrefs
• What other social behaviors exist on the web, and how can we make use of them?
• Hints:
  • Viral marketing paper in this conference
  • etc. etc.
Actionable Patterns
• Principled use of domain knowledge for
  • discarding uninteresting patterns
  • performance
• Hints:
  • Papers in the recent KDD conferences
Simultaneous mining over multiple data types
• Not just
  • Relational tables
  • Time series
  • Textual documents
• But patterns across all of them
Some more problems
• Online, incremental algorithms over data streams
• When to retire the past data
• Long sequential patterns
• Discovering richer patterns (trees and DAGs)
• Automatic, data-dependent selection of algorithm parameters
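The first two items above can be made concrete with a small sketch of an incremental counter that retires old data. It assumes a fixed-size sliding window as the retirement policy, which is only one possible choice. The slide itself leaves the question of when to retire past data open. The class name `SlidingWindowCounts` and the toy stream are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowCounts:
    """Incrementally maintained item counts over the most recent `window` items."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.items = deque()     # items currently inside the window, oldest first
        self.counts = Counter()  # count of each item inside the window

    def add(self, item):
        """Process one arriving item; retire the oldest item if the window overflows."""
        self.items.append(item)
        self.counts[item] += 1
        if len(self.items) > self.window:
            old = self.items.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, min_count):
        """Return the items seen at least `min_count` times in the current window."""
        return {item for item, n in self.counts.items() if n >= min_count}


stream = ["a", "b", "a", "c", "a", "b", "b", "b"]
sw = SlidingWindowCounts(window=4)
for item in stream:
    sw.add(item)  # constant work per arrival; no rescan of earlier data
```

After the loop the window holds only the last four items, so `sw.frequent(2)` returns `{"b"}` even though "a" was frequent earlier in the stream.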
What not to work on?
• The field is too young!
• Let every flower bloom!!!
• Too early to say we don’t need new algorithms
• Impressive results of the PVSM algorithm
• Emphasize evaluation and benchmarks
• Interesting research issues
Applications most likely to benefit from data mining
• Web applications (I think)
• Bioinformatics (I hope!)
Inhibitors
• Insufficient skill base (Education)
• Usability
The true delight is in the finding out, rather than in the knowing. Isaac Asimov