Data Mining in Ubiquitous Distributed Environments
Assaf Schuster, Technion
SEBD Tutorial, June 06
Purpose of this Tutorial
• Convergence of distributed systems and data mining
• Evolving field, no systematic coverage of all aspects
• Will present: issues, challenges, examples of algorithmic approaches, ideas, tradeoffs (accuracy vs. overhead)
• Will not present: formal treatment, proofs, details, technology, systems, hardware…
Ubiquitous Computing Systems
• Various Systems: Grid, P2P, WSN, MANET
• Several similar technological aspects
• Scale, aim for at least 10K (10M in P2P)
  • partial failure, heterogeneity, dynamic state / data
• Multi-user, a 10K system serves >= 1K users
  • resource sharing, caching, consistency
• Lots of distributed data
  • streams, incremental, anytime, local filtering, locality filtering
• Cooperation of self-motivated parties
  • trust management, security, privacy, competitive market, self vs. global optimizations
• Stringently resource limited
  • in-network computing, storage distribution
• Non-similar technological aspects
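
The "streams, incremental, anytime" requirement can be illustrated with a summary that is updated one item at a time, can be read at any moment, and can be merged across peers without shipping raw data. The sketch below is only an illustration under assumptions: the class name `RunningStats` and its methods are hypothetical, and the update and merge rules are the standard Welford and pairwise-merge formulas for mean and variance, not a method taken from the tutorial.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical incremental summary of a numeric stream."""
    n: int = 0          # number of items seen so far
    mean: float = 0.0   # running mean
    m2: float = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's single-pass update: O(1) time and memory per item.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two summaries as if both streams had been seen by one peer.
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance; valid at any time ("anytime" answer).
        return self.m2 / self.n if self.n else 0.0


# Two peers summarize their own local streams independently.
peer_a, peer_b = RunningStats(), RunningStats()
for x in [1.0, 2.0, 3.0]:
    peer_a.update(x)
for x in [4.0, 5.0, 6.0, 7.0]:
    peer_b.update(x)

# Only the three-number summaries need to travel over the network.
merged = peer_a.merge(peer_b)
```

Each peer sends three numbers regardless of how much data it has seen, which is the kind of overhead bound a resource-limited system needs.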
Ubiquitous Data Mining
• For the community
  • E.g., P2P recommendations based on e-interaction
• For Security
  • E.g., identify and avert a DoS attack (Overpeer and P2P poisoning)
• For Administration
  • E.g., misconfiguration detection system (DataMiningGrid demo)
• For Data Cleansing
  • E.g., in-network outlier detection (and removal) in WSN
• DM Using HPC
  • E.g., idle-cycle batch systems for high-complexity analysis tasks (Superlink-Online)
Technological Challenges: Algorithms
• Scalable and resource limited distributed DM
  • Algorithms for 10K peers, algorithms limited to two messages per peer per hour, synchronization-less, iteration-less, bag-of-tasks, dynamic divisibility, etc.
• Monitoring
  • Distributed, local filtering
• Success, Correctness, and Consistency
  • Partial failure, message dropping, heterogeneity, etc. can yield all sorts of trouble
• Reusability, incrementality
  • E.g., multi-class classifiers, multi-metric k-means clustering, etc.
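
One way to picture "distributed, local filtering" under a tight message budget: each peer reports a new reading only when it drifts more than a fixed slack from the value it last reported, so small fluctuations are absorbed locally. The sketch below is an assumption-laden illustration, not the tutorial's algorithm. The names `FilteringPeer`, `Coordinator`, and `slack` are hypothetical, the readings are made up, and the single central coordinator is a simplification that a real 10K-peer system would probably avoid.

```python
class FilteringPeer:
    """Hypothetical peer that suppresses reports of small local changes."""

    def __init__(self, slack: float):
        self.slack = slack          # largest drift tolerated without reporting
        self.last_reported = None   # value the coordinator currently holds
        self.messages_sent = 0

    def observe(self, value: float):
        # Report only on the first reading or when drift exceeds the slack.
        if self.last_reported is None or abs(value - self.last_reported) > self.slack:
            self.last_reported = value
            self.messages_sent += 1
            return value
        return None  # filtered locally, no message sent


class Coordinator:
    """Hypothetical collector that keeps the last reported value per peer."""

    def __init__(self):
        self.reported = {}

    def receive(self, peer_id: int, value: float) -> None:
        self.reported[peer_id] = value

    def average(self) -> float:
        return sum(self.reported.values()) / len(self.reported)


# Illustrative readings: one row per round, one column per peer.
rounds = [
    [10.0, 20.0, 30.0],
    [10.4, 20.5, 29.8],
    [10.9, 22.0, 30.1],
]

slack = 1.0
peers = {i: FilteringPeer(slack) for i in range(3)}
coordinator = Coordinator()

for readings in rounds:
    for peer_id, value in enumerate(readings):
        report = peers[peer_id].observe(value)
        if report is not None:
            coordinator.receive(peer_id, report)

total_messages = sum(p.messages_sent for p in peers.values())
true_average = sum(rounds[-1]) / len(rounds[-1])
estimate = coordinator.average()
```

Because every peer's unreported drift is at most `slack`, the coordinator's average is off by at most `slack` as well. This is the accuracy vs. overhead tradeoff in miniature: a larger slack means fewer messages and a looser estimate.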
Technological Challenges: Systems
• Exploitation & HCI
  • Lay user (parameterless) DM, interactive DM
  • DM-based autonomous ubiquitous systems
• Security, Fraud, and Privacy
  • Authorization, public-key infrastructure, trust management, data pollution
• Longevity of DM jobs
  • Resource sharing, non-dedicated resources
• Communication patterns
  • Esp. reliability and addressability. Are these problems best solved by suitable algorithms?