Big (graph) data analytics Christos Faloutsos CMU
CONGRATULATIONS! Welcome to CMU! C. Faloutsos
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions
Q+A • Are you recruiting? How many? A: 1 or 2 • How many do you have? A: 6 (+5 postdocs) • How frequently do you meet them? A: 1/week • What is your advising style? A: results • How do you feel about summer internships? A: Yes/Maybe (FB, MSR, IBM, ++)
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions
Motivation • Data mining: ~ find patterns (rules, outliers) • What do real graphs look like? Anomalies? • Time series / monitoring: Measles @ PA, NY, …
Graphs - why should we care? • Food Web [Martinez ’91] • ~1B users, $10-$100B revenue • Internet Map [lumeta.com]
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions
NELL & concepts (=groups) • Predicates (subject, verb, object) in knowledge base, e.g., “Eric Clapton plays guitar”, “Barack Obama is the president of U.S.” • NELL (Never Ending Language Learner) data: 26M subjects × 48M verbs × 26M objects; nonzeros = 144M • Vagelis Papalexakis (CMU-CS), Tom Mitchell (CMU/CS-MLD)
Answer: tensor factorization • Recall: (SVD) matrix factorization finds blocks: the N users × M products matrix ≈ ‘meat-eaters’ × ‘steaks’ + ‘kids’ × ‘cookies’ + ‘vegetarians’ × ‘plants’
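A minimal sketch of the "SVD finds blocks" idea, assuming a tiny hypothetical users × products matrix built from three all-ones blocks (block sizes chosen so the singular values differ). It computes a truncated SVD in pure Python by power iteration plus deflation, a textbook method picked for self-containment rather than speed; in practice one would call a library routine such as `numpy.linalg.svd`.

```python
import math
import random


def top_singular_triplet(A, iters=200, seed=0):
    """Leading (sigma, u, v) of matrix A (a list of rows), by power iteration on A^T A."""
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    v = [rng.random() + 0.1 for _ in range(n_cols)]  # strictly positive start vector
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = A v
        v = [sum(A[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = A^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


def truncated_svd(A, k):
    """Rank-k SVD by deflation: find the top triplet, subtract sigma*u*v^T, repeat."""
    R = [row[:] for row in A]
    triplets = []
    for _ in range(k):
        s, u, v = top_singular_triplet(R)
        triplets.append((s, u, v))
        R = [[R[i][j] - s * u[i] * v[j] for j in range(len(R[0]))] for i in range(len(R))]
    return triplets


def members(vec, tol=1e-6):
    """Indices that participate in a block (non-negligible loadings)."""
    return [i for i, x in enumerate(vec) if abs(x) > tol]


# Hypothetical 7 users x 6 products purchase matrix with three blocks:
# users 0-2 ('meat-eaters') buy products 0-2 ('steaks'),
# users 3-4 ('vegetarians') buy products 3-4 ('plants'),
# users 5-6 ('kids') buy product 5 ('cookies').
A = [[0.0] * 6 for _ in range(7)]
for users, products in [((0, 1, 2), (0, 1, 2)), ((3, 4), (3, 4)), ((5, 6), (5,))]:
    for i in users:
        for j in products:
            A[i][j] = 1.0

for sigma, u, v in truncated_svd(A, 3):
    print(f"sigma={sigma:.3f} users={members(u)} products={members(v)}")
```

Each singular triplet loads on exactly one user group and one product group; for an all-ones block of r rows and c columns the singular value is sqrt(r·c), so here 3, 2 and about 1.414.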
Answer: tensor factorization • PARAFAC decomposition: the subject × verb × object tensor ≈ ‘artists’ + ‘athletes’ + ‘politicians’ (one rank-1 component per concept)
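PARAFAC writes a 3-way tensor as a sum of rank-1 terms λ·(a ∘ b ∘ c). The sketch below fits one term at a time by alternating least squares and subtracts it (greedy deflation). Standard PARAFAC-ALS updates all components jointly; the greedy variant is a simplification that happens to be exact here only because the two hypothetical concept blocks do not overlap. The tensor and its ‘artists’/‘politicians’-like groups are made up for illustration.

```python
import math


def unit(x):
    """Return (x / ||x||, ||x||)."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x], n


def rank1_als(T, iters=50):
    """Rank-1 PARAFAC term lam * (a o b o c) of a 3-way tensor T[i][j][k] by
    alternating least squares. With unit-norm factors, each ALS update is the
    tensor contracted with the other two factors."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    b, _ = unit([1.0] * J)
    c, _ = unit([1.0] * K)
    a, lam = [0.0] * I, 0.0
    for _ in range(iters):
        a, _ = unit([sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
                     for i in range(I)])
        b, _ = unit([sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
                     for j in range(J)])
        c, lam = unit([sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
                       for k in range(K)])
    return lam, a, b, c


def greedy_parafac(T, rank):
    """Extract `rank` components one at a time, subtracting each fit (deflation)."""
    R = [[row[:] for row in slab] for slab in T]
    comps = []
    for _ in range(rank):
        lam, a, b, c = rank1_als(R)
        comps.append((lam, a, b, c))
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                for k, ck in enumerate(c):
                    R[i][j][k] -= lam * ai * bj * ck
    return comps


# Hypothetical 5 subjects x 3 verbs x 5 objects tensor with two concepts:
# subjects 0-1 / verbs 0-1 / objects 0-1 (an 'artists'-like group), and
# subjects 2-4 / verb 2 / objects 2-4 (a 'politicians'-like group).
T = [[[0.0] * 5 for _ in range(3)] for _ in range(5)]
for subjects, verbs, objects in [((0, 1), (0, 1), (0, 1)), ((2, 3, 4), (2,), (2, 3, 4))]:
    for i in subjects:
        for j in verbs:
            for k in objects:
                T[i][j][k] = 1.0

for lam, a, b, c in greedy_parafac(T, 2):
    print(f"weight={lam:.3f}", "subjects:", [i for i, x in enumerate(a) if abs(x) > 1e-6])
```

Each recovered component picks out one subject group together with its verbs and objects, which is the "concept" reading of PARAFAC used on the NELL data.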
Answer: tensor factorization • PARAFAC decomposition • Results for who-calls-whom-when • caller × callee × time tensor (4M × 15 days) ≈ ?? + ?? + ??
Concept Discovery • Concept Discovery in Knowledge Base • NP1: Internet, file, data • NP2: Protocol, software, suite
Neuro-semantics • Brain Scan Data* • 9 persons • 60 nouns • Questions • 218 questions • ‘is it alive?’, ‘can you eat it?’ *Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science, 2008. Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
Neuro-semantics • Patterns? • Data: a nouns (airplane, dog, …) × voxels × persons brain-scan tensor, coupled with a nouns × questions matrix
Neuro-semantics • Small items -> Premotor cortex • Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
Scalability • Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003] • Yahoo: 5 PB of data [Fayyad, KDD’07] • Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’ • Problem: machine failures, on a daily basis • How to parallelize data mining tasks, then? • A: map/reduce – Hadoop (open-source clone) http://hadoop.apache.org/
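To make the map/reduce answer concrete, here is a single-process simulation of the three phases (map, shuffle, reduce) for a hypothetical task: computing node degrees of a graph from its edge list. This is not Hadoop's API (Hadoop jobs are normally written against its Java MapReduce API); the function names and the input-splitting scheme are assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain


def mapper(edge):
    """Map: an undirected edge (u, v) adds one to the degree of each endpoint."""
    u, v = edge
    yield u, 1
    yield v, 1


def shuffle(pairs):
    """Shuffle: group intermediate values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(node, counts):
    """Reduce: sum the partial counts for one node."""
    return node, sum(counts)


def node_degrees(edges, n_splits=2):
    # Split the input the way a cluster would hand chunks to separate mappers.
    splits = [edges[i::n_splits] for i in range(n_splits)]
    intermediate = chain.from_iterable(mapper(e) for split in splits for e in split)
    return dict(reducer(k, vs) for k, vs in shuffle(intermediate).items())


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(node_degrees(edges))
```

Because `mapper` and `reducer` are side-effect-free, a task lost to a failed machine can simply be re-run elsewhere with the same result; that re-execution is how map/reduce copes with routine machine failures.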
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly/fraud detection • Conclusions
App-store fraud • Leman Akoglu, Rishi Chandy, Christos Faloutsos: Opinion Fraud Detection in Online Reviews using Network Effects, ICWSM’13 • (NSF grant, with Alex Beutel)
Problem • Given • user-product review network • review sign (+/-) • Classify • objects into type-specific classes: users: ‘honest’ / ‘fraudster’; products: ‘good’ / ‘bad’; reviews: ‘genuine’ / ‘fake’ • No side data! (e.g., timestamp, review text)
Formulation: BP (belief propagation) on the User–Product network • [Figure: beliefs (user: honest vs. bad; product: good vs. bad) on a ‘+’ review edge, before and after BP]
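The slides name BP as the formulation but do not show update rules, so below is a minimal sketch of generic loopy belief propagation on a signed user–product graph. Everything specific here is an assumption for illustration, not taken from the ICWSM’13 paper: the compatibility table `PSI`, the value `EPS = 0.1`, uniform priors (matching "no side data"), the synchronous schedule, and the toy reviews.

```python
from collections import defaultdict

EPS = 0.1  # hypothetical compatibility strength

# PSI[sign][user_state][product_state] (hypothetical values)
# user states: 0 = honest, 1 = fraudster; product states: 0 = good, 1 = bad
PSI = {
    "+": [[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]],
    "-": [[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]],
}


def norm2(v):
    s = v[0] + v[1]
    return [v[0] / s, v[1] / s]


def product_except(msgs, incident, skip):
    """Elementwise product of the messages on `incident` edges, leaving out edge `skip`."""
    out = [1.0, 1.0]
    for f in incident:
        if f != skip:
            out = [out[0] * msgs[f][0], out[1] * msgs[f][1]]
    return out


def loopy_bp(edges, iters=20):
    """edges: list of (user, product, sign). Priors are uniform (no side data),
    so they drop out of the updates. Returns (user_beliefs, product_beliefs)."""
    u_edges, p_edges = defaultdict(list), defaultdict(list)
    for e, (u, p, _) in enumerate(edges):
        u_edges[u].append(e)
        p_edges[p].append(e)
    to_p = [[0.5, 0.5] for _ in edges]  # user -> product, over product states
    to_u = [[0.5, 0.5] for _ in edges]  # product -> user, over user states
    for _ in range(iters):
        new_to_p, new_to_u = [], []
        for e, (u, p, s) in enumerate(edges):
            inc_u = product_except(to_u, u_edges[u], e)
            new_to_p.append(norm2([sum(inc_u[x] * PSI[s][x][y] for x in (0, 1)) for y in (0, 1)]))
            inc_p = product_except(to_p, p_edges[p], e)
            new_to_u.append(norm2([sum(inc_p[y] * PSI[s][x][y] for y in (0, 1)) for x in (0, 1)]))
        to_p, to_u = new_to_p, new_to_u  # synchronous ("flooding") schedule
    users = {u: norm2(product_except(to_u, es, None)) for u, es in u_edges.items()}
    prods = {p: norm2(product_except(to_p, es, None)) for p, es in p_edges.items()}
    return users, prods


reviews = [
    ("h1", "g1", "+"), ("h1", "g2", "+"), ("h1", "bad", "-"),
    ("h2", "g1", "+"), ("h2", "bad", "-"),
    ("h3", "g2", "+"), ("h3", "bad", "-"),
    ("f", "bad", "+"), ("f", "g1", "-"),  # hypothetical fraudster: praises 'bad', pans 'g1'
]
users, prods = loopy_bp(reviews)
for u in sorted(users):
    print(u, "P(fraudster) = %.2f" % users[u][1])
```

In this toy graph the user who disagrees with the majority ends up with the highest fraudster belief, and the product panned by the honest users ends up labelled bad, using only review signs.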
Top scorers • [Figure: Users × Products; + positive (4-5) rating, o negative (1-2) rating]
‘Fraud-bot’ member reviews • Same day activity! • Same developer! • Duplicated text!
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly/fraud detection • Time series, monitoring / forecasting • Conclusions
‘Tycho’ – epidemics analysis • Yasuko Matsubara • 50 states × 46 diseases
‘Tycho’ – epidemics analysis • Prof. Yasuko Matsubara • Flu? Measles? August? No periodicity?
‘Tycho’ – epidemics analysis • Prof. Yasuko Matsubara • https://www.tycho.pitt.edu/resources.php from U. Pitt (epidemiology dept.) • Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.
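One simple way to ask the "No periodicity?" question of a weekly disease-count series is the sample autocorrelation function (ACF). This is a generic textbook heuristic, not the FUNNEL method (which fits an epidemic model); the synthetic series, the 52-week yearly period, the skip-to-first-negative rule and the 0.3 threshold are all assumptions.

```python
import math


def acf(x, k):
    """Sample autocorrelation of series x at lag k (x must not be constant)."""
    n = len(x)
    m = sum(x) / n
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    return num / den


def dominant_period(x, max_lag, min_corr=0.3):
    """Heuristic: skip lags until the ACF first turns negative, then return the
    lag with the highest ACF; return None if there is no clear peak."""
    r = [acf(x, k) for k in range(max_lag + 1)]
    start = next((k for k in range(1, max_lag + 1) if r[k] < 0), None)
    if start is None:
        return None
    best = max(range(start, max_lag + 1), key=lambda k: r[k])
    return best if r[best] >= min_corr else None


# Synthetic weekly counts (not Tycho data): 4 years with a yearly (52-week) cycle,
# and a series with a single one-off outbreak.
seasonal = [10 + 5 * math.sin(2 * math.pi * t / 52) for t in range(4 * 52)]
one_off = [0.0] * (4 * 52)
one_off[100] = 1.0

print(dominant_period(seasonal, max_lag=104))
print(dominant_period(one_off, max_lag=104))
```

Skipping the first lobe of the ACF matters: at small lags any smooth series is strongly correlated with itself, which would otherwise hide the yearly peak.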
Open research questions • Patterns/anomalies for time-evolving graphs (call graph, 3M people × 6 months) • Spotting fraudsters in social networks (e.g., Twitter ‘$10 -> 1000 followers’) • How is the human brain wired?
Contact info • www.cs.cmu.edu/~christos • GHC 8019 • Ph#: x8.1457 • www.cs.cmu.edu/~christos/TALKS/14-09-ic/ • FYI: Course: 15-826, Tu-Th 3:00-4:20 • And, again: WELCOME!