
Big (graph) data analytics


Presentation Transcript


  1. Big (graph) data analytics Christos Faloutsos CMU

  2. CONGRATULATIONS! Welcome to CMU! C. Faloutsos

  3. Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions

  4. Q+A • Are you recruiting? How many? • How many do you have? • How frequently do you meet them? • What is your advising style? • How do you feel about summer internships?

  5. Q+A • Are you recruiting? How many? -> 1 or 2 • How many do you have? -> 6 (+5 postdocs) • How frequently do you meet them? -> 1/week • What is your advising style? -> results • How do you feel about summer internships? -> Yes/Maybe (FB, MSR, IBM, ++)

  6. Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions

  7. Motivation • Data mining: ~ find patterns (rules, outliers) • What do real graphs look like? Anomalies? • Time series / Monitoring: Measles @ PA, NY, …

  8. Graphs - why should we care?

  9. Graphs - why should we care? • Food Web [Martinez ’91] • ~1B users, $10-$100B revenue • Internet Map [lumeta.com]

  10. Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions

  11. NELL & concepts (=groups) • Predicates (subject, verb, object) in knowledge base • Examples: “Eric Clapton plays guitar”, “Barack Obama is the president of U.S.” • NELL (Never Ending Language Learner) data: modes of size 48M, 26M, 26M; nonzeros = 144M • With Vagelis Papalexakis (CMU-CS) and Tom Mitchell (CMU/CS-MLD)

  12. Answer: tensor factorization • Recall: (SVD) matrix factorization: finds blocks • [Figure: N users x M products matrix ~ sum of blocks: ‘meat-eaters’ x ‘steaks’ + ‘kids’ x ‘cookies’ + ‘vegetarians’ x ‘plants’]
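The "SVD finds blocks" remark on slide 12 can be illustrated with a small sketch. The toy users-by-products matrix, the helper name `block_members`, and the tolerance are all assumptions made for this illustration; they do not come from the slides. Each nonzero singular triplet of a block-structured matrix picks out one (user group, product group) block.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are products.
# Rows 0-2 ('meat-eaters') buy products 0-1 ('steaks');
# rows 3-4 ('vegetarians') buy products 2-3 ('plants').
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)


def block_members(vec, tol=1e-8):
    """Indices where a singular vector is (numerically) nonzero."""
    return set(np.flatnonzero(np.abs(vec) > tol).tolist())


# Number of nonzero singular values = number of blocks in this toy case.
rank = int(np.sum(s > 1e-8))

# Each rank-one component u_k * s_k * v_k^T covers one block.
blocks = [(block_members(U[:, k]), block_members(Vt[k, :])) for k in range(rank)]

# Summing the rank-one components recovers the matrix.
A_hat = sum(s[k] * np.outer(U[:, k], Vt[k, :]) for k in range(rank))
```

On real data the blocks are noisy and overlapping, so the singular vectors are thresholded or clustered rather than tested for exact zeros.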

  13. Answer: tensor factorization • PARAFAC decomposition • [Figure: (subject x verb x object) tensor = sum of components: artists + athletes + politicians]

  14. Answer: tensor factorization • PARAFAC decomposition • Results for who-calls-whom-when • 4M x 15 days • [Figure: (caller x callee x time) tensor = ?? + ?? + ??]
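PARAFAC approximates a three-mode tensor as a sum of R rank-one terms, each the outer product of one column from each of three factor matrices. The sketch below fits such a model with plain alternating least squares (ALS) on a small synthetic tensor. ALS is a common fitting procedure, but the slides do not say which algorithm was used, and the function names, sizes and iteration count here are assumptions chosen for illustration.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: row (j*K + k) holds B[j, :] * C[k, :]."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)


def unfold(X, mode):
    """Matricize X so that `mode` indexes the rows (C-order columns)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def reconstruct(factors):
    """Sum of rank-one terms a_r (outer) b_r (outer) c_r."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cp_als(X, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` PARAFAC model to a 3-mode tensor by ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Fix the other two factors and solve a least-squares problem.
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(X, mode) @ kr @ np.linalg.pinv(gram)
    return factors


# Synthetic (subject x verb x object)-style tensor with 2 planted concepts.
rng = np.random.default_rng(42)
true_factors = [rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2))]
X = reconstruct(true_factors)

factors = cp_als(X, rank=2)
rel_err = np.linalg.norm(X - reconstruct(factors)) / np.linalg.norm(X)
```

Each column triple of the fitted factors plays the role of one "concept" (for example, a group of subjects, the verbs they use, and the objects they act on). Tensors at the scale quoted on the slides need sparse, distributed implementations rather than this dense version.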

  15. Concept Discovery • Concept Discovery in Knowledge Base

  16. Concept Discovery • Concept Discovery in Knowledge Base • NP1: Internet, file, data • NP2: Protocol, software, suite

  17. Neuro-semantics • Brain Scan Data* • 9 persons • 60 nouns • Questions • 218 questions • ‘is it alive?’, ‘can you eat it?’ • *Mitchell et al., Predicting human brain activity associated with the meanings of nouns, Science, 2008. Data at www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html

  18. Neuro-semantics • Brain Scan Data* • 9 persons • 60 nouns • Questions • 218 questions • ‘is it alive?’, ‘can you eat it?’ • Patterns?

  19. Neuro-semantics • Brain Scan Data* • 9 persons • 60 nouns • Questions • 218 questions • ‘is it alive?’, ‘can you eat it?’ • Patterns? • [Figure labels: questions; nouns (airplane, dog, …); persons; voxels]

  20. Neuro-semantics

  21. Neuro-semantics • Small items -> Premotor cortex

  22. Neuro-semantics • Small items -> Premotor cortex • Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014

  23. Scalability • Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003] • Yahoo: 5 PB of data [Fayyad, KDD’07] • Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’ • Problem: machine failures, on a daily basis • How to parallelize data mining tasks, then? • A: map/reduce, with hadoop as the open-source clone: http://hadoop.apache.org/
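To make the map/reduce answer concrete, here is a minimal single-machine sketch of the programming model applied to a basic graph-mining task, counting node degrees from an edge list. It is plain Python, not the Hadoop API; the function names and the toy edge list are assumptions made for illustration. In a real cluster the map and reduce calls run on different machines and the framework handles the shuffle step and machine failures.

```python
from collections import defaultdict
from itertools import chain


def map_edge(edge):
    """Map phase: emit (node, 1) for each endpoint of an edge."""
    src, dst = edge
    yield (src, 1)
    yield (dst, 1)


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(node, counts):
    """Reduce phase: sum the counts collected for one node."""
    return node, sum(counts)


def degree_count(edges):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_edge(e) for e in edges)
    grouped = shuffle(mapped)
    return dict(reduce_degree(node, counts) for node, counts in grouped.items())


# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = degree_count(edges)
```

Because each map call sees a single edge and each reduce call sees a single key, a failed task can be rerun on another machine without affecting the rest of the job.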

  24. Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly/fraud detection • Conclusions

  25. App-store fraud • Opinion Fraud Detection in Online Reviews using Network Effects, Leman Akoglu, Rishi Chandy, CF, ICWSM’13 • (NSF grant, with Alex Beutel)

  26. Problem • Given • user-product review network • review sign (+/-) • Classify • objects into type-specific classes: users: ‘honest’ / ‘fraudster’; products: ‘good’ / ‘bad’; reviews: ‘genuine’ / ‘fake’ • No side data! (e.g., timestamp, review text)

  27. Formulation: BP • [Figure labels: User, Product; honest, bad, good; +/-; Before, After]
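The BP (belief propagation) formulation can be sketched on a toy signed review network. This is a simplified illustration, not the authors' implementation: the compatibility values in `PSI`, the uniform priors, the synchronous update schedule and the toy data are all assumptions. The idea it demonstrates is that honest users tend to rate good products positively and bad products negatively, fraudsters tend to do the reverse, and these expectations are propagated along the edges until the labels agree.

```python
import numpy as np

EPS = 0.1  # assumed noise level, chosen for illustration

# Assumed edge potentials PSI[sign][user_state, product_state].
# User states: 0 = honest, 1 = fraudster. Product states: 0 = good, 1 = bad.
PSI = {
    "+": np.array([[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]]),
    "-": np.array([[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]]),
}


def normalize(v):
    return v / v.sum()


def run_bp(edges, n_iter=20):
    """Loopy BP with uniform priors; each (user, product) pair appears once."""
    to_prod = {(u, p): np.full(2, 0.5) for u, p, _ in edges}  # over product states
    to_user = {(u, p): np.full(2, 0.5) for u, p, _ in edges}  # over user states
    for _ in range(n_iter):
        new_to_prod, new_to_user = {}, {}
        for u, p, sign in edges:
            # Combine what user u hears from all products other than p.
            in_u = np.ones(2)
            for (u2, p2), msg in to_user.items():
                if u2 == u and p2 != p:
                    in_u = in_u * msg
            new_to_prod[(u, p)] = normalize(in_u @ PSI[sign])
            # Combine what product p hears from all users other than u.
            in_p = np.ones(2)
            for (u2, p2), msg in to_prod.items():
                if p2 == p and u2 != u:
                    in_p = in_p * msg
            new_to_user[(u, p)] = normalize(PSI[sign] @ in_p)
        to_prod, to_user = new_to_prod, new_to_user
    user_belief, prod_belief = {}, {}
    for (u, _), msg in to_user.items():
        user_belief[u] = user_belief.get(u, np.ones(2)) * msg
    for (_, p), msg in to_prod.items():
        prod_belief[p] = prod_belief.get(p, np.ones(2)) * msg
    return ({u: normalize(b) for u, b in user_belief.items()},
            {p: normalize(b) for p, b in prod_belief.items()})


# Hypothetical data: 4 users praise good_app and pan bad_app; 2 do the reverse.
edges = [(f"h{i}", "good_app", "+") for i in range(4)]
edges += [(f"h{i}", "bad_app", "-") for i in range(4)]
edges += [(f"f{i}", "bad_app", "+") for i in range(2)]
edges += [(f"f{i}", "good_app", "-") for i in range(2)]

users, products = run_bp(edges)
```

Here `users[name][1]` is the belief that a user is a fraudster and `products[name][1]` the belief that a product is bad. No timestamps or review text are used, matching the "no side data" setting of the problem statement.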

  28. Top scorers • Users • Products • + positive (4-5) rating • o negative (1-2) rating

  30. ‘Fraud-bot’ member reviews • Same day activity! • Same developer! • Duplicated text!

  31. Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly/fraud detection • Time series, monitoring / forecasting • Conclusions

  32. ‘Tycho’ – epidemics analysis • Yasuko Matsubara • 50 states x 46 diseases

  33. ‘Tycho’ – epidemics analysis • Prof. Yasuko Matsubara

  34. ‘Tycho’ – epidemics analysis • Prof. Yasuko Matsubara • Flu? • Measles? • August? • No periodicity?

  39. ‘Tycho’ – epidemics analysis • Prof. Yasuko Matsubara • https://www.tycho.pitt.edu/resources.php from U. Pitt (epidemiology dept.) • Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.

  40. Open research questions • Patterns/anomalies for time-evolving graphs (call graph, 3M people x 6 months) • Spot fraudsters in social networks (e.g., Twitter ‘$10 -> 1000 followers’) • How is the human brain wired?

  41. Contact info • www.cs.cmu.edu/~christos • GHC 8019 • Ph#: x8.1457 • www.cs.cmu.edu/~christos/TALKS/14-09-ic/ • FYI: Course: 15-826, Tu-Th 3:00-4:20 • and, again, WELCOME!
