Anomaly Detection and Virus Propagation in Large Graphs Christos Faloutsos CMU
Thank you! • Dr. Ching-Hao (Eric) Mao • Prof. Kenneth Pao
Faloutsos, Prakash, Chau, Koutra, Akoglu
Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation
OddBall: Spotting Anomalies in Weighted Graphs. Leman Akoglu, Mary McGlohon, Christos Faloutsos, Carnegie Mellon University, School of Computer Science. PAKDD 2010, Hyderabad, India
Main idea. For each node: • extract its ‘ego-net’ (the node, its 1-step-away neighbors, and the edges among them) • extract features (number of edges, total weight, etc.) • compare with the rest of the population
What is an egonet? (Figure: an ego node and its egonet.)
Selected Features • Ni: number of neighbors (degree) of ego i • Ei: number of edges in egonet i • Wi: total weight of egonet i • λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i
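A minimal sketch of the feature-extraction step: given an undirected weighted edge list (a hypothetical input format), it builds the egonet of one node and returns (Ni, Ei, Wi, λw,i), computing λw,i by power iteration. The function name `egonet_features` and the edge format are our own assumptions, not OddBall's code; self-loops are assumed absent.

```python
from collections import defaultdict


def egonet_features(edges, ego, iters=100):
    """Return (N_i, E_i, W_i, lambda_w_i) for the egonet of `ego`.

    edges: iterable of (u, v, weight) for an undirected graph without
    self-loops (hypothetical input format for this sketch).
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w  # merge parallel edges
        adj[v][u] = adj[v].get(u, 0.0) + w

    # Egonet: the ego, its 1-step neighbors, and all edges among them.
    nodes = set(adj[ego]) | {ego}
    sub = {u: {v: w for v, w in adj[u].items() if v in nodes} for u in nodes}

    n_i = len(adj[ego])                                   # degree of the ego
    e_i = sum(len(nb) for nb in sub.values()) // 2        # each edge seen twice
    w_i = sum(w for nb in sub.values() for w in nb.values()) / 2

    # Principal eigenvalue of the weighted adjacency matrix by power iteration.
    x = {u: 1.0 for u in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {u: sum(w * x[v] for v, w in sub[u].items()) for u in nodes}
        y_norm = sum(val * val for val in y.values()) ** 0.5
        if y_norm == 0.0:
            break  # isolated ego: the egonet matrix is all zeros
        x_norm = sum(val * val for val in x.values()) ** 0.5
        lam = y_norm / x_norm
        x = {u: val / y_norm for u, val in y.items()}
    return n_i, e_i, w_i, lam
```

For a unit-weight triangle, every egonet is the whole triangle, so the call for any node gives Ni = 2, Ei = 3, Wi = 3 and λw,i close to 2. For a star, edges to 2-hop nodes are dropped, since they fall outside the egonet.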
Near-Clique/Star
Near-Clique/Star: Andrew Lewis (director)
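One way to make "compare with the rest of the population" concrete for near-cliques and near-stars: in an egonet with N neighbors, a pure star has exactly N edges, while a full clique has N + N(N-1)/2 = N(N+1)/2 edges, so on a log-log plot of Ei against Ni real egonets fall between slopes 1 and 2. The sketch below fits a line in log-log space by least squares and scores each egonet by its deviation from that line. The scoring formula is modeled on OddBall's distance-to-fitting-line score as we recall it; treat its exact form and the function names as assumptions.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = theta * log(x) + log(C); returns (C, theta).

    Assumes all xs and ys are positive (e.g. egonets with N_i >= 1).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    theta = sxy / sxx
    return math.exp(my - theta * mx), theta


def outlier_score(x, y, c, theta):
    """Deviation of one egonet (x = N_i, y = E_i) from the fitted line C * x**theta."""
    pred = c * x ** theta
    return (max(y, pred) / min(y, pred)) * math.log(abs(y - pred) + 1)
```

A point on the fitted line scores 0. A high score with y above the line suggests a near-clique; a high score with y below it suggests a near-star.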
Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation
eBay Fraud detection, with Polo Chau & Shashank Pandit, CMU [WWW'07]
eBay Fraud detection - NetProbe
Popular press. And less desirable attention: • E-mail from ‘Belgium police’ (‘copy of your code?’)
Outline • OddBall (anomaly detection) • Belief Propagation • eBay fraud • Symantec malware detection • Unification results • Conclusions
Polonium: Tera-Scale Graph Mining and Inference for Malware Detection. Polo Chau (Machine Learning Dept), Carey Nachenberg (Vice President & Fellow), Jeffrey Wilhelm (Principal Software Engineer), Adam Wright (Software Engineer), Prof. Christos Faloutsos (Computer Science Dept). SDM 2011, Mesa, Arizona. PATENT PENDING
Polonium: The Data • 60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program • 50+ million machines • 900+ million executable files • Constructed a machine-file bipartite graph (0.2 TB+): 1 billion nodes (machines and files), 37 billion edges
Polonium: Key Ideas • Use Belief Propagation to propagate domain knowledge in the machine-file graph to detect malware • Use “guilt-by-association” (i.e., homophily) • E.g., files that appear on machines with many bad files are more likely to be bad • Scalability: handles a 37-billion-edge graph
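To make guilt-by-association concrete, here is a textbook loopy belief propagation sketch with two states (good, bad) and a homophily edge potential. It is not Polonium's implementation: the priors, the potential parameter `eps`, and the function name are hypothetical, and a system at the scale above would not keep messages in Python dictionaries.

```python
from collections import defaultdict


def belief_propagation(edges, priors, eps=0.1, iters=10):
    """Loopy sum-product BP with states 0 = good, 1 = bad.

    edges: (u, v) pairs, e.g. machine-file edges.
    priors: dict node -> P(bad); nodes missing from it get 0.5.
    eps: hypothetical homophily strength; neighbors share a state
         with weight 0.5 + eps and differ with weight 0.5 - eps.
    Returns dict node -> final belief P(bad).
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    phi = {u: (1 - priors.get(u, 0.5), priors.get(u, 0.5)) for u in nbrs}
    psi = [[0.5 + eps, 0.5 - eps], [0.5 - eps, 0.5 + eps]]  # psi[x_u][x_v]
    msg = {(u, v): (0.5, 0.5) for u in nbrs for v in nbrs[u]}  # message u -> v

    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            out = []
            for xv in (0, 1):
                total = 0.0
                for xu in (0, 1):
                    prod = phi[u][xu]
                    for k in nbrs[u]:
                        if k != v:  # all incoming messages except from v
                            prod *= msg[(k, u)][xu]
                    total += prod * psi[xu][xv]
                out.append(total)
            s = out[0] + out[1]
            new[(u, v)] = (out[0] / s, out[1] / s)  # normalize
        msg = new

    beliefs = {}
    for u in nbrs:
        b = [phi[u][0], phi[u][1]]
        for k in nbrs[u]:
            b[0] *= msg[(k, u)][0]
            b[1] *= msg[(k, u)][1]
        beliefs[u] = b[1] / (b[0] + b[1])
    return beliefs
```

For example, with edges m1-f_bad, m1-f_x, m2-f_good and priors 0.99 for f_bad and 0.01 for f_good, the unknown file f_x ends with a belief above 0.5 because it shares machine m1 with a known-bad file, while m2 leans good.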
Polonium: One-Interaction Results (ROC plot, with the ideal point marked) • 84.9% True Positive Rate at 1% False Positive Rate • True Positive Rate: % of malware correctly identified • False Positive Rate: % of non-malware wrongly labeled as malware
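The two rates defined above, and the "TPR at a fixed FPR" reading of the result, can be sketched as follows. The function names and the score-threshold rule (a file is flagged when its score is at least the threshold) are assumptions for illustration.

```python
def tpr_fpr(is_malware, flagged):
    """TPR = fraction of malware flagged; FPR = fraction of non-malware flagged."""
    tp = sum(1 for t, p in zip(is_malware, flagged) if t and p)
    fn = sum(1 for t, p in zip(is_malware, flagged) if t and not p)
    fp = sum(1 for t, p in zip(is_malware, flagged) if not t and p)
    tn = sum(1 for t, p in zip(is_malware, flagged) if not t and not p)
    return tp / (tp + fn), fp / (fp + tn)


def tpr_at_fpr(scores, is_malware, max_fpr=0.01):
    """Best TPR over score thresholds whose FPR stays <= max_fpr."""
    best = 0.0
    for thr in sorted(set(scores)):
        flagged = [s >= thr for s in scores]
        tpr, fpr = tpr_fpr(is_malware, flagged)
        if fpr <= max_fpr:
            best = max(best, tpr)
    return best
```

With the default `max_fpr=0.01`, `tpr_at_fpr` answers the same kind of question as the 84.9% / 1% figure: how much malware is caught if at most 1% of clean files may be flagged.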
Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • eBay fraud • Symantec malware detection • Unification results • Conclusions • Part 2: influence propagation
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos. ECML PKDD, 5-9 September 2011, Athens, Greece
Problem Definition: GBA (guilt-by-association) techniques • Given: a graph and a few labeled nodes • Find: the labels of the rest (assuming network effects)
Homophily and Heterophily (homophily: neighbors tend to share a label; heterophily: neighbors tend to have different labels) • Step 1: all methods handle homophily • Step 2: NOT all methods handle heterophily, BUT the proposed method does!
Are they related? YES! • RWR (Random Walk with Restarts) • Google’s PageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) • minimize the differences among neighbors • BP (Belief propagation) • send messages to neighbors about what you believe about them
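Correspondence of Methods (figure: each method maps prior labels/beliefs to final labels/beliefs through a linear system built from the identity matrix, the degree matrix D = diag(d1, d2, d3), and the adjacency matrix A = [0 1 0; 1 0 1; 0 1 0] of a 3-node example graph)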
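The linear-system view can be written out. Our reading of the ECML PKDD 2011 paper is that the three methods solve systems of the form below, where I is the identity, D the degree matrix and A the adjacency matrix; x and y are final and prior scores, c and α are method constants, and h_h, φ_h, b_h are FaBP's "about-half" homophily factor, priors and beliefs. Treat the exact constants as assumptions to check against the paper. The RWR line is the fixed point x = cAD⁻¹x + (1−c)y rearranged; the SSL line is the stationarity condition of minimizing Σ_i (x_i − y_i)² + α Σ_{(i,j)∈E} (x_i − x_j)².

```latex
\begin{aligned}
\text{RWR:}  &\quad [\,I - c\,A D^{-1}\,]\,\mathbf{x} = (1-c)\,\mathbf{y} \\
\text{SSL:}  &\quad [\,I + \alpha\,(D - A)\,]\,\mathbf{x} = \mathbf{y} \\
\text{FaBP:} &\quad [\,I + a\,D - c'\,A\,]\,\mathbf{b}_h = \boldsymbol{\phi}_h,
\qquad a = \frac{4h_h^2}{1-4h_h^2},\quad c' = \frac{2h_h}{1-4h_h^2}
\end{aligned}
```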
Results: Scalability (plot: runtime in minutes vs. number of edges, on Kronecker graphs). FaBP is linear in the number of edges.
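One way to see why linear scaling is plausible: a system of the form [I + aD − c′A] b = φ can be solved by repeated sweeps of b ← φ + c′A b − aD b, and each sweep touches every edge a constant number of times. The minimal sketch below assumes adjacency lists and a small |h| so that the iteration converges (we do not reproduce the paper's exact convergence condition); the function name and arguments are hypothetical, and the constants a and c′ follow the FaBP form as we read it.

```python
def fabp(n, edges, prior, h, iters=50):
    """Solve [I + a D - c' A] b = prior by fixed-point (Jacobi) iteration.

    n: number of nodes, labeled 0..n-1; edges: undirected (u, v) pairs.
    prior: 'about-half' priors (e.g. P(class) - 0.5), 0.0 for unlabeled nodes.
    h: 'about-half' homophily factor; assumed small enough to converge.
    """
    a = 4 * h * h / (1 - 4 * h * h)
    c = 2 * h / (1 - 4 * h * h)
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    b = list(prior)
    for _ in range(iters):
        # One sweep costs O(n + #edges): each edge is read twice.
        b = [prior[i] + c * sum(b[j] for j in nbrs[i]) - a * len(nbrs[i]) * b[i]
             for i in range(n)]
    return b
```

With this parameterization, a negative h makes c′ negative while a stays positive, so neighbors push each other's beliefs in opposite directions; that is one way to see how heterophily fits the same equation.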
Results: Parallelism (plot: % accuracy vs. runtime in minutes). FaBP is ~2x faster and wins/ties on accuracy.
Conclusions • Anomaly detection: hand-in-hand with pattern discovery (‘anomalies’ == ‘rare patterns’) • ‘OddBall’ for large graphs • ‘NetProbe’ and belief propagation: exploit network effects • FaBP: fast & accurate
Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation
Influence propagation in large graphs: theorems and algorithms. B. Aditya Prakash http://www.cs.cmu.edu/~badityap Christos Faloutsos http://www.cs.cmu.edu/~christos Carnegie Mellon University
Networks are everywhere! • Facebook Network [2010] • Gene Regulatory Network [Decourty 2008] • Human Disease Network [Barabasi 2007] • The Internet [2005]
Dynamical Processes over networks are also everywhere!
Why do we care? • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • Social Collaboration • ...
Why do we care? (1: Epidemiology) • Dynamical processes over networks • Diseases over contact networks • CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]
Why do we care? (1: Epidemiology) • Dynamical processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] • Problem: given k units of disinfectant, whom to immunize?
Why do we care? (1: Epidemiology) [US-MEDICARE NETWORK 2005] (figure: current practice vs. our method; ~6x fewer!) • Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)
Why do we care? (2: Online Diffusion) • > 800m users, ~$1B revenue [WSJ 2010] • ~100m active users • > 50m users
Why do we care? (2: Online Diffusion) • Dynamical processes over networks: social media marketing (figure: a celebrity tells followers “Buy Versace™!”)
High Impact – Multiple Settings • epidemic out-breaks • products/viruses • transmitting s/w patches • Q. How to squash rumors faster? • Q. How do opinions spread? • Q. How to market better?
Research Theme • DATA: large real-world networks & processes • ANALYSIS: understanding • POLICY/ACTION: managing
In this talk: given propagation models, • Q1: Will an epidemic happen? (ANALYSIS: understanding)
In this talk: • Q2: How to immunize and control out-breaks better? (POLICY/ACTION: managing)
Outline • Part 1: anomaly detection • Part 2: influence propagation • Motivation • Epidemics: what happens? (Theory) • Action: Who to immunize? (Algorithms)
A fundamental question: Strong virus. Epidemic?
Example (static graph): Weak virus. Epidemic?
Problem Statement (plot: # infected vs. time; regime above: epidemic, regime below: extinction) • Separate the regimes? • Find a condition under which • the virus will die out exponentially quickly • regardless of the initial infection condition
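To make the two regimes concrete, a small simulation helps. The slide does not fix a propagation model; the sketch below assumes a discrete-time SIS (susceptible-infected-susceptible) process with a hypothetical per-contact infection probability `beta` and per-step recovery probability `delta`, and simply records the number of infected nodes over time. It does not compute the condition the slide asks for.

```python
import random


def simulate_sis(nbrs, beta, delta, initial, steps, seed=0):
    """Discrete-time SIS on a static graph (model and parameters are assumptions).

    nbrs: dict node -> list of neighbors (every node must be a key).
    beta: probability that one infected neighbor infects a susceptible node per step.
    delta: probability that an infected node recovers (becomes susceptible) per step.
    Returns the number of infected nodes after each step.
    """
    rng = random.Random(seed)
    infected = set(initial)
    counts = []
    for _ in range(steps):
        nxt = set()
        for u in nbrs:
            if u in infected:
                if rng.random() >= delta:  # fails to recover, stays infected
                    nxt.add(u)
            elif any(v in infected and rng.random() < beta for v in nbrs[u]):
                # each infected neighbor independently tries to infect u
                nxt.add(u)
        infected = nxt  # synchronous update: all nodes use last step's state
        counts.append(len(infected))
    return counts
```

Running it on the same graph with several seeds and different `beta`/`delta` values lets one watch the # infected curve either persist or fall to zero, which is the separation the problem statement asks to characterize.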