
A Scalable Approach to Size-Independent Network Similarity

Michele Berlingerio (1), Danai Koutra (2), Tina Eliassi-Rad (3), Christos Faloutsos (2)



Affiliations: (1) IBM Research Dublin; (2) Carnegie Mellon University; (3) Rutgers University

[Figure: input networks (dblp, querylog, egonet, Oregon AS) summarized as ‘signature’ vectors of aggregated features]

Motivation: applications of network similarity
• graph database clustering
• transfer learning (“x-fer learning”)
• anomaly detection (different models?)
• discontinuity detection

Problem Definition
• INPUT: 2 anonymized networks
  - GIVEN: node IDs
  - NOT GIVEN: side information, class labels
• OUTPUT: a structural similarity score
• REQUIRED PROPERTIES:
  (P1) effectiveness: size-independence, intuitiveness, interpretability
  (P2) scalability

Our Approach: NetSimile

Step 1: Feature Extraction. For each node, compute local and egonet features:
• number of neighbors
• clustering coefficient
• average number of neighbors’ neighbors
• average clustering coefficient of neighbors
• number of edges in the egonet
• number of outgoing edges from the egonet
• number of neighbors of the egonet
This yields one n × 7 node-feature matrix per network (one row per node).

Step 2: Feature Aggregation. Each feature column is summarized by 5 aggregators (moments of the feature distributions):
• median
• mean
• standard deviation
• skewness
• kurtosis
The result is a single ‘signature’ vector per network. Together, these features and aggregators satisfy all the required properties.

Step 3: Comparison. The signature vectors sG1, sG2, ..., sGk of the k input networks are compared pairwise with Canberra distance, giving the similarity scores s12, s13, ..., s1k, s23, ..., s2k, ..., s(k-1,k).
Why Canberra distance? It is sensitive to small changes near 0, and it normalizes the absolute difference of the individual comparisons.

LEMMA: The runtime complexity for generating NetSimile’s ‘signature’ vectors is linear in the number of edges of the input networks.

Experimental Results

Datasets
• 30 real-world networks:
  - arXiv: 5 co-authorship networks for different fields
  - DBLP-C: 6 co-authorship networks for different conferences
  - DBLP-Y: 5 DBLP co-authorship networks for 2005-2009
  - IMDB: 5 collaboration networks from IMDB for movies issued from 2005 to 2009
  - Query Log: 5 word co-occurrence networks built from a query log of approximately 20M web-search queries submitted by 650K users over 3 months
  - Oregon AS: 5 autonomous-system routing graphs collected between March 31st and May 26th, 2001
• multiple synthetic networks: Barabási-Albert, Forest Fire, Erdős-Rényi, Watts-Strogatz

Baseline methods
• FSM: frequent subgraph mining + Canberra distance of the relative-support vectors
• EIG: eigenvalue extraction + Canberra distance

Graph database clustering
[Figure: graph clusters; smaller distance corresponds to homogeneity in colors]
Observation: NetSimile gives better and more intuitive graph clusters than EIG (the eigenvalue-based competitor method).

Intuitiveness + interpretability of NetSimile: entropy of feature vectors and discriminative power
Observation: NetSimile’s feature vectors have higher entropy than FSM’s or EIG’s, which suggests that they capture the nuances in the graphs better than FSM or EIG.

NetSimile and node overlap
• Hypothesis: bigger node overlap => greater similarity (smaller distance).
• Implicit assumption: the networks are from the same domain.
Observation: The lower the NetSimile score (i.e., the greater the similarity), the higher the normalized node intersection of the input networks.

Is NetSimile measuring size?
Observation: No. NetSimile is not measuring size; there is no correlation between the extracted features and network size.

Application: Discontinuity Detection in Yahoo! IM
• nodes: IM users; edges: communication events
[Figure: daily networks, Day 1 to Day 5]
• Events:
  1. Microsoft offers to buy Yahoo!.
  2. New features for Flickr were announced.
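The three NetSimile steps (feature extraction, feature aggregation, comparison) can be sketched in plain Python as follows. This is a minimal, unoptimized sketch under our own assumptions, not the authors' implementation. Graphs are assumed undirected and simple, stored as a dict mapping each node to its set of neighbors. The function names (`from_edges`, `node_features`, `aggregate`, `signature`, `canberra`) are hypothetical. Skewness and kurtosis use population moments, with kurtosis reported as excess kurtosis. The Canberra distance d(x, y) = Σ_i |x_i − y_i| / (|x_i| + |y_i|) skips coordinates where both values are 0. The sketch does not attempt the linear-time bound stated in the lemma.

```python
import math
from statistics import mean, median


def from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_features(adj):
    """Step 1: seven local/egonet features per node (assumes no self-loops)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}

    def links_among(nodes):
        # Number of undirected edges with both endpoints inside `nodes`.
        return sum(len(adj[u] & nodes) for u in nodes) // 2

    clustering = {}
    for v, nbrs in adj.items():
        d = degree[v]
        clustering[v] = 2 * links_among(nbrs) / (d * (d - 1)) if d > 1 else 0.0

    feats = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}
        ego_edges = links_among(ego)
        # Edge endpoints inside the egonet, minus those used by internal edges,
        # equals the number of edges leaving the egonet.
        out_edges = sum(degree[u] for u in ego) - 2 * ego_edges
        # Distinct nodes outside the egonet adjacent to some egonet member.
        ego_nbrs = set().union(*(adj[u] for u in ego)) - ego
        feats[v] = [
            degree[v],                                           # number of neighbors
            clustering[v],                                       # clustering coefficient
            mean(degree[u] for u in nbrs) if nbrs else 0.0,      # avg. # of neighbors' neighbors
            mean(clustering[u] for u in nbrs) if nbrs else 0.0,  # avg. clustering of neighbors
            ego_edges,                                           # edges in egonet
            out_edges,                                           # outgoing edges from egonet
            len(ego_nbrs),                                       # neighbors of egonet
        ]
    return feats


def aggregate(values):
    """Step 2: median, mean, std, skewness, excess kurtosis (population moments)."""
    n = len(values)
    mu = mean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return [median(values), mu, math.sqrt(m2), skew, kurt]


def signature(adj):
    """One 'signature' vector per network: 7 features x 5 aggregators = 35 values."""
    rows = list(node_features(adj).values())
    return [a for k in range(7) for a in aggregate([row[k] for row in rows])]


def canberra(x, y):
    """Step 3: Canberra distance; coordinates where both values are 0 contribute 0."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


if __name__ == "__main__":
    def cycle(n):
        return from_edges([(i, (i + 1) % n) for i in range(n)])

    triangle = from_edges([(0, 1), (1, 2), (2, 0)])
    path = from_edges([(0, 1), (1, 2)])
    print(canberra(signature(triangle), signature(path)))      # positive: different structure
    print(canberra(signature(cycle(5)), signature(cycle(6))))  # 0.0
```

Because the signature summarizes feature distributions rather than raw counts, a 5-cycle and a 6-cycle receive identical signatures (Canberra distance 0.0). This is a toy illustration of the size-independence property. A triangle and a 3-node path, which differ in structure, receive a positive distance.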
Also tried, in place of Canberra distance:
• cosine similarity
• Euclidean distance
• hypothesis testing: Mann-Whitney, Kolmogorov-Smirnov
• …

CONCLUSIONS
• Novel approach: a ‘signature’ vector for each graph (summarization)
• NetSimile is:
  - effective: size-independent, intuitive, interpretable
  - scalable
• Applicable to a variety of problems
