Mining Networks through Visual Analytics


  1. Mining Networks through Visual Analytics: Incremental Hypothesis Building and Validation • David Auber, Romain Bourqui, Guy Melançon • CNRS LaBRI UMR 5800 & INRIA Futurs – GRAVITÉ, Bordeaux, France

  2. peacokmaps.com

  3. InfoVis CyberInfraStructure – Pajek • “A picture is worth a thousand words” • Chinese proverb (?)

  4. Tulip – BubbleTree

  5. Graph Viz Framework Tulip • “It’s all visual” • R. Feynman (Nobel prize in Physics)

  6. Internet traffic

  7. Voronoï Treemaps • “The purpose of computing is insight, not numbers” • R. Hamming (1973)

  8. Cushion Treemaps

  9. “Visualization uses computer graphics to help provide insight on complicated problems, models or systems” • “Scientific visualization is exploring data and information graphically, gaining understanding and insights into the data” • R.A. Earnshaw (a pioneer in computer graphics, 1973) • Munzner’s Hyperbolic Browser

  10. Tulip – Sugiyama Layout

  11. Visualize? • Inselberg – « creator » of parallel coordinates • « Insight through images » • « Goal: Visual Model to Help our Intuition » • « Involves: Geometry, Cognition, Art ? »

  12. Visualize?

  13. Visual graph mining related to security issues • “Recognize” structural properties • Identify key actors • Identify their neighborhood • Community structure • Connectivity between communities • … “Chess players recognize patterns”

  14. Example from NCTC data • Extracted about 8000 incidents from WITS • Identified terrorist groups when possible (directly or through AFP) • Identified countries where incidents took place • Added territorial information (continents, world regions) to help organize the overall map

  15. Example from NCTC data • About 8000 incidents • 9419 nodes • 18486 edges • Layout is time consuming • Does not provide a clue about structure • Filter out incidents with no identified group

  16. Example from NCTC data • Interactivity • « Play » with the network • Apply various metrics • Attribute-based node filtering • Tulip Graph Viz Framework • Open source • Plug-in architecture • www.tulip-software.org

  17. Massive data • Information big bang: « How much information » project, Berkeley University • In 2001, about 1 exabyte (1 million terabytes) of data was generated annually worldwide, 99.997% of it available only in digital form • In 2003, each individual produced about 800 megabytes per year

  18. Massive data • 100 million FedEx transactions / day • 150 million VISA transactions / day • 300 million long-distance calls / day over AT&T’s network • 35 billion e-mails / day worldwide • 600 billion IP packets / day over the DE-CIX backbone • Keim, VIEW Workshop 2006

  19. Visualization and Moore’s law Daniel Keim - Keynote Address, VIEW 2006

  20. Visualization and Moore’s law • Issues that won’t be solved by hardware only • Design interaction together with visualization • Understand how and why visualization pays • Collaborate with other fields • Integrate visualization together with other technology NIH-NSF Visualization Research Challenges Report, 2006

  21. Added value of visual and interactive mining • KDD Panel « The Perfect Data Mining Tool » [Ankerst 2002] • The human eye is an excellent tool for spotting natural patterns • Getting rid of the human in the loop? Wrong decision! • Increase human participation through visualization in the data exploration and knowledge discovery processes

  22. « Sense making loop » J. Thomas – Visual Analytics Initiative

  23. « Visualization mantras » • Visual Information Seeking Mantra • Overview, Zoom-in / Filter, and Details on Demand (Shneiderman, 1996) • Visual Analytics Mantra • Analyse first, Show the Important, Zoom, filter and analyse, Details on demand (Keim 2006)

  24. Visualization “pipeline” • A designer’s view on the visualization process

  25. Visualize? Protein interaction network (yeast); Barabási 2000

  26. Organize data prior to visualization • Layer or hierarchize data based on: • node/edge metrics (eigenvalues, centralities, …) • topological feature detection • Use relevant drawing methods • Combine with interaction

  27. Case study: ITA 2000 passenger air traffic • Cities connect through direct flights • Edge weights: number of passengers • Questions: • Read motivations of carriers through organization of the network? • Territorial logic? • Political? Economic?

  28. Case study: ITA 2000 passenger air traffic • Cities connect through direct flights • Edge weights: number of passengers • Questions: • Read motivations of carriers through organization of the network? • Territorial logic? • Political? Economic?

  29. TopoLayout – (Topological) Feature-based Hierarchization • Search the graph for components of growing complexity • Subtrees • Biconnected components (« blocks ») • Grid-like • « Clusters »

  30. TopoLayout – (Topological) Feature-based Hierarchization • Search the graph for components of growing complexity • Subtrees • Biconnected components (« blocks ») • Grid-like • « Clusters »

  31. TopoLayout – (Topological) Feature-based Hierarchization • Search the graph for components of growing complexity • Subtrees • Biconnected components • Grid-like • « Clusters »

  32. TopoLayout – (Topological) Feature-based Hierarchization • Search the graph for components of growing complexity • Subtrees • Biconnected components • Grid-like • « Clusters » • Need to identify articulation points (“pivots”) • The graph builds into a “tree of biconnected components”
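
The articulation-point step can be illustrated with a short sketch. It is not TopoLayout's own code: it is the textbook depth-first search for biconnected components, written for a simple undirected graph stored as a dict of neighbour sets. The names `biconnected_components` and `block_cut_tree` are hypothetical.

```python
from collections import defaultdict

def biconnected_components(adj):
    """Return (blocks, cut_nodes) for a simple undirected graph.

    adj maps each node to the set of its neighbours.
    Each block is the set of nodes of one biconnected component.
    """
    index = {}        # DFS discovery order of each node
    low = {}          # lowest discovery index reachable from the node's subtree
    edge_stack = []   # edges seen since the last closed block
    blocks = []
    cut_nodes = set()
    counter = 0

    def visit(u, parent):
        nonlocal counter
        index[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v not in index:
                children += 1
                edge_stack.append((u, v))
                visit(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # Nothing below v reaches above u: u closes a block.
                    if parent is not None or children > 1:
                        cut_nodes.add(u)
                    block = set()
                    while True:
                        a, b = edge_stack.pop()
                        block.update((a, b))
                        if (a, b) == (u, v):
                            break
                    blocks.append(block)
            elif v != parent and index[v] < index[u]:
                # Back edge to an ancestor.
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for node in adj:
        if node not in index:
            visit(node, None)
    return blocks, cut_nodes

def block_cut_tree(blocks, cut_nodes):
    """Link each block (by index) to the articulation points it contains."""
    return [(i, c) for i, block in enumerate(blocks) for c in block & cut_nodes]

# Two triangles sharing node "c", plus a pendant edge e-f.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("c", "d"), ("d", "e"), ("e", "c"), ("e", "f")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

blocks, cuts = biconnected_components(adj)
print(sorted(cuts))  # ['c', 'e']
```

The links returned by `block_cut_tree` form the “tree of biconnected components”: blocks on one side, pivots on the other. The recursive search is fine for small examples; a graph with very long paths would need an iterative version.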

  33. TopoLayout – (Topological) Feature-based Hierarchization • Search the graph for components of growing complexity • Subtrees • Biconnected components (« blocks ») • Grid-like (eigenvalues) • « Clusters »

  34. TopoLayout – (Topological) Feature-based Hierarchization • Search the graph for components of growing complexity • Subtrees • Biconnected components (« blocks ») • Grid-like (eigenvalues) • « Clusters »

  35. TopoLayout • Components naturally organize as a hierarchy through the search process

  36. TopoLayout + interaction: Grouse • Explore the graph by unfolding/folding the hierarchy • The user’s navigation triggers layout of components • Higher level graphs (quotient graphs) are built from metanodes • Improve readability / Fewer visual elements • Faster layout, based on topology of quotient graph
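
To make the quotient-graph idea concrete, here is a minimal sketch. It assumes the grouping of nodes into metanodes is already known and given as a dict; it does not reproduce Grouse's data structures, and `quotient_graph` and `cluster_of` are hypothetical names.

```python
def quotient_graph(edges, cluster_of):
    """Collapse each cluster into a metanode.

    edges      : iterable of (u, v) pairs of an undirected graph
    cluster_of : dict mapping each node to its metanode identifier
    Returns a dict mapping each metaedge (a, b), with a <= b, to the
    number of original edges it stands for.
    """
    metaedges = {}
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            continue  # edge hidden inside a folded metanode
        key = (cu, cv) if cu <= cv else (cv, cu)
        metaedges[key] = metaedges.get(key, 0) + 1
    return metaedges

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "d")]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
print(quotient_graph(edges, cluster_of))  # {(0, 1): 2, (1, 2): 1}
```

Five edges over five nodes shrink to two metaedges over three metanodes, which is why the folded view has fewer visual elements and is faster to lay out. Unfolding a metanode amounts to calling the same function with that cluster's nodes mapped to themselves.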

  37. TopoLayout + interaction: Grouse • Multilevel hierarchy: recursive grouping of metanodes

  38. TopoLayout + interaction: Grouse • Multilevel hierarchy: recursive grouping of metanodes

  39. TopoLayout + interaction: Grouse • Multilevel Hierarchy for Abstraction: Cut

  40. Multilevel navigation of small world networks • Small world networks: social networks, web graphs, transportation networks (ITA), … • Small world networks organize into several levels (hierarchy) [Adamic, Huberman] • Idea: capture the hierarchy and use it as a navigation paradigm

  41. Small world networks • Centralities • Bottleneck passageways • Network organizes around those « pivot » nodes

  42. Small world networks • Centralities • Betweenness centrality • Eigenvalue centrality • Betweenness centrality has high computational cost (global) • Prefer local index • Degree • Edge strength

  43. Small world networks • Edge strength: proportion of cycles containing an edge (length 3 and 4) (Jaccard 1912) (Tanimoto 1958) • Auber et al. 2003, Radicchi et al. 2004 • Figure: edge e = (u, v) with node sets $W_{uv}$, $M_u = N_u \setminus N_v$, $M_v = N_v \setminus N_u$

  44. Small world networks • Edge strength • Costs linear time if degree is bounded, otherwise quadratic … • Figure: edge e = (u, v) with node sets $W_{uv}$, $M_u = N_u \setminus N_v$, $M_v = N_v \setminus N_u$

  45. Small world networks • Edge strength • Cost yet lower than most centralities (local versus global indices) • Incremental: local modification of graphs requires local recomputation • Figure: edge e = (u, v) with node sets $W_{uv}$, $M_u = N_u \setminus N_v$, $M_v = N_v \setminus N_u$
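
A simplified sketch of a local edge index follows. It covers only the cycles of length 3 (a Jaccard-style ratio on the two neighbourhoods) and leaves out the length-4 term mentioned on the slides. It also assumes that $W_{uv}$ denotes the common neighbours of u and v, which the figure suggests but the transcript does not state. The function name `edge_strength_3` is hypothetical.

```python
def edge_strength_3(adj, u, v):
    """Proportion of the neighbours of u or v that close a triangle through (u, v).

    adj maps each node to the set of its neighbours (simple undirected graph).
    Only the neighbourhoods of u and v are read, so the index is local.
    """
    n_u = adj[u] - {v}
    n_v = adj[v] - {u}
    w_uv = n_u & n_v   # assumed meaning of W_uv: common neighbours
    m_u = n_u - n_v    # M_u = N_u \ N_v
    m_v = n_v - n_u    # M_v = N_v \ N_u
    total = len(m_u) + len(w_uv) + len(m_v)
    return len(w_uv) / total if total else 0.0

# Triangle a-b-c with a pendant edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(edge_strength_3(adj, "a", "b"))  # 1.0
print(edge_strength_3(adj, "a", "c"))  # 0.5
print(edge_strength_3(adj, "c", "d"))  # 0.0
```

Because the function reads only `adj[u]` and `adj[v]`, adding or removing an edge changes the value only for edges incident to the two affected neighbourhoods, which is the incremental property the slide points out.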

  46. Community structure of small world networks • Filter out weak edges • Capture components • Infer quotient graph (metanodes) • Recurse over each component

  47. Community structure of small world networks • Filter out weak edges • Capture components • Infer quotient graph (metanodes) • Recurse over each component
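
The filter-and-recurse loop can be sketched as follows. The sketch assumes edge strengths are already available as a dict, and that one threshold per hierarchy level is supplied by the caller; how to choose the threshold is a separate question. The names `components` and `hierarchy` are hypothetical.

```python
def components(nodes, edges):
    """Connected components of the graph (nodes, edges), as a list of sets."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), []
    for start in sorted(nodes):      # sorted for a reproducible order
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            comp.add(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        result.append(comp)
    return result

def hierarchy(nodes, strength, thresholds):
    """Nested clustering obtained by dropping weak edges level by level.

    strength   : dict mapping (u, v) to the strength of that edge
    thresholds : one cut-off per level (an assumption of this sketch)
    """
    if not thresholds or len(nodes) <= 1:
        return sorted(nodes)
    kept = [e for e, s in strength.items()
            if s >= thresholds[0] and e[0] in nodes and e[1] in nodes]
    return [hierarchy(c, strength, thresholds[1:])
            for c in components(nodes, kept)]

strength = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.9,
            ("c", "d"): 0.1, ("d", "e"): 0.7, ("e", "f"): 0.3}
tree = hierarchy(set("abcdef"), strength, [0.2, 0.5])
print(tree)  # [[['a', 'b', 'c']], [['d', 'e'], ['f']]]
```

Each inner list is one metanode of the quotient graph at its level: the first cut separates {a, b, c} from {d, e, f}, and the second cut splits f away from {d, e}.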

  48. Community structure of small world networks • Filter out weak edges: • Q. What threshold to choose? • A. Best possible one (!) • Use quality criteria • MQ (modularity quality)

  49. “Quality” criteria MQ • C = (C1, C2, …, Cp) is a clustering of a graph G • MQ(C; G) = [formula shown as an image on the original slide] • Figure: clusters C1, C2, …, Cp

  50. MQ / Nice properties • MQ(C; G) = [formula shown as an image on the original slide] • MQ varies over a bounded interval [-1, 1] • MQ behaves like a Gaussian distribution
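
The MQ formula itself did not survive in this transcript, so the sketch below rests on an assumption: it uses one common formulation, the mean intra-cluster edge density minus the mean inter-cluster edge density, which does stay within [-1, 1]. It may differ from the definition on the slide. The function name `mq` is hypothetical.

```python
from itertools import combinations

def mq(clusters, edges):
    """MQ under the assumed definition:
    mean intra-cluster density minus mean inter-cluster density.

    clusters : list of non-empty, disjoint sets of nodes covering the graph
    edges    : iterable of (u, v) pairs of a simple undirected graph
    """
    cluster_of = {n: i for i, c in enumerate(clusters) for n in c}
    p = len(clusters)
    intra = [0] * p   # edge count inside each cluster
    inter = {}        # edge count between each pair of clusters
    for u, v in edges:
        i, j = cluster_of[u], cluster_of[v]
        if i == j:
            intra[i] += 1
        else:
            key = (min(i, j), max(i, j))
            inter[key] = inter.get(key, 0) + 1

    def intra_density(i):
        n = len(clusters[i])
        pairs = n * (n - 1) / 2
        return intra[i] / pairs if pairs else 0.0

    positive = sum(intra_density(i) for i in range(p)) / p
    if p < 2:
        return positive
    negative = sum(
        inter.get((i, j), 0) / (len(clusters[i]) * len(clusters[j]))
        for i, j in combinations(range(p), 2)
    ) / (p * (p - 1) / 2)
    return positive - negative

# Two triangles joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
good = [{"a", "b", "c"}, {"d", "e", "f"}]
bad = [{"a", "d"}, {"b", "e"}, {"c", "f"}]
print(mq(good, edges) > mq(bad, edges))  # True
```

Under this assumed definition, picking the “best possible” threshold means computing the clustering for several candidate thresholds and keeping the one whose score is highest.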
