Algorithms for Data Mining and Querying with Graphs

Investigators: Padhraic Smyth, Sharad Mehrotra, University of California, Irvine
Students: Joshua O'Madadhain, Dawit Seid, Jon Hutchins

Link Prediction Algorithms

• We have developed a general predictive learning approach that uses historical graph data to learn a model of whether a link is likely to exist between any pair of nodes A and B in a future time period. The prediction model draws on structural graph features around A and B as well as the individual node attributes of A and B. For example, for co-author graphs, features can include the distance between A and B in the co-author graph, properties of A's and B's graph neighborhoods, and topic models in the form of probability distributions characterizing A's and B's research interests.

Results on KDD Challenge/Biobase Data

This prediction competition in 2005 evaluated different approaches to link prediction. The specific problem was to predict new collaborations among 300,000 medical researchers in 2002, based on co-author relations in 128,000 papers published from 1998 to 2001. The accompanying figure shows the "lift curve": the number of true new collaborations found by our models' rankings relative to the number found by a random ranking. In the top 50 predictions, for example, our models predict between 40 and 45 true collaborations (versus about 3 for a random ranking).

JUNG: Java Universal Network/Graph Framework

• Extensible, open source software library (API) for graph/network modeling, analysis, and visualization
  - can decorate graphs, vertices, and edges with any JUNG object
  - complex filtering/transformation/subset management
  - includes a library of network and graph algorithms: clustering, centrality, importance, paths, flows, etc.
  - includes a visualization API, or can use other visualization APIs (e.g. prefuse)
  - supports graphs, hypergraphs, parallel edges, mixed-mode graphs, and k-partite graphs
• Active user/developer community
  - 30,000 downloads, 1.3 million page visits
  - ranked #60 out of 100k SourceForge projects
  - used in social network analysis, games, trust metrics, the upcoming version of HP Zoomgraph, email visualization, and NetSight

JUNG software is publicly available at http://jung.sourceforge.net

[Figure: Example of software built using JUNG: NetSight, an interactive graph visualization and analysis tool]

Algorithms for Ranking Nodes in Dynamic Networks

• We have developed a novel algorithmic approach to the problem of determining the importance of nodes in a network where the links occur over time, e.g., an email network or a co-author network. The concept is similar to centrality ideas in social networks, and to HITS and PageRank for Web page ranking, but it produces a "dynamic rank": the rank of each node varies over time as it receives messages in the network.

Data: corporate email history (1 million emails, 21 months, 628 individuals)

[Figure: Example of Rankings over Time]
[Figure: Email Rankings and Organizational Structure]

GAAL: A General-Purpose Graph Query Language

Algebraic Framework

• We have developed a new query language called GAAL that allows users to express complex relational queries on attributed graphs, supporting queries on graph properties, aggregation operations, and scalability to very large graphs. In 2005 we extended this approach to provide an algebraic framework for spatio-temporal analysis of semantic graphs.

[Figure: Query Example]
[Figure: Architecture. Components shown: Visualization/Analysis Applications (NetSight); Graph Schema Definition Interface; GAAL Language (Graph Querying Algebra); DBMS-Specific Adapters; Extensible ORDBMS; Graph Schema and Other Metadata; multi-relational (attributed graph) representation of entity/event data. Schema labels shown: Researcher, Paper, Institute, Write, Cite, WorksIn.]
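
To make the link-prediction setup and the lift measure described earlier more concrete, here is a minimal Python sketch. It is not the authors' model: the toy co-author graph, the `future_links` set, and all function names are hypothetical. A simple common-neighbor count stands in for the learned predictive model. The sketch computes a few structural features for a node pair (graph distance, shared neighbors, degrees), ranks the candidate pairs, and reports lift as hits in the top k divided by the hits expected from a random ranking.

```python
from collections import deque
from itertools import combinations

def shortest_distance(adj, a, b):
    """Breadth-first-search distance from a to b; None if b is unreachable."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def pair_features(adj, a, b):
    """Structural features for a candidate pair (illustrative subset only)."""
    return {
        "distance": shortest_distance(adj, a, b),
        "common_neighbors": len(adj[a] & adj[b]),
        "degree_a": len(adj[a]),
        "degree_b": len(adj[b]),
    }

def lift_at_k(ranked_pairs, true_links, k):
    """Hits in the top k divided by the hits expected from a random ranking."""
    hits = sum(1 for pair in ranked_pairs[:k] if pair in true_links)
    positives = sum(1 for pair in ranked_pairs if pair in true_links)
    expected_random = k * positives / len(ranked_pairs)
    return hits / expected_random

# Hypothetical historical co-author graph (undirected).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("E", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Candidate pairs: every pair of authors not yet linked.
candidates = [p for p in combinations(sorted(adj), 2) if p[1] not in adj[p[0]]]

# Stand-in score: number of common neighbors (ties broken alphabetically).
ranked = sorted(candidates, key=lambda p: (-len(adj[p[0]] & adj[p[1]]), p))

# Hypothetical links that appear in the future time period.
future_links = {("A", "D"), ("D", "F")}

print(pair_features(adj, "A", "D"))
print(ranked[:4])
print(lift_at_k(ranked, future_links, 2))
```

Applying the same ratio to the figures quoted in the results section, 40 to 45 true collaborations in the top 50 against about 3 for a random ranking corresponds to a lift of roughly 13 to 15.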