STAR: Steiner-Tree Approximation in Relationship Graphs. Max-Planck Institute for Informatics, Database and Information Systems. Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum.
Introduction • Entity-Relationship Graphs • Another way of representing relational data • Consist of labeled nodes and edges • Node labels correspond to entities • Edge labels represent relations between entities • Edge weights express the strength of the relation between entities • Taxonomic relations (subClassOf, type)
Introduction • Example of an Entity-Relationship Graph [figure: example graph with the generalization and specialization directions marked]
Introduction • Querying E-R Graphs • The relationship search query class: • Given a set of two, three, or more entities (nodes), find their closest relationships (edges or paths) that connect the entities in the strongest possible way. • "Strongest" is related to informativeness • A relationship search query example • Query: “How are Germany’s chancellor Angela Merkel, the mathematician Richard Courant, Turing-Award winner Jim Gray, and the Dalai Lama related?” • Informative answer: All have a doctoral degree from a German university • Another query: How are Angela Merkel, Arnold Schwarzenegger, Max Planck, and Germany related?
Motivation and Problem • Information discovery as opposed to lookup • The nature of the answer • Can be a tree embedded in the original graph • The input nodes (query) must be connected by the tree • How good is the answer? • A scoring model can exploit node and edge weights • The formal definition of the problem: • Compute the k lowest-cost Steiner trees that connect the query nodes
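One standard way to write the problem down is sketched below. The symbols G, w, T, and T_i are assumed notation (only V’ appears on the slides), and the cost here uses edge weights only, which is a simplifying assumption since the scoring model may also use node weights.

```latex
% Assumed notation: G = (V, E) is the graph, w : E -> R_{>=0} the edge weights,
% V' \subseteq V the query nodes (terminals).
\text{A Steiner tree for } V' \text{ is a tree } T = (V_T, E_T)
\text{ with } V' \subseteq V_T \subseteq V,\; E_T \subseteq E .

w(T) \;=\; \sum_{e \in E_T} w(e)

% Top-k version: return Steiner trees T_1, \dots, T_k such that
w(T_1) \le w(T_2) \le \dots \le w(T_k) \le w(T)
\quad \text{for every other Steiner tree } T \text{ for } V' .
```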
Motivation and Problem • What is the Steiner Tree Problem? • Steiner tree examples: • Steiner tree for three terminals V’ = {A, B, C}; note the Steiner point S. • Steiner tree for four terminals V’ = {A, B, C, D}; note the Steiner points S1, S2.
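To make the role of a Steiner point concrete, here is a minimal brute-force sketch for the three-terminal case. It is not the STAR algorithm: it tries every subset of non-terminal nodes and takes the cheapest minimum spanning tree, which is exponential and only usable on toy graphs. The edge weights and the function names `mst_cost` and `steiner_brute_force` are assumptions made for illustration.

```python
from itertools import combinations


def mst_cost(nodes, edges):
    """Kruskal's algorithm on the subgraph induced by `nodes`.

    Returns the MST cost, or None if the induced subgraph is disconnected.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    cost, used = 0.0, 0
    for w, u, v in sorted(edges):
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                cost += w
                used += 1
    return cost if used == len(nodes) - 1 else None


def steiner_brute_force(all_nodes, edges, terminals):
    """Exact minimum Steiner tree cost by enumerating candidate Steiner points."""
    candidates = [v for v in all_nodes if v not in terminals]
    best_cost, best_points = float("inf"), None
    for r in range(len(candidates) + 1):
        for extra in combinations(candidates, r):
            cost = mst_cost(set(terminals) | set(extra), edges)
            if cost is not None and cost < best_cost:
                best_cost, best_points = cost, set(extra)
    return best_cost, best_points


# Hypothetical weights: going through S is cheaper than linking terminals directly.
edges = [
    (1.0, "A", "S"), (1.0, "B", "S"), (1.0, "C", "S"),
    (1.9, "A", "B"), (1.9, "B", "C"), (1.9, "A", "C"),
]
nodes = ["A", "B", "C", "S"]
best_cost, steiner_points = steiner_brute_force(nodes, edges, ["A", "B", "C"])
print(best_cost, steiner_points)
```

With these weights, the cheapest tree uses S as a Steiner point even though S is not a query node; connecting the terminals directly would cost more.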
Motivation and Problem • Steiner Tree Problem complexity • Computing the optimal tree is NP-hard • Approximation algorithms compute approximate solutions • Approximation ratio: • Measures the quality of an approximation algorithm • Weight of the approximate output tree / weight of the optimal output tree • Benefits of reducing the approximation ratio • Viable runtimes (efficiency) • Better, near-optimal tree quality (informativeness)
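The ratio described on this slide can be written as follows. The symbols are assumed notation: I ranges over problem instances, T_ALG(I) is the tree the approximation algorithm returns, T_OPT(I) is an optimal tree, and w is the tree weight, assumed positive.

```latex
\rho \;=\; \sup_{I}\; \frac{w\bigl(T_{\mathrm{ALG}}(I)\bigr)}{w\bigl(T_{\mathrm{OPT}}(I)\bigr)} \;\ge\; 1

% Worked example on a single instance (illustrative numbers only):
% w(T_ALG) = 3.8 and w(T_OPT) = 3.0 give 3.8 / 3.0 \approx 1.27 .
```

A ratio closer to 1 means the returned tree is closer to the optimum.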
Paper Contributions • Presents STAR, a new efficient algorithm • Computes near-optimal Steiner trees • Exploits a taxonomic schema (when available) • Viable runtimes over large graphs • STAR approximation ratio proofs: • O(log n) for n given query entities (worst case) • Improvement over other approximation ratios • […], or […] • In practice, STAR is better than a […]-approximation algorithm • STAR top-k tree capability • STAR outperforms state-of-the-art algorithms by an order of magnitude • Can be applied either to main-memory datasets or to large disk-resident graphs • Evaluation via comparison with other cutting-edge algorithms
The STAR Algorithm • Introduction • First Phase • Second Phase • Examples
The STAR Algorithm – Introduction • Problem definition • As stated in the introduction • In addition, we are interested in finding the top-k result trees in increasing order of cost • Exploitation of taxonomic backbones • Node labels as entities • Edge labels as weights or relations • A taxonomy is not required • Runs in 2 phases • Phase 1: Uses taxonomic information (when available) • Quickly builds an initial tree by pruning the original graph • Interconnects all given nodes • Phase 2: Iteratively improves the tree from Phase 1
The STAR Algorithm - First Phase • Prunes the original graph • Runs an iterator from each terminal • Iterators run in a round-robin manner • Iterators follow only taxonomic edges: • subClassOf, type
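The bullets above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the taxonomy edges, the helper name `first_phase`, and the choice to follow taxonomic edges only upward (from an instance or subclass to its class) are all assumptions. Each terminal gets its own breadth-first iterator, the iterators take turns expanding one node each, and the search stops at the first node every iterator has reached.

```python
from collections import deque

TAXONOMIC = {"subClassOf", "type"}


def first_phase(edges, terminals):
    """Round-robin BFS from each terminal over taxonomic edges only.

    Returns (meeting_node, paths), where paths[t] is the node path from
    terminal t to the meeting node, or (None, {}) if the iterators never meet.
    """
    # Assumption: edges are followed from subject to object (upward).
    adj = {}
    for subj, label, obj in edges:
        if label in TAXONOMIC:
            adj.setdefault(subj, []).append(obj)

    queues = {t: deque([t]) for t in terminals}
    parents = {t: {t: None} for t in terminals}  # also serves as the "seen" set

    def build_paths(meet):
        paths = {}
        for t in terminals:
            path, node = [], meet
            while node is not None:
                path.append(node)
                node = parents[t][node]
            paths[t] = path[::-1]
        return paths

    if len(terminals) == 1:
        return terminals[0], build_paths(terminals[0])

    while any(queues.values()):
        for t in terminals:  # round robin: one expansion per iterator per turn
            if not queues[t]:
                continue
            node = queues[t].popleft()
            for nxt in adj.get(node, []):
                if nxt in parents[t]:
                    continue
                parents[t][nxt] = node
                if all(nxt in parents[o] for o in terminals):
                    return nxt, build_paths(nxt)
                queues[t].append(nxt)
    return None, {}


# Hypothetical taxonomy; the last edge is non-taxonomic and must be ignored.
edges = [
    ("Max Planck", "type", "Physicist"),
    ("Physicist", "subClassOf", "Scientist"),
    ("Scientist", "subClassOf", "Person"),
    ("Arnold Schwarzenegger", "type", "Politician"),
    ("Politician", "subClassOf", "Person"),
    ("Person", "subClassOf", "Entity"),
    ("Germany", "type", "State"),
    ("State", "subClassOf", "Entity"),
    ("Max Planck", "bornIn", "Germany"),
]
terminals = ["Max Planck", "Arnold Schwarzenegger", "Germany"]
meeting, paths = first_phase(edges, terminals)
print(meeting, paths)
```

The union of the returned paths is an initial tree that interconnects all terminals, which is the kind of starting point a later improvement phase could work on.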
Breadth First Search • Observe the taxonomic structure • Walkthrough on a 9-node graph (s, 2, 3, 4, 5, 6, 7, 8, 9), computing the shortest-path distance from s • Node states: undiscovered, discovered, top of queue, finished • Queue over time: s → 2 3 5 → 2 3 5 4 (5 already discovered: don't enqueue) → 3 5 4 6 → 5 4 6 → 4 6 8 → 6 8 7 9 → 8 7 9 → 7 9 → 9 → empty • Resulting level graph: distance 0: s; distance 1: 2, 3, 5; distance 2: 4, 6; distance 3: 7, 8, 9
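The walkthrough above corresponds to a plain queue-based breadth-first search. The sketch below is a minimal version; the adjacency lists contain only the edges that can be inferred from the walkthrough (the original drawing may have more), and the function name `bfs_levels` is an assumption.

```python
from collections import deque


def bfs_levels(adj, source):
    """Return (level, order): BFS distance per node and discovery order."""
    level = {source: 0}
    order = [source]
    queue = deque([source])
    while queue:
        u = queue.popleft()          # u is the "top of queue"
        for v in adj[u]:
            if v not in level:       # already discovered nodes are not enqueued
                level[v] = level[u] + 1
                order.append(v)
                queue.append(v)
        # u is now "finished"
    return level, order


# Edge set inferred from the walkthrough (assumption).
adj = {
    "s": ["2", "3", "5"],
    "2": ["s", "4", "5"],
    "3": ["s", "6"],
    "5": ["s", "2"],
    "4": ["2", "8"],
    "6": ["3", "7", "9"],
    "8": ["4"],
    "7": ["6"],
    "9": ["6"],
}
level, order = bfs_levels(adj, "s")
print(order)
print(level)
```

Grouping the nodes by their value in `level` gives the level graph shown at the end of the walkthrough.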
First Phase Example (simple breadth-first-search iterator from each terminal) • V’ = {Max Planck, Arnold Schwarzenegger, Germany}
Breadth First Search Iterators from Each Terminal • Nodes in the figure: Entity, Person, Organization Unit, Scientist, Politician, Actor, State, Physicist, Germany, Max Planck, Arnold Schwarzenegger • As soon as the iterators meet, a result is constructed • Iterators: T1 starts at Max Planck, T2 at Arnold Schwarzenegger, T3 at Germany • Node states: undiscovered, discovered, top of queue, finished • Queue T1: Max Planck → Max Planck, Physicist • Queue T2: Arnold Schwarzenegger → Arnold Schwarzenegger, Politician • Queue T3: Germany