## Graph Algebra


**Graph Algebra** with Pattern Matching and Aggregation Support

**Graphs Nowadays**
• Variety of sources
  • Scientific studies
  • Business activities
  • Social needs
  • Internet
• Data are often
  • Large scale
  • Highly linked
  • Schema-less

**Managing Graph Data**
• Primary role of a database
  • Persistent store
  • Efficient query
• RDBMS
  • Storage model: vertices and edges as tuples
  • Query: links are followed by joins
• Graph database
  • Storage model: graphs
  • Query: path traversal

**Why not RDBMS?**
• Schema issues
  • Every inserted data item may have a different schema (Web graph)
  • Hard to represent semi-structured information
• Scalability issues
  • ACID properties vs. CAP theorem
• Query performance
  • Difficult to optimize join-intensive queries

**Graph Databases and Query Languages**
• No universal language!

**No Universal Language Like SQL?**
• No commonly agreed algebra
• Relational algebra?
  • Expressive, and has stood the test of time
  • Not suitable for graphs
• Graph algebra?
  • Still at a preliminary stage

**Issues with Relational Algebra (RA)**
• Defined on tuples or sets of tuples
  • Mismatch with the nature of graphs
• Operators lose their semantics
  • What are union, intersection, and join on graphs?
• I/O type?
  • Tables, not graphs
• Domain-centric, not data-centric
  • Does not anticipate out-of-order data
• Treats tuples as independent
  • Unaware of the links among tuples
• Queries written in RA are verbose and complex

**Advantage of Graph Algebra**
• An algebra itself is a query language
  • Easy to work out a language with strong theoretical support
• Evaluate the expressiveness of given languages
  • Justify when to use which: Gremlin, Cypher, etc.
• Query optimization
  • Operator order equals the execution plan
  • Algebraic equivalence implies query optimization

**Advantage of Graph Algebra (continued)**
• Separation of query and system
  • One can write a query on any system as long as a common algebra is supported
  • Knowing RA, one can write SQL, PL/SQL, MS/SQL on MySQL, Oracle, SQL Server
• Integrate new operators into the database
  • Current graph database systems do not support newly developed queries
    • Graph OLAP, Graph Cube, Graph Aggregation, etc.
  • A proper algebra can incorporate these operators

**Existing Works on Graph Algebra**
• GraphQL [1]
  • A graph-based algebra; operators are defined on graphs
  • Selection
  • Join (not properly defined)
  • Template
• VAQL [2]
  • Focused on visualization
  • Selection
  • Aggregation (restricted)
  • Visualization
• Selection is restricted to isomorphism
• Aggregation is not defined over edges
• No algebraic equivalence

[1] He, Huahai, and Ambuj K. Singh. "Graphs-at-a-time: query language and access methods for graph databases." Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008.
[2] Shaverdian, Anna A., et al. "A graph algebra for scalable visual analytics." IEEE Computer Graphics and Applications 32.4 (2012): 26-33.

**What Do We Want from a Graph Algebra?**
• Universal
  • Independent of graph type
    • Directed vs. undirected, simple vs. hyper, homogeneous vs. heterogeneous
• Expressive
  • Able to answer typical graph queries
    • Pattern matching, reachability, path finding, etc.
  • Covers Relational Algebra (RA)
    • This ensures that a graph database can handle relational data as well
• Scalable
  • Able to manage data at scale
  • Supports queries that summarize and aggregate data

**Extended Algebra – Graph Model**
• The data model is an attributed graph with the following components:
  • a vertex set, where each vertex has a unique ID
  • an edge set
  • a component containing the attributes of each vertex
  • a component containing the attributes of each edge
    • Edges carry an identifier as well
    • In a simple graph, an edge can be represented by its endpoints
  • a component containing information about the graph itself

**Extended Algebra – Operators**
• Projection
• Restriction
• Unification
• Pattern Matching
• Aggregation

**Operators: Projection**
• Purpose
  • Select the data the user is interested in from the base graph
• Syntax
  • The parameters are the attribute lists for vertices, edges, and the graph
  • The result is a new graph whose attributes are trimmed to those lists

**Operators: Restriction**
• Purpose
  • Restrict the base graph by attribute values
• Syntax
  • Vertex restriction: selects all vertices (and their induced edges) that match the predicate
  • Edge restriction: selects all edges (and their endpoints) that match the predicate
  • Graph restriction: selects graphs in which every vertex matches the vertex predicate, every edge matches the edge predicate, and the graph itself matches the graph predicate

**Operator: Unification**
• Purpose
  • Concatenate graphs
• Syntax
  • Vertex unification: unifies vertices with identical IDs
  • Edge unification: adds edges between two vertices matching the given predicate
  • Attribute unification: creates a virtual vertex for each distinct value of the given attribute

**Operator: Unification (example)**
• P(v1,v1) and P(v4,v5) are true

**Operator: Pattern Matching**
• Purpose
  • Find subgraphs of the base graph that match a given pattern
• Syntax
  • The pattern is itself a graph; its definition comes from [1]
  • One form returns all the matching graphs
  • The other form returns an abstracted match, in which only the vertices that appear in the pattern are returned

[1] Fan, Wenfei, et al. "Adding regular expressions to graph reachability and pattern queries." 2011 IEEE 27th International Conference on Data Engineering (ICDE). IEEE, 2011.

**Operator: Aggregation**
• Purpose
  • Summarize a given graph
• Syntax
  • Graph aggregation: every vertex is supplied to one given function and every edge set to another
  • Vertex aggregation: given a set of vertices, groups them by the given criterion
  • Edge aggregation: given a set of edges, groups them by the given criterion

**Expressiveness**
• This set of operators is more expressive than Relational Algebra and GraphQL
• It can represent many graph queries
  • Reachability
  • Graph Cube computation
  • I-OLAP and T-OLAP

**Algebra Equivalence**
• When operators are chained, they form a query execution plan
• Example query: find the network induced by the persons whose friends comment on each other's posts and whose birthday is later than 1989, and output those names as a graph
• [Figure: execution plan for the example query. Recoverable labels: pattern with friend, Comment, friend edges; Base Graph; Matched Result; Restriction; V-Unification; v.name]

**Algebra Equivalence (continued)**
• To generate multiple execution plans for the same query, we need theoretical support
• Identity equivalence
  • An operator can be represented by other operators
  • p is a common attribute predicate
  • D(P) decomposes a pattern P into edges
  • ...

**Conclusion**
• Graph algebra plays an important role in graph database development
• We take one step forward by proposing a graph algebra which
  • extends existing algebraic work with
    • Regular pattern matching
    • Aggregation
  • is expressive and well-defined
  • contains equivalence rules for further query optimization
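
The graph model and three of the operators described in the slides (projection, restriction, vertex aggregation) can be illustrated with a small Python sketch. This is only one possible reading of the slide text, not the authors' formal definitions: the class `AttributedGraph`, the function names, and the toy data are all hypothetical, and the operator symbols from the original slides are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple

Attrs = Dict[str, Any]


@dataclass
class AttributedGraph:
    """Hypothetical container: vertices and edges are keyed by unique IDs."""
    vertices: Dict[str, Attrs] = field(default_factory=dict)
    # edge id -> (source vertex id, target vertex id, edge attributes)
    edges: Dict[str, Tuple[str, str, Attrs]] = field(default_factory=dict)
    info: Attrs = field(default_factory=dict)  # graph-level attributes


def project(g: AttributedGraph, v_attrs: Set[str], e_attrs: Set[str],
            g_attrs: Set[str]) -> AttributedGraph:
    """Projection: keep only the listed vertex, edge, and graph attributes."""
    def trim(attrs: Attrs, keep: Set[str]) -> Attrs:
        return {k: v for k, v in attrs.items() if k in keep}
    return AttributedGraph(
        {vid: trim(a, v_attrs) for vid, a in g.vertices.items()},
        {eid: (s, t, trim(a, e_attrs)) for eid, (s, t, a) in g.edges.items()},
        trim(g.info, g_attrs),
    )


def restrict_vertices(g: AttributedGraph,
                      pred: Callable[[Attrs], bool]) -> AttributedGraph:
    """Vertex restriction: matching vertices plus the edges they induce."""
    kept = {vid: a for vid, a in g.vertices.items() if pred(a)}
    edges = {eid: e for eid, e in g.edges.items()
             if e[0] in kept and e[1] in kept}
    return AttributedGraph(kept, edges, dict(g.info))


def restrict_edges(g: AttributedGraph,
                   pred: Callable[[Attrs], bool]) -> AttributedGraph:
    """Edge restriction: matching edges plus their endpoints."""
    edges = {eid: e for eid, e in g.edges.items() if pred(e[2])}
    ends = {v for s, t, _ in edges.values() for v in (s, t)}
    return AttributedGraph({v: g.vertices[v] for v in ends}, edges, dict(g.info))


def aggregate_vertices(g: AttributedGraph, key: str) -> Dict[Any, List[str]]:
    """Vertex aggregation: group vertex IDs by the value of one attribute."""
    groups: Dict[Any, List[str]] = {}
    for vid, a in g.vertices.items():
        groups.setdefault(a.get(key), []).append(vid)
    return groups


# Toy data (invented for illustration only).
g = AttributedGraph(
    vertices={
        "v1": {"name": "Ann", "year": 1990},
        "v2": {"name": "Bob", "year": 1985},
        "v3": {"name": "Cy", "year": 1992},
    },
    edges={
        "e1": ("v1", "v2", {"type": "friend"}),
        "e2": ("v1", "v3", {"type": "friend"}),
        "e3": ("v3", "v1", {"type": "comment"}),
    },
    info={"source": "toy example"},
)

# Chaining operators gives a small execution plan: restrict, then project.
young = restrict_vertices(g, lambda a: a["year"] > 1989)
names = project(young, {"name"}, set(), set())
comments = restrict_edges(g, lambda a: a["type"] == "comment")
by_year = aggregate_vertices(g, "year")
```

Each operator takes a graph and returns a graph (or, for the aggregation helper, a grouping), so operators can be chained as in the restrict-then-project example. Swapping the order of such a chain when the result is unchanged is the kind of rewrite that the equivalence rules in the slides are meant to justify.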