
Aules d’empresa 2011 Hands-on course


ghalib

Presentation Transcript


  1. Aules d’empresa 2011 Hands-on course

  2. Contents • Introduction • DEX API • Running example • Database construction • Validate database construction • Script loaders • Query database • Graph algorithms

  3. DEX, a graph database • Graph databases focus on the structure of the model. • Nodes and edges instead of tables. • Relations are implicit in the model. • DEX is a programming library for managing a graph database. • Very large datasets. • High-performance query processing.

  4. Basic concepts • Programming library for managing persistent and temporary graphs. • Data model: typed and attributed directed multigraph. • Node and edge instances belong to a type (label). • Node and edge instances may have attribute values. • Edges can be directed or undirected. • Multiple edges are allowed between two nodes. • Types of edges: • Materialized: directed and undirected. • Virtual: constrained by the values of two attributes (foreign keys); used just for navigation.
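
The data model on this slide can be illustrated without DEX. The sketch below is plain Java, not the DEX API: the class `MultigraphSketch`, its methods and the type names `PERSON`, `FRIEND` and `PHONES` are hypothetical and only show what "typed, attributed, directed multigraph" means in practice, namely that two nodes may be joined by several edges of the same type and that undirected edges match in both directions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultigraphSketch {
    // An edge instance: a type (label), two endpoints, a direction flag and attributes.
    static final class Edge {
        final String type;
        final long tail;
        final long head;
        final boolean directed;
        final Map<String, Object> attributes = new HashMap<>();

        Edge(String type, long tail, long head, boolean directed) {
            this.type = type;
            this.tail = tail;
            this.head = head;
            this.directed = directed;
        }
    }

    private long nextNodeOid = 1;                                 // next free node identifier
    private final Map<Long, String> nodeTypes = new HashMap<>();  // node OID -> type name
    private final List<Edge> edges = new ArrayList<>();

    // Creates a node of the given type and returns its identifier.
    long newNode(String type) {
        long oid = nextNodeOid++;
        nodeTypes.put(oid, type);
        return oid;
    }

    // Creates an edge; nothing prevents several edges between the same pair of nodes.
    Edge newEdge(String type, long tail, long head, boolean directed) {
        Edge e = new Edge(type, tail, head, directed);
        edges.add(e);
        return e;
    }

    // Counts edges of one type from tail to head; undirected edges match both ways.
    int countEdges(String type, long tail, long head) {
        int n = 0;
        for (Edge e : edges) {
            if (!e.type.equals(type)) continue;
            boolean forward = e.tail == tail && e.head == head;
            boolean backward = !e.directed && e.tail == head && e.head == tail;
            if (forward || backward) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        MultigraphSketch g = new MultigraphSketch();
        long john = g.newNode("PERSON");
        long kelly = g.newNode("PERSON");
        long mary = g.newNode("PERSON");
        g.newEdge("FRIEND", john, kelly, false).attributes.put("SINCE", 2000);
        g.newEdge("PHONES", john, mary, true);
        g.newEdge("PHONES", john, mary, true);                    // second edge, same pair
        System.out.println(g.countEdges("PHONES", john, mary));   // prints 2
        System.out.println(g.countEdges("PHONES", mary, john));   // prints 0
        System.out.println(g.countEdges("FRIEND", kelly, john));  // prints 1
    }
}
```

A list of edge records is the simplest structure that permits parallel edges; a real graph database would index edges per node rather than scan the whole list.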

  5. A graph model

  6. API • Java library: jdex.jar (public API) • Native library • Linux: libjdex.so • Windows: jdex.dll • System requirements: • Java Runtime Environment, v1.5 or higher. • Operating system: • Windows: 32-bit • Linux: 32-bit and 64-bit

  7. Core API – class diagram [Class diagram: DEX, GraphPool, Session, Graph, DbGraph, RGraph and Objects, joined by 1-to-1 and 1-to-N associations; annotations in the figure: graph factory, persistent DB, temporary, set of OIDs.]

  8. Core API – main methods • DEX: open(filename) → GraphPool, create(filename) → GraphPool, close() • GraphPool: newSession() → Session • Session: getDbGraph() → DbGraph, newGraph() → RGraph, close() • Graph: newNodeType(name) → int, newEdgeType(name) → int, newNode(type) → long, newEdge(type) → long, newAttribute(type, name) → long, setAttribute(oid, attr, value), getAttribute(oid, attr) → value, select(type) → Objects, select(attr, op, value) → Objects, explode(oid, type) → Objects • Objects: add(long), exists(long), copy(objs), union(objs), intersection(objs), difference(objs) • Objects.Iterator: hasNext() → boolean, next() → long

  9. Running example
       DEX dex = new DEX();
       GraphPool gpool = dex.create("C:/image.dex");
       Session s = gpool.newSession();
       …
       …
       s.close();
       gpool.close();
       dex.close();

  10. Running example
       …
       s.beginTx();
       DbGraph dbg = s.getDbGraph();
       // Node type and its attributes
       int person = dbg.newNodeType("PERSON");
       long name = dbg.newAttribute(person, "NAME", STRING);
       long age = dbg.newAttribute(person, "AGE", INT);
       // Three PERSON nodes
       long p1 = dbg.newNode(person);
       dbg.setAttribute(p1, name, "JOHN");
       dbg.setAttribute(p1, age, 18);
       long p2 = dbg.newNode(person);
       dbg.setAttribute(p2, name, "KELLY");
       long p3 = dbg.newNode(person);
       dbg.setAttribute(p3, name, "MARY");
       s.commitTx();
       …
       [Figure: three PERSON nodes: JOHN (age 18), KELLY and MARY.]

  11. Running example
       …
       s.beginTx();
       DbGraph dbg = s.getDbGraph();
       // Undirected edge type with an attribute (newAttribute returns a long)
       int friend = dbg.newUndirectedEdgeType("FRIEND");
       long since = dbg.newAttribute(friend, "SINCE", INT);
       long e1 = dbg.newEdge(p1, p2, friend);
       dbg.setAttribute(e1, since, 2000);
       long e2 = dbg.newEdge(p2, p3, friend);
       dbg.setAttribute(e2, since, 1995);
       …
       // Directed edge type
       int loves = dbg.newEdgeType("LOVES");
       long e3 = dbg.newEdge(p1, p3, loves);
       s.commitTx();
       …
       [Figure: the graph after adding the FRIEND edges JOHN–KELLY (since 2000) and KELLY–MARY (since 1995) and the LOVES edge from JOHN to MARY.]

  12. Running example
       …
       s.beginTx();
       DbGraph dbg = s.getDbGraph();
       int phones = dbg.newEdgeType("PHONES");
       long when = dbg.newAttribute(phones, "WHEN", TIMESTAMP);
       // Two PHONES edges between the same pair of nodes, then one from p3 to p2
       long e4 = dbg.newEdge(p1, p3, phones);
       dbg.setAttribute(e4, when, 4pm);
       long e5 = dbg.newEdge(p1, p3, phones);
       dbg.setAttribute(e5, when, 5pm);
       long e6 = dbg.newEdge(p3, p2, phones);
       dbg.setAttribute(e6, when, 6pm);
       s.commitTx();
       …
       [Figure: the graph with the PHONES edges added: JOHN→MARY at 4pm and at 5pm, MARY→KELLY at 6pm.]

  13. Running example
       …
       s.beginTx();
       DbGraph dbg = s.getDbGraph();
       // Iterate over all PERSON nodes and read their NAME attribute
       Objects persons = dbg.select(person);
       Objects.Iterator it = persons.iterator();
       while (it.hasNext()) {
           long p = it.next();
           String personName = dbg.getAttribute(p, name);
       }
       it.close();
       persons.close();
       s.commitTx();
       …
       [Figure: the running-example graph.]

  14. Running example
       …
       Objects objs1 = dbg.select(when, >=, 5pm);
       // objs1 = { e5, e6 }
       Objects objs2 = dbg.explode(p1, phones, OUT);
       // objs2 = { e4, e5 }
       Objects objs = objs1.intersection(objs2);
       // objs = { e5, e6 } ∩ { e4, e5 } = { e5 }
       …
       objs.close();
       objs1.close();
       objs2.close();
       …
       [Figure: the running-example graph.]
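
The set combination on this slide can be reproduced with ordinary `java.util` sets. This is a minimal sketch under stated assumptions, not the DEX `Objects` class: the class name `ObjectsSketch` and the numeric identifiers standing in for e4, e5 and e6 are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class ObjectsSketch {
    // Hypothetical OIDs standing in for the edges e4, e5 and e6.
    static final long E4 = 4L, E5 = 5L, E6 = 6L;

    // Returns a new set with the identifiers present in both inputs;
    // neither input set is modified.
    static Set<Long> intersection(Set<Long> a, Set<Long> b) {
        Set<Long> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Long> objs1 = new TreeSet<>(Arrays.asList(E5, E6)); // edges with WHEN >= 5pm
        Set<Long> objs2 = new TreeSet<>(Arrays.asList(E4, E5)); // PHONES edges leaving p1
        System.out.println(intersection(objs1, objs2));         // prints [5]
    }
}
```

Copying the first set before calling `retainAll` keeps both operands intact, which mirrors the slide, where `objs1` and `objs2` are still closed separately afterwards.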

  15. Database construction • DEX basics: • Node and edge type: • Public identifier: String. • DEX identifier: Integer. • Attribute: • Public identifier: String. • DEX identifier: Long. • Object instances: • DEX identifier (OID): Long.

  16. Database construction • Nodes: • int Graph#newNodeType(String name) • Creates a new node type with the given unique name. • Returns the DEX node type identifier. • long Graph#newNode(int nodeType) • Creates a new object belonging to the given node type. • Returns the DEX object identifier.

  17. Database construction • Edges: • int Graph#newEdgeType(String name, boolean directed) • Creates a new edge type with the given unique name. • Directed or undirected edge type. • Returns the DEX edge type identifier. • int Graph#newRestrictedEdgeType(String name, int srcNodeType, int dstNodeType) • Creates a new directed edge type with the given unique name. • Returns the DEX edge type identifier. • (Integrity restriction) Source and destination of the edge are restricted to the given node types. • long Graph#newEdge(long tail, long head, int edgeType) • Creates a new edge belonging to the given edge type. • Tail is the source and head is the target (if the edge is directed). • Returns the DEX object identifier.

  18. Database construction • Attributes: • long Graph#newAttribute(int type, String name, short dataType, short kind) • Creates a new attribute with the given unique name for the given node or edge type. • Returns the DEX attribute identifier. • “dataType” can be: Value#STRING, Value#INT, Value#LONG, Value#DOUBLE, Value#BOOL, Value#TIMESTAMP. • “kind” can be: • Graph#ATTR_KIND_BASIC. Basic attribute (just set and get values). • Graph#ATTR_KIND_INDEXED. Indexed attribute (set and get values as well as select operations). • Graph#ATTR_KIND_UNIQUE. Indexed attribute with unique values (primary key).

  19. Database construction • Attributes: • Class Value encapsulates different data types: • String, Integer, Long, Double, Boolean, Timestamp. • Use them to set and get attribute values for the objects. • Graph#setAttribute(long oid, long attr, Value v) • Sets the given Value for the given attribute on the given object identifier. • The given attribute must be defined for the object’s type. • The Value’s data type must match the attribute’s data type or be NULL. • Graph#getAttribute(long oid, long attr, Value v) • Gets the Value of the given attribute for the given object identifier. • The given attribute must be defined for the object’s type.

  20. Exercises • All exercises are in the NetBeans project. • Open the IDE and the project. • Required data sets are stored in the “data” directory. • Required libraries are stored in the “libs” directory. • Every exercise has a main method to be executed.

  21. Exercise 1 • Create a synthetic DEX database: • Create the following schema. • User (nickname string, …) • Tweet (body string, …) • tweets (…) // from User to Tweet • Add some data. • APIs to be used: • Graph#newNodeType / Graph#newEdgeType • Graph#newNode / Graph#newEdge • Graph#newAttribute / Graph#setAttribute • Value

  22. Validate database construction • APIs: • GraphPool#dumpData(File f) • Dumps a summary of the logical content of the graph database. • GraphPool#dumpStorage(File f) • Dumps internal information about the storage content of the graph database. • Graph#export(PrintWriter pw, short kind, Export e) • Exports the graph to an external format. • “kind” can be: GRAPHVIZ or YGRAPHML. • The Export implementation defines the visualization (if null, default export). • Command-line shell: • edu.upc.dama.dex.shell.Shell • See the edu.upc.dama.dex.shell package description.

  23. Exercise 2 • Validate your database construction: • Dump data summary. • Dump storage summary. • Default export. • yEd • (Optional) Shell. • APIs to be used: • Graph#dumpData. • Graph#dumpStorage. • Graph#export. • Shell.

  24. Script loaders • Schema definition
       CREATE DBGRAPH alias INTO filename
       CREATE NODE node_type_name "("
           [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC], ...]
       ")"
       CREATE [UNDIRECTED|VIRTUAL] EDGE edge_type_name
           [FROM node_type_name[.attribute_name] TO node_type_name[.attribute_name]]
       "("
           [attribute_name (INT|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) [INDEXED|UNIQUE|BASIC], ...]
       ")" [MATERIALIZE NEIGHBORS]

  25. Script loaders • Load nodes
       LOAD NODES file_name
       COLUMNS attribute_name [alias_name], …
       INTO node_type_name
       [IGNORE (attribute_name|alias_name), …]
       [FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE]]
       [FROM num] [MAX num]
       [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]

  26. Script loaders • Load edges
       LOAD EDGES file_name
       COLUMNS attribute_name [alias_name], …
       INTO edge_type_name
       [IGNORE (attribute_name|alias_name), …]
       WHERE TAIL (attribute_name|alias_name) = node_type_name.attribute_name
             HEAD (attribute_name|alias_name) = node_type_name.attribute_name
       [FIELDS [TERMINATED char] [ENCLOSED char] [ALLOW_MULTILINE]]
       [FROM num] [MAX num]
       [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]

  27. Script loaders • APIs: • edu.upc.dama.dex.script.ScriptParser • Command-line tool: • edu.upc.dama.dex.script.ScriptParser • See the edu.upc.dama.dex.script package description.

  28. Twitter data model • This is the data model, based on Twitter, to be used during the exercises.

  29. Exercise 3 • Create the Twitter database: • Complete the schema definition script. • Complete the loader script. • APIs to be used: • ScriptParser. • Resources: • CSV files in the “data/twitter” directory. • Script files in the “data/twitter/scripts” directory (*.des).

  30. Exercise 4 • Once again, validate your database construction: • Dump data summary. • Dump storage summary. • Default export. • yEd • (Optional) Shell • APIs to be used: • Graph#dumpData • Graph#dumpStorage • Graph#export • Shell

  31. Query database • Retrieve data: • Class Objects • Set<Long> • Iterable<Long> • Stores large sets of object identifiers. • No order. • Combine operations: • Union. • Intersection. • Difference.

  32. Query database • Retrieve data: • Objects Graph#select(int t) • Retrieves the object identifiers belonging to the given node or edge type. • Objects Graph#select(long attr, short op, Value v) • Retrieves the object identifiers which satisfy the query. • “op” can be: Graph#OPERATION_{EQ|NE|GT|GE|LT|LE|LIKE|ERE} • long Graph#findObj(long attr, Value v) • Retrieves the object identifier which has the given value for the given attribute (or INVALID_OID if not found).

  33. Query database • Navigation: • Objects Graph#explode(long oid, int edgeType, short direction) • Retrieves the outgoing or incoming edges (or both) from or to the given object for the given edge type. • “direction” can be: Graph#EDGES_IN, Graph#EDGES_OUT, Graph#EDGES_BOTH. • Objects Graph#neighbors(long oid, int edgeType, short direction) • Retrieves the neighbor nodes of the given object which can be reached through the given edge type and direction. • “direction” can be: Graph#EDGES_IN, Graph#EDGES_OUT, Graph#EDGES_BOTH.
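
The difference between the two navigation calls is that explode yields edge identifiers while neighbors yields node identifiers. The following plain-Java sketch makes that concrete on the PHONES edges of the running example. It does not use DEX: `NavigationSketch`, `explodeOut`, `neighborsOut` and all numeric identifiers are hypothetical, and only the outgoing direction is modelled.

```java
import java.util.Set;
import java.util.TreeSet;

class NavigationSketch {
    // Hypothetical node OIDs for p1, p2 and p3.
    static final long P1 = 1, P2 = 2, P3 = 3;

    // Each row is {edge OID, tail node OID, head node OID}.
    static final long[][] PHONES = {
        {104, P1, P3},  // e4
        {105, P1, P3},  // e5
        {106, P3, P2},  // e6
    };

    // Edge OIDs leaving the given node (the role explode plays for outgoing edges).
    static Set<Long> explodeOut(long node) {
        Set<Long> result = new TreeSet<>();
        for (long[] e : PHONES) {
            if (e[1] == node) result.add(e[0]);
        }
        return result;
    }

    // Node OIDs reachable over one outgoing edge (the role neighbors plays).
    static Set<Long> neighborsOut(long node) {
        Set<Long> result = new TreeSet<>();
        for (long[] e : PHONES) {
            if (e[1] == node) result.add(e[2]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(explodeOut(P1));    // prints [104, 105]
        System.out.println(neighborsOut(P1));  // prints [3]
    }
}
```

Two parallel edges collapse into a single neighbor, so the two result sets can differ in size even for the same node and edge type.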

  34. Graph algorithms • “edu.upc.dama.dex.algorithms” package. • Traversals: • Iterator<Long> • Returns node identifiers. • TraversalBFS • Breadth-first search. • TraversalDFS • Depth-first search. • Shortest path: • SinglePairShortestPathBFS • Unweighted graph. • SinglePairShortestPathDijkstra • Weighted graph. • The user can specify which node or edge types can be used for the navigation.
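
For an unweighted graph, a single-pair shortest path can be found with breadth-first search, because BFS reaches nodes in order of increasing hop count. The sketch below shows that idea in plain Java on an adjacency map. It is not the `SinglePairShortestPathBFS` class from the package above; `ShortestPathSketch`, `shortestHops` and the node identifiers are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShortestPathSketch {
    // Returns the number of hops on a shortest directed path from src to dst,
    // or -1 if dst cannot be reached.
    static int shortestHops(Map<Long, List<Long>> adj, long src, long dst) {
        if (src == dst) return 0;
        Map<Long, Integer> dist = new HashMap<>();  // hop count of every visited node
        Deque<Long> queue = new ArrayDeque<>();
        dist.put(src, 0);
        queue.add(src);
        while (!queue.isEmpty()) {
            long cur = queue.remove();
            for (long next : adj.getOrDefault(cur, Collections.<Long>emptyList())) {
                if (dist.containsKey(next)) continue;   // already reached by a shorter path
                dist.put(next, dist.get(cur) + 1);
                if (next == dst) return dist.get(next);
                queue.add(next);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> follows = new HashMap<>();
        follows.put(1L, Arrays.asList(2L, 4L));
        follows.put(2L, Arrays.asList(3L));
        follows.put(4L, Arrays.asList(3L));
        follows.put(3L, Arrays.asList(5L));
        System.out.println(shortestHops(follows, 1L, 5L));  // prints 3
        System.out.println(shortestHops(follows, 5L, 1L));  // prints -1
    }
}
```

Restricting the navigation to certain edge types, as the last bullet describes, would correspond here to building the adjacency map from those edge types only.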

  35. Attribute values • Class Values: • Iterator over the different values of an attribute. • Iterator<Value> • Ascending or descending order. • Retrieve Values: • Values Graph#getValues(long attr, short order) • Retrieves the Values for the given attribute. • “order” can be: Graph#ORDER_ASCENDENT, Graph#ORDER_DESCENDENT.

  36. Exercise 5 • Basic queries: • Get the “Tweet”s from a “User”. • 1-hop navigation. • Get the “Tweet”s which share 2 (or more) given “Hashtag”s. • Objects combination. • Shortest distance between two given “User”s. • Navigate only through the “follows” relationship. • Use the database created in Exercise 3. • APIs to be used: • Graph#findObj / Graph#select • Graph#neighbors • Objects • SinglePairShortestPath

  37. Exercise 6 • Updates: • Create an attribute for each “User” to store the number of references (“depicts”) to the “User”. • Compute and store the value for each “User”. • Find the most popular “User”. • The most referenced one. • Use the database created in Exercise 3. • APIs to be used: • Graph#degree • Graph#newAttribute / Graph#setAttribute • Values
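
The logic of this exercise is to count references per user and then take the maximum. The slide lists Graph#degree for the counting step; the sketch below counts by hand in plain Java so that it runs without DEX. `PopularitySketch`, its methods and the user identifiers are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PopularitySketch {
    // Counts how often each user OID appears as the target of a reference.
    static Map<Long, Integer> countReferences(long[] referencedUsers) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long user : referencedUsers) {
            counts.merge(user, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the user with the highest count, or -1 if there are no references.
    // Ties are broken arbitrarily.
    static long mostPopular(Map<Long, Integer> counts) {
        long best = -1;
        int bestCount = -1;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One entry per reference; the value is the OID of the referenced user.
        long[] references = {7, 9, 7, 8, 7, 9};
        Map<Long, Integer> counts = countReferences(references);
        System.out.println(counts.get(7L));       // prints 3
        System.out.println(mostPopular(counts));  // prints 7
    }
}
```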

  38. Export • Graph#export(PrintWriter pw, short kind, Export e) • “kind” can be: GRAPHVIZ or YGRAPHML. • Implement the Export interface to define the visualization. • NodeExport getNode(long oid) • It is called for each existing node identifier. • Returns a NodeExport instance which defines the visualization of the given node identifier. • EdgeExport getEdge(long oid) • It is called for each existing edge identifier. • Returns an EdgeExport instance which defines the visualization of the given edge identifier.

  39. Exercise 7 • Visualization: • Update the given Export implementation. • Check how it changes the resulting visualization. • yEd • APIs to be used: • Export • GraphExport • NodeExport • EdgeExport • Graph#export

  40. Any questions? • DAMA Group web site: www.dama.upc.edu • Sparsity web site: www.sparsity-technologies.com
