Converting Disjunctive Data to Disjunctive Graphs
Lars Olson, Data Extraction Group. Funded by NSF.
Introduction
• Disjunctive databases
  • Needed to represent disjunctive data
  • Queries are coNP-complete in general [Imielinski and Vadaparty, 1989]
• Transitive closure in disjunctive graphs
  • coNP-complete in general
  • Polynomial time under certain circumstances [Lobo et al., 1995]
The Problem
• How do we convert the data into a disjunctive graph?
• What is the complexity of the conversion?
  • Time
  • Space / memory
Implementation
• XML data repository
  • Shore / Niagara (Univ. of Wisconsin)
  • Xerces XML parser (Apache.org)
• How do we represent a disjunctive database in storage?
  • Needs to be easy to convert to a disjunctive graph
  • Needs to minimize the changes to the DTD and thus to the existing data
XML → Graph Conversion
• XML → DOM tree
    <doc>
      <Node name="A">
        <EdgeTo ref="B"/>
      </Node>
      <Node name="B"></Node>
      ...
    </doc>
[Figure: DOM tree rooted at doc with Node and EdgeTo elements, and the resulting graph on nodes A and B]
• Use the primary key to distinguish doc→Node edges
• Use the foreign key to perform the join (EdgeTo.ref = Node.name)
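The two bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk (which names Xerces and Shore / Niagara): it assumes Python's standard `xml.etree.ElementTree` parser, and the function name `xml_to_graph` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document from the slide, with the elided "..." left out.
XML = """
<doc>
  <Node name="A">
    <EdgeTo ref="B"/>
  </Node>
  <Node name="B"></Node>
</doc>
"""

def xml_to_graph(xml_text):
    """Return (nodes, edges) for an ordinary, non-disjunctive document."""
    root = ET.fromstring(xml_text)
    # Primary key: Node.name identifies each record under <doc>.
    nodes = {node.get("name") for node in root.iter("Node")}
    edges = []
    for node in root.iter("Node"):
        for edge_to in node.findall("EdgeTo"):
            ref = edge_to.get("ref")
            # Foreign-key join: keep the edge only if EdgeTo.ref = Node.name
            # matches an existing record.
            if ref in nodes:
                edges.append((node.get("name"), ref))
    return nodes, edges

nodes, edges = xml_to_graph(XML)
```

Each element is visited a constant number of times, so the sketch stays linear in the size of the parsed tree.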
Disjunctions in XML, 1st Case
    <Node name="A">
      <EdgeTo ref="B"/>
      <Disj>
        <EdgeTo ref="C"/>
        <EdgeTo ref="D"/>
      </Disj>
    </Node>
    ...
[Figure: graph on nodes A, B, C, D]
…but how do we represent a disjunctive tail?
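One way to read the first case is that every child of a `<Node>` yields one edge, and a `<Disj>` child yields a single edge whose head is a set of alternatives. The sketch below assumes that reading and represents an edge as a `(tail, frozenset_of_heads)` pair; both the representation and the helper name `node_edges` are illustrative choices, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# The slide's snippet, wrapped in <doc> so that it parses on its own.
XML_CASE_1 = """
<doc>
  <Node name="A">
    <EdgeTo ref="B"/>
    <Disj>
      <EdgeTo ref="C"/>
      <EdgeTo ref="D"/>
    </Disj>
  </Node>
</doc>
"""

def node_edges(node):
    """Return one (tail, heads) pair per EdgeTo or Disj child of a Node."""
    tail = node.get("name")
    edges = []
    for child in node:
        if child.tag == "EdgeTo":
            # Ordinary edge: the head set has exactly one member.
            edges.append((tail, frozenset([child.get("ref")])))
        elif child.tag == "Disj":
            # Disjunctive edge: one edge whose head is "C or D".
            heads = frozenset(e.get("ref") for e in child.findall("EdgeTo"))
            edges.append((tail, heads))
    return edges

root = ET.fromstring(XML_CASE_1)
case1_edges = [edge for node in root.iter("Node") for edge in node_edges(node)]
```

Here `case1_edges` holds a definite edge from A to B and one disjunctive edge from A to the set {C, D}.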
Disjunctions in XML, 1st Case
    <Node name="A">
      <EdgeTo ref="B"/>
      <Disj>
        <EdgeTo ref="C"/>
        <EdgeTo ref="D"/>
      </Disj>
    </Node>
    <Disj>
      <Node name="E">
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Node>
      <Node name="F">
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Node>
    </Disj>
    ...
[Figure: graph on nodes E, F, G, H, shown together with the doc root]
or…
Disjunctions in XML, 2nd Case
    <Disj>
      <Tail>
        <Node name="E"/>
        <Node name="F"/>
      </Tail>
      <Head>
        <EdgeTo ref="G"/>
        <EdgeTo ref="H"/>
      </Head>
    </Disj>
    ...
[Figure: graph on nodes E, F, G, H]
What if the disjunction isn't the full cross-product?
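To make "full cross-product" concrete, the sketch below reads one `<Tail>`/`<Head>` pair and lists every tail-to-head edge that the disjunction ranges over. It assumes a `<Disj>` with exactly one `<Tail>` and one `<Head>`; the function name `cross_product_edge` is hypothetical, and the parser is again Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from itertools import product

# The slide's snippet, wrapped in <doc> so that it parses on its own.
XML_CASE_2 = """
<doc>
  <Disj>
    <Tail>
      <Node name="E"/>
      <Node name="F"/>
    </Tail>
    <Head>
      <EdgeTo ref="G"/>
      <EdgeTo ref="H"/>
    </Head>
  </Disj>
</doc>
"""

def cross_product_edge(disj):
    """Return (tails, heads, alternatives) for a single Tail/Head pair."""
    tails = [n.get("name") for n in disj.find("Tail").findall("Node")]
    heads = [e.get("ref") for e in disj.find("Head").findall("EdgeTo")]
    # The disjunction ranges over every tail paired with every head.
    return tails, heads, list(product(tails, heads))

disj = ET.fromstring(XML_CASE_2).find("Disj")
tails, heads, alternatives = cross_product_edge(disj)
```

Storing only `tails` and `heads` keeps the edge compact. The expanded `alternatives` list is shown for illustration and grows with the product of the two sizes.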
Disjunctions in XML, 3rd Case
    <Disj>
      <Tail>
        <Node name="I"/>
      </Tail>
      <Head>
        <EdgeTo ref="K"/>
      </Head>
      <Tail>
        <Node name="J"/>
      </Tail>
      <Head>
        <EdgeTo ref="K"/>
        <EdgeTo ref="L"/>
      </Head>
    </Disj>
    ...
[Figure: graph on nodes I, J, K, L]
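A partial cross-product can be read by pairing each `<Tail>` with the `<Head>` that follows it inside the same `<Disj>`. The sketch below assumes that strict Tail-then-Head ordering, which is how the slide's example is laid out; the helper names `tail_head_pairs` and `expand` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's snippet, wrapped in <doc> so that it parses on its own.
XML_CASE_3 = """
<doc>
  <Disj>
    <Tail><Node name="I"/></Tail>
    <Head><EdgeTo ref="K"/></Head>
    <Tail><Node name="J"/></Tail>
    <Head><EdgeTo ref="K"/><EdgeTo ref="L"/></Head>
  </Disj>
</doc>
"""

def tail_head_pairs(disj):
    """Pair each <Tail> with the <Head> that follows it, in one pass."""
    pairs = []
    tails = None
    for child in disj:
        if child.tag == "Tail":
            tails = [n.get("name") for n in child.findall("Node")]
        elif child.tag == "Head" and tails is not None:
            heads = [e.get("ref") for e in child.findall("EdgeTo")]
            pairs.append((tails, heads))
            tails = None  # this Tail is consumed; wait for the next one
    return pairs

def expand(pairs):
    """List the individual tail-to-head edges the disjunction ranges over."""
    return [(t, h) for tails, heads in pairs for t in tails for h in heads]

disj = ET.fromstring(XML_CASE_3).find("Disj")
pairs = tail_head_pairs(disj)
case3_alternatives = expand(pairs)
```

For this input the disjunction covers I→K, J→K and J→L but not I→L, which is what distinguishes it from the full cross-product of the second case.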
Time and Space Complexity
• n = # of nodes in the DOM tree
  • counts edges as well
  • not necessarily proportional to the # of values in the database
• Ordinary XML: traverse the tree and add edges. Distinguish records with primary keys; add edges for foreign keys. O(n) time, O(n) space.
Time and Space Complexity
• <Disj>: same, except only one edge to all children. O(n) time, O(n) space.
• <Disj> with <Tail> and <Head>: traverse the tree, add the <Tail> and <Head> elements to a list, add one edge, and repeat for each Tail/Head pair. O(n) time, O(n) space.
Summary
• We need to introduce new XML constructs:
  • <Disj>
  • Helper constructs <Tail> and <Head>
• Three cases
  • simple tail, compound head
  • full cross-product
  • partial cross-product
• Time and space requirements are consistent with the transitive closure algorithm
Future Work
• Solving path queries
• Adding XML constructs for more complicated disjunctions, e.g. Tail (A or B), Head ((C and D) or E)
• Determining the frequency of disjunctive data in real-world data
• Developing a normal form for disjunctive XML
  • Minimize redundancy
  • Minimize disjunctive tails