
(Meeting Overview)

(Meeting Overview). Arjen P. de Vries, Georgina Ramirez, Johan List, Djoerd Hiemstra, Vojkan Mihajlovic, Mila Boldareva, Maurice van Keulen. Overview: Cirquid Goals; Multi-model DBMS Architecture; Region algebras (for XML path traversal? for ranking in IR?); GALAX Architecture (+example).


Presentation Transcript


  1. (Meeting Overview) Arjen P. de Vries, Georgina Ramirez, Johan List, Djoerd Hiemstra, Vojkan Mihajlovic, Mila Boldareva, Maurice van Keulen

  2. Overview • Cirquid Goals • Multi-model DBMS Architecture • Region algebras • For XML path traversal? • For ranking in IR? • GALAX Architecture (+example)

  3. Goals • Develop an efficient and flexible system that integrates information retrieval and data retrieval • ‘structure + content’ • Two parts: • Database architecture (Arjen & Djoerd) • Optimization (Henk Ernst)

  4. Example Query
     FOR $article IN document("collection.xml")//article
     WHERE $article/text() about 'Willem-Alexander dating Maxima'
       AND $article[@language = 'English']
       AND $article[@pub-date between '31-1-2003' and '1-3-2003']
     RETURN <result>$article</result>

  5. Basic Assumption • A coupled IR+DB system architecture is neither desirable nor efficient • Possible Alternatives: • Express entire combined algorithms in DBMS query language • Exploit DBMS extension mechanism for IR • Flexible and transparent integration of IR in query engine

  6. Multi-model DBMS Architecture • Conceptual Layer • Logical Layer: X-Path, LM IR, … • Physical Layer: Suffix Array, Staircase-Join, …

  7. Cirquid Focus • X-Path extension and IR Language Modeling extension • Suitable for collection-based processing • Maintain data independence • Based on region algebra

  8. Sample document, with each token numbered by position:
     1 <section> 2 <title> 3 Information 4 Retrieval 5 Using 6 RDBMS 7 </title>
     8 <section> 9 <title> 10 Beyond 11 Simple 12 Translation 13 </title>
     14 <section> 15 <title> 16 Extension 17 of 18 IR 19 Features 20 </title> 21 </section>
     22 </section>
     23 </section>
     Node index:
       <section>: (1, 1:23, 0) (1, 8:22, 1) (1, 14:21, 2) …
       <title>: (1, 2:7, 1) (1, 9:13, 2) (1, 15:20, 3) …
     Word index:
       “information”: (1, 3, 2) …
       “retrieval”: (1, 4, 2) …
     Containment, direct containment, tight containment, proximity
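
The containment and direct containment operators named on slide 8 can be sketched over the index entries shown there. This is a minimal illustration, not the Cirquid implementation: it assumes the first tuple field is a document identifier and the last is the nesting level, and the names `Region`, `contains`, `directly_contains` and `containing` are hypothetical. Tight containment and proximity are not covered.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    doc: int    # assumed: document identifier
    start: int  # position of the first token of the region
    end: int    # position of the last token of the region
    level: int  # assumed: nesting depth of the region


# Index entries copied from slide 8; a word is a region of width one.
SECTIONS = [Region(1, 1, 23, 0), Region(1, 8, 22, 1), Region(1, 14, 21, 2)]
TITLES = [Region(1, 2, 7, 1), Region(1, 9, 13, 2), Region(1, 15, 20, 3)]
WORDS = {
    "information": [Region(1, 3, 3, 2)],
    "retrieval": [Region(1, 4, 4, 2)],
}


def contains(outer: Region, inner: Region) -> bool:
    """True if inner lies strictly inside outer in the same document."""
    return (outer.doc == inner.doc
            and outer.start < inner.start
            and inner.end < outer.end)


def directly_contains(outer: Region, inner: Region) -> bool:
    """Containment with exactly one level of nesting between the regions."""
    return contains(outer, inner) and inner.level == outer.level + 1


def containing(outers: List[Region], inners: List[Region]) -> List[Region]:
    """Select the outer regions that contain at least one inner region."""
    return [o for o in outers if any(contains(o, i) for i in inners)]


if __name__ == "__main__":
    print(containing(TITLES, WORDS["retrieval"]))
    print(directly_contains(SECTIONS[0], SECTIONS[1]))
```

Because both operators compare only integer positions and levels, they can be evaluated on the index alone without touching the original XML text.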

  9. XML Indexing [Figure: a document tree with element nodes A, B, C, D, E and text nodes T1, T2, T3, T4, drawn twice]

  10. [Figure: the tree with nodes A, B, C, D, E and text T1, T2, T3, T4, labelled “Node index” and “Word index”]

  11. LM IR on Regions • Extend region representation with a probability value • Extend DB with rules for how the probabilities are computed • E.g.: P(A ranked_containing B) = count(#B in A) / count(* in A) • Background model [W3C flawed?] • Prob(A ranked_containing B in collection C)
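
The example rule on slide 11, count(#B in A) / count(* in A), can be sketched as a plain relative frequency over the word positions inside a region. This is an illustration only: the token table reuses the sample document from slide 8, lower-casing the words and counting only word tokens (not tags) are assumptions, and `term_probability` is a hypothetical name. No background model is applied.

```python
from typing import Dict, Tuple

# Word positions from the slide 8 sample document (lower-cased: an assumption).
TOKENS: Dict[int, str] = {
    3: "information", 4: "retrieval", 5: "using", 6: "rdbms",
    10: "beyond", 11: "simple", 12: "translation",
    16: "extension", 17: "of", 18: "ir", 19: "features",
}


def term_probability(region: Tuple[int, int], term: str,
                     tokens: Dict[int, str] = TOKENS) -> float:
    """Occurrences of term inside the region divided by the number of
    word tokens inside the region (unsmoothed relative frequency)."""
    start, end = region
    words = [w for pos, w in tokens.items() if start < pos < end]
    if not words:
        return 0.0  # empty region: define the probability as zero
    return words.count(term) / len(words)


if __name__ == "__main__":
    # First <title> region (2:7) versus the outermost <section> region (1:23).
    print(term_probability((2, 7), "retrieval"))
    print(term_probability((1, 23), "retrieval"))
```

The same term scores higher in the small title region than in the enclosing section, which is the length normalisation the rule is meant to provide.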

  12. Issues • Tokenization etc. part of schema?! • ‘Content independence’ through declarative specification of RM? • Define term-prob ::=
      FOR $n in //*
      LET $rtf = count($n/text() contains Q),
          $rlen = count($n/text())
      RETURN <p>$rtf/$rlen</p>

  13. More Issues • Adjacency? Proximity? Tag name??? • Region representation? • Pre-post? Stretched pre-post? • Byte offset? • Reduce cost of materialization of results by scanning original collection file? • Allows direct use of suffix array… but is it efficient? For what queries?
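
For comparison with the start:end regions, the pre-post alternative raised on slide 13 can be sketched as follows. This is a generic pre/post-order numbering, not code from the project: it numbers element nodes only (text nodes are ignored), and `pre_post` and `is_ancestor` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Sample document from slide 8.
SAMPLE = (
    "<section><title>Information Retrieval Using RDBMS</title>"
    "<section><title>Beyond Simple Translation</title>"
    "<section><title>Extension of IR Features</title></section>"
    "</section></section>"
)


def pre_post(root: ET.Element) -> List[Tuple[str, int, int]]:
    """Return (tag, pre, post) for every element, in document order."""
    rows: List[list] = []
    counters = {"pre": 0, "post": 0}

    def visit(node: ET.Element) -> None:
        row = [node.tag, counters["pre"], -1]
        counters["pre"] += 1
        rows.append(row)
        for child in node:
            visit(child)
        row[2] = counters["post"]  # post rank is assigned when leaving the node
        counters["post"] += 1

    visit(root)
    return [(tag, pre, post) for tag, pre, post in rows]


def is_ancestor(a: Tuple[str, int, int], d: Tuple[str, int, int]) -> bool:
    """a is an ancestor of d iff a starts before d and finishes after d."""
    return a[1] < d[1] and a[2] > d[2]


if __name__ == "__main__":
    table = pre_post(ET.fromstring(SAMPLE))
    for row in table:
        print(row)
    print(is_ancestor(table[0], table[5]))
```

The ancestor test is the same pair of integer comparisons as region containment, which is why the two representations are listed as alternatives.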

  14. System Development Plan • Focus on query plan generation • All the way from conceptual to physical! • Inspiration sources: Moa and RAM • Generate for both MonetDB and ‘normal’ RDBMS; also X-100? • Initial goal • Tijah – be pragmatic, must handle INEX 2003! • Integrate with existing XQuery processor: • Galax Open Source implementation • Investigate also Konstanz system

  15. Galax project, started in 2000 in Bell Labs. http://db.bell-labs.com/galax/ • Implements (most of): • XQuery 1.0 and XPath 2.0 Data Model • XQuery 1.0 and XPath 2.0 Functions and Operators • XQuery 1.0: An XML Query Language • XML Query Use Cases • XML Schema Part 1: Structures & Part 2: Datatypes • A Typed Implementation: Static & Dynamic • A functional implementation (O’Caml).

  16. Galax Architecture (+example) EXAMPLE • Use case: Relational • XQuery: Return the item number and the description of all the bicycles.
      <result>
        {
          for $i in $items//item_tuple
          where contains($i/description, "Bicycle")
          return
            <item_tuple>
              {$i/itemno}
              {$i/description}
            </item_tuple>
        }
      </result>

  17. Galax Architecture (+example) [Figure: architecture diagram. Layers: Parsing Layer. Components shown: XQuery Expression, XQuery Parser, XQuery AST; XML Schema Description, XML Parser, XML Schema AST; XML Document.]

  18. Galax Architecture (+example) [Figure: architecture diagram. Layers: Parsing Layer, Mapping Layer. Components shown: XQuery Expression, XQuery Parser, XQuery AST, XQuery Mapping to the Core, XQuery Core Internal Structure; XML Schema Description, XML Parser, XML Schema AST, Type System Mapping, XQuery Type System Internal Structure; XML Document.]

  19. Galax Architecture (+example) [Figure: architecture diagram. Layers: Parsing Layer, Mapping Layer, (Static) Evaluation Layer. Components shown: XQuery Expression, XQuery Parser, XQuery AST, XQuery Mapping to the Core, XQuery Core Internal Structure; XML Schema Description, XML Parser, XML Schema AST, Type System Mapping, XQuery Type System Internal Structure; Static Type Checker, Static Error for non well-typed queries, Type of Query Result; XML Document.]
      Type shown for the example:
      element result {
        element item_tuple {
          element itemno {xsd:int},
          element description {xsd:string}
        }*
      }

  20. Galax Architecture (+example) Normalized Expression (XQuery Core)
      element result {
        for $i in (
          glx:distinct-docorder(
            (let $glx:sequence := (glx:distinct-docorder(($items)))
             return let $glx:last := (fn:count(($glx:sequence)))
             return for $glx:dot at $glx:position in ($glx:sequence)
             return glx:distinct-docorder(
               (let $glx:sequence := (glx:distinct-docorder((descendant-or-self::node())))
                return let $glx:last := (fn:count(($glx:sequence)))
                return for $glx:dot at $glx:position in ($glx:sequence)
                return child::item_tuple)))))
        return if (fn:boolean((let $glx:v1 := (fn:data((glx:distinct-docorder((let $glx:sequence := (glx:distinct-docorder(($i)) …

  21. Algebra • At a logical level, not at the physical. • Use of regular-expression types. • Iteration construct based on the notion of monad. • Notation similar to path navigation in XPath.

  22. Algebra: some operators • Projection: book0/author • Iteration: for b in bib0/book do book [b/author, b/title] • Selection: where e1 then e2 • Aggregation: avg, count, max, min, sum. • Joins: nested for loops • Structural Recursion: match p case b: … case c: … else …

  23. Some Optimization Rules • Goal: • To eliminate unnecessary FOR or MATCH expressions • Enable other optimizations by reordering or distributing computations. • Some rules:
      FOR simplification
        for v in () do e  =>  ()
        for v in e do v  =>  e
        for v in (e1, e2) do e3  =>  (for v in e1 do e3), (for v in e2 do e3)
      IF simplification
        if cexpr1 then cexpr2 else cexpr3  =>  cexpr2   when cexpr1 := true
        if cexpr1 then cexpr2 else cexpr3  =>  cexpr3   when cexpr1 := false
      LET simplification
        let $v := Expr1 return Expr2  =>  Expr2   when used_count $v Expr2 => 0
        let $v := Expr1 return Expr2  =>  Expr2 [ Expr1 / $v ]   when used_count $v Expr2 => 1
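
The FOR and IF simplification rules from slide 23 can be read as bottom-up rewrites on an expression tree. The sketch below is an illustration in Python, not Galax code (Galax itself is written in O'Caml); the node classes `Var`, `Bool`, `Seq`, `For`, `If` and the function `simplify` are hypothetical, the LET rule is left out because substitution needs care with variable capture, and nested sequences are not flattened.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Bool:
    value: bool


@dataclass(frozen=True)
class Seq:
    items: Tuple[object, ...]  # Seq(()) is the empty sequence ()


@dataclass(frozen=True)
class For:
    var: str
    source: object
    body: object


@dataclass(frozen=True)
class If:
    cond: object
    then: object
    orelse: object


EMPTY = Seq(())


def simplify(expr: object) -> object:
    """Apply the FOR and IF simplification rules bottom-up."""
    if isinstance(expr, Seq):
        return Seq(tuple(simplify(item) for item in expr.items))
    if isinstance(expr, For):
        source, body = simplify(expr.source), simplify(expr.body)
        if source == EMPTY:                      # for v in () do e => ()
            return EMPTY
        if body == Var(expr.var):                # for v in e do v => e
            return source
        if isinstance(source, Seq) and len(source.items) > 1:
            # for v in (e1, e2) do e3 => (for v in e1 do e3), (for v in e2 do e3)
            return Seq(tuple(simplify(For(expr.var, item, body))
                             for item in source.items))
        return For(expr.var, source, body)
    if isinstance(expr, If):
        cond = simplify(expr.cond)
        if cond == Bool(True):                   # condition known true
            return simplify(expr.then)
        if cond == Bool(False):                  # condition known false
            return simplify(expr.orelse)
        return If(cond, simplify(expr.then), simplify(expr.orelse))
    return expr                                  # variables and constants


if __name__ == "__main__":
    print(simplify(For("v", Seq((Var("e1"), Var("e2"))), Var("e3"))))
```

Distributing a loop over a sequence matters because it can expose an empty operand, so the first rule can then remove that branch entirely.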

  24. Galax Architecture (+example) Optimized Normalized Expression (XQuery Core)
      element result {
        for $i in (glx:distinct-docorder((let $glx:dot := ($items)
                    return for $glx:dot in (descendant-or-self::node())
                    return child::item_tuple)))
        return
          if (fn:contains((fn:data((glx:distinct-docorder((let $glx:dot := ($i) return child::description))))), ("Bicycle")))
          then (
            element item_tuple {
              glx:distinct-docorder((let $glx:dot := ($i) return child::itemno)),
              text { "" },
              glx:distinct-docorder((let $glx:dot := ($i) return child::description))
            }
          )
          else (())
      }

  25. Galax Architecture (+example) [Figure: architecture diagram. Layers: Parsing Layer, Mapping Layer, (Dynamic) Evaluation Layer. Components shown: XQuery Expression, XQuery Parser, XQuery AST, XQuery Mapping to the Core, XQuery Core Internal Structure; XML Schema Description, XML Parser, XML Schema AST, Type System Mapping, XQuery Type System Internal Structure; Static Type Checker, Static Error for non well-typed queries, Type of Query Result; XML Document, XML Parser, XML AST, XML Data Model Loader, XML Data Model Internal Structure; Query Processor, Validation, Data Model Query Result. Annotations: OUR MAPPING, OUR QP.]

  26. Road Ahead • But… • Goal, again, is to be mainly pragmatic first • Deeper research starts after initial QP generator has been bootstrapped from existing system • Risk: • Too much engineering • Algebra in Galax not suited for optimization

  27. Research issues • Should the ‘semi-structured semi-monad’ algebra be adapted to enable more set-oriented processing? • Does the IR application give rise to new physical operators?
