Trees, semistructured data, and other strange ways to go beyond tables. Serge Abiteboul, INRIA & ENS Cachan. PODS 30th Anniversary, 2011. Another one of these No-SQL talks?
IMS, hierarchical model, V-relations, Jacobs’s calculus, Hardgrave’s broom, nested relations, format model, complex objects, logical data model, object databases, lambda calculus, regular trees, F-logic, NF1F, NF2, COL, IFO, LDL, IQL, SGML, HTML, ASN.1, XML, YAML, JSON… Luc Véro
Introduction • Theorem: Information lives in trees and not in relations. Proof: the Bible does not say « But of the two-dimensional table of knowledge of good and evil … » • Knowledge lives in trees: “But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die.” (Genesis 2:17) • Trees are useless: “A tree is a tree. How many more do you have to look at?” (Ronald Reagan, governor of California, opposing the expansion of Redwood National Park, 1966) • “We don’t need anything beyond relations. These things are useless. Reject!” (Anonymous referee, circa 1990)
Organization (more or less chronological) • Introduction • Hierarchical data model, 60s • Nested relations, 80s • Complex objects, early 90s • Semistructured data & unranked labeled trees, late 90s • Unranked labeled ordered trees, aka XML, early 00s • Evolving trees, aka Active XML, mid 00s • Cycles, 90s to now • Conclusion
For lack of time, we will ignore IMS and the hierarchical model • The language was purely navigational anyway • We will also ignore early works such as those of Makinouchi, Jacobs, or Hardgrave • We will start with N1NF • François Bancilhon in France • Hans Schek in Germany • PhD thesis of Nicole Bidoit
Non-First-Normal-Form (N1NF) • DB101: a quarter on tables. Now what? • Data live in 1NF relations • Data would prefer to live in the infamous nested relations, aka V-relations, aka N1NF relations, aka NF2 relations • Trees!
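The passage from 1NF relations to nested relations can be sketched with the two classical restructuring operators, nest and unnest. This Python sketch is illustrative only: it assumes a binary schema (A, B) nested on A, represents a nested tuple as (a, frozenset of b values), and uses hypothetical data (the name Paul does not come from the slides).

```python
from collections import defaultdict

def unnest(nested):
    """Flatten a nested relation {(a, frozenset(bs))} into a 1NF relation {(a, b)}."""
    return {(a, b) for a, bs in nested for b in bs}

def nest(flat):
    """Group a 1NF relation {(a, b)} on its first attribute: {(a, frozenset(bs))}."""
    groups = defaultdict(set)
    for a, b in flat:
        groups[a].add(b)
    return {(a, frozenset(bs)) for a, bs in groups.items()}

# Hypothetical parent/child pairs in first normal form.
flat = {("Peter", "Mimi"), ("Peter", "Toto"), ("Paul", "Zaza")}
nested = nest(flat)
print(sorted((a, sorted(bs)) for a, bs in nested))
# [('Paul', ['Zaza']), ('Peter', ['Mimi', 'Toto'])]
assert unnest(nested) == flat
```

Note that nest never builds an empty set, so a nested tuple such as (a, ∅) would simply disappear under unnest.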
The devil is in the details: V-relations vs. N1NF-relations • A is not a key: the size is now possibly exponential in the size of the domain • A is a key: no new power
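The key condition can be checked on small instances. In this hedged sketch (same assumed (A, {B}) schema, hypothetical data), nest undoes unnest exactly when A is a key and no set is empty; when A is not a key, tuples sharing an A value get merged. The last lines count the possible nested tuples over a 3-element domain, |D| × 2^|D|, which is where the exponential size comes from.

```python
from collections import defaultdict
from itertools import chain, combinations

def unnest(nested):
    return {(a, b) for a, bs in nested for b in bs}

def nest(flat):
    groups = defaultdict(set)
    for a, b in flat:
        groups[a].add(b)
    return {(a, frozenset(bs)) for a, bs in groups.items()}

def is_key(nested):
    """True if no two tuples share the same value of the atomic attribute A."""
    values = [a for a, _ in nested]
    return len(values) == len(set(values))

keyed = {("a", frozenset({1, 2})), ("b", frozenset({3}))}  # A is a key
unkeyed = {("a", frozenset({1})), ("a", frozenset({2}))}   # A is not a key

for r in (keyed, unkeyed):
    # For unkeyed, nest(unnest(r)) returns the single tuple ("a", {1, 2}).
    print(is_key(r), nest(unnest(r)) == r)
# True True
# False False

# A nested tuple (a, S) may use any subset S of D, so a relation in which
# A is not a key can hold up to |D| * 2^|D| distinct tuples.
D = {1, 2, 3}
subsets = list(chain.from_iterable(combinations(D, k) for k in range(len(D) + 1)))
print(len(D) * len(subsets))  # 24
```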
Complex object model: tuple and set constructors used freely • [Figure: a set (*) of Families, each a tuple with a Name (Peter), a set of Children (tuples with Name and Sex: Mimi F, Toto M, Zaza F), and a set of Cars (tuples with Name and Year: 2CV 1976, BMW 2010)]
A logic and algebra for complex objects • Logic: main novelty is set variables – non first-order • Example: AbouBanat Query • { T.Father| Families(T) X T.Children ( X.Sex = F ) } • Algebra: powerset operation, unnest/nest
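The quantifier in the AbouBanat query is not legible in this transcript, so the Python sketch below evaluates both the existential and the universal reading of "X ranges over the set T.Children". It assumes a simplified schema (a father name and a set of children) and a hypothetical instance; the names Paul, Jean, Lea, and Max are invented for illustration.

```python
from typing import FrozenSet, NamedTuple

class Child(NamedTuple):
    name: str
    sex: str

class Family(NamedTuple):
    father: str
    children: FrozenSet[Child]

# Hypothetical instance: Paul, Jean, Lea, and Max are illustrative, not from the slides.
families = {
    Family("Peter", frozenset({Child("Mimi", "F"), Child("Zaza", "F")})),
    Family("Paul", frozenset({Child("Toto", "M"), Child("Lea", "F")})),
    Family("Jean", frozenset({Child("Max", "M")})),
}

# Existential reading: fathers with at least one daughter.
some_daughter = {t.father for t in families if any(x.sex == "F" for x in t.children)}
# Universal reading: fathers all of whose children are daughters
# (note: all() is also True for a family with no children).
only_daughters = {t.father for t in families if all(x.sex == "F" for x in t.children)}

print(sorted(some_daughter))   # ['Paul', 'Peter']
print(sorted(only_daughters))  # ['Peter']
```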
Results • Equivalence theorem: the algebra and the logic have the same expressive power • Remark: one can compute transitive closure (TC) using the algebra/logic (wow! cool!) • Also studied: fixpoint, datalog, while… • Complexity: each new level of nesting introduces one more exponential: $2^{n}$, $2^{2^{n}}$, … • Need to control the use of powerset
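One way to see the TC remark, and the price of powerset, is the following Python sketch (an illustration, not the construction from the original papers): the transitive closure of a binary relation R over a domain D is the intersection of all transitive relations containing R, and powerset lets us enumerate every candidate subset of D × D. With |D| = 3 there are already 2^9 = 512 candidates.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as frozensets (the algebra's powerset operation)."""
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))]

def is_transitive(r):
    return all((x, w) in r for (x, y) in r for (z, w) in r if y == z)

def tc_via_powerset(r, domain):
    """Transitive closure as the intersection of all transitive supersets of r.

    Every candidate is a subset of domain x domain, so this enumerates
    2^(|domain|^2) sets: correct, but exponential."""
    pairs = {(x, y) for x in domain for y in domain}
    candidates = [s for s in powerset(pairs) if r <= s and is_transitive(s)]
    return frozenset.intersection(*candidates)

edges = {(1, 2), (2, 3)}
print(sorted(tc_via_powerset(edges, {1, 2, 3})))
# [(1, 2), (1, 3), (2, 3)]
```

The full relation D × D is always transitive and contains R, so the list of candidates is never empty.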
From complex objects to semistructured data • [Figure: the same Families complex object, with Names Peter, Mimi, Toto, Zaza, 2CV, BMW, Sexes F/M, and Years 1976, 2010]
Revolution 1: more flexibility • [Figure: the Families complex object with extra, irregular Annotations and Trash nodes]
Revolution 2: remove some nodes; name all • [Figure: the Families tree in which the tuple nodes are now labeled Family, Child, and Car, next to the Ann. and Trash nodes]
Unranked labeled trees • [Figure: the resulting tree, where every node carries a label: Families, Family, Name, Children, Child, Cars, Car, Year, Sex, Ann., Trash]
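An unranked labeled tree needs nothing more than a label, an optional value, and a list of children of any length. The Python sketch below is a minimal illustration with a hypothetical `Node` class and a hypothetical `find` helper for label paths; the instance only loosely follows the figure (the split of cars and children between the two families is invented).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A node of an unranked labeled tree: a label, an optional value, any number of children."""
    label: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def find(node, path):
    """Return all nodes reached by following the label path downward from node's children."""
    if not path:
        return [node]
    return [hit for c in node.children if c.label == path[0] for hit in find(c, path[1:])]

# Illustrative instance; the arrangement of the two families is hypothetical.
families = Node("Families", children=[
    Node("Family", children=[
        Node("Name", "Peter"),
        Node("Cars", children=[
            Node("Car", children=[Node("Name", "2CV"), Node("Year", "1976")]),
            Node("Car", children=[Node("Name", "BMW"), Node("Year", "2010")]),
        ]),
        Node("Trash"),  # irregular node: no schema forbids it
    ]),
    Node("Family", children=[
        Node("Name", "Peter"),
        Node("Children", children=[Node("Child", children=[Node("Name", "Zaza"), Node("Sex", "F")])]),
    ]),
])

print([n.value for n in find(families, ["Family", "Cars", "Car", "Name"])])
# ['2CV', 'BMW']
```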
This is better adapted to a Web context • Self-describing data: no separation between schema and data • Flexibility • Not such a big deal • Maybe the main contribution is the format? • <Families><Family><Name>Peter</Name><Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars><Children><Child> … • Plus ça change, plus c’est la même chose: the more things change, the more they stay the same
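Self-describing data can be queried without consulting any schema: the tags carry the structure. This Python sketch uses the standard `xml.etree.ElementTree` module on a small hypothetical document written in the same style as the snippet above (its contents are illustrative).

```python
import xml.etree.ElementTree as ET

# Hypothetical, complete document in the style of the snippet above.
doc = """
<Families>
  <Family>
    <Name>Peter</Name>
    <Cars>
      <Car><Name>BMW</Name><Year>2010</Year></Car>
      <Car><Name>2CV</Name><Year>1976</Year></Car>
    </Cars>
    <Children>
      <Child><Name>Toto</Name><Sex>M</Sex></Child>
    </Children>
  </Family>
</Families>
"""

root = ET.fromstring(doc)
# No schema is consulted: the tags in the data are the only structure we rely on.
for car in root.findall("Family/Cars/Car"):
    print(car.findtext("Name"), int(car.findtext("Year")))
# BMW 2010
# 2CV 1976
```

XML is case-sensitive, so an opening `<name>` closed by `</Name>` would be rejected by the parser.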
What else? The trees are unbounded • Like nested relations, trees are unbounded in width • Unlike nested relations, they are unbounded in depth • One can simulate two-counter machines with two branches • Do applications simulate two-counter machines with XML documents? • I am still looking for one • XML documents are rarely deep • But even for bounded trees there are fun questions: e.g., is the equivalence of monadic datalog programs decidable for bounded data trees?
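A minimal sketch of why unbounded depth matters: the two counters of a Minsky-style two-counter machine are stored as the depths of two branches under a root, increment grows a branch, decrement shrinks it, and the zero test checks for an empty branch. Two-counter machines are Turing-complete, which is the usual route to undecidability. The instruction format and the example program are hypothetical choices made for this illustration.

```python
class Node:
    def __init__(self, label, child=None):
        self.label = label
        self.child = child  # at most one child: each branch is a chain

def depth(branch):
    """Length of the chain hanging below the branch head (the counter value)."""
    n, node = 0, branch
    while node.child is not None:
        node, n = node.child, n + 1
    return n

def inc(branch):
    branch.child = Node("a", branch.child)  # insert one node at the top of the chain

def dec(branch):
    branch.child = branch.child.child  # remove it (callers check for zero first)

def run(program, c0=0, c1=0):
    """Run a two-counter machine whose counters are two branches of a tree.

    Instructions: ("inc", i, next), ("jzdec", i, if_zero, else_next), ("halt",)."""
    root = [Node("$"), Node("$")]  # the two branches below the root
    for i, v in enumerate((c0, c1)):
        for _ in range(v):
            inc(root[i])
    pc = 0
    while program[pc][0] != "halt":
        op = program[pc]
        if op[0] == "inc":
            inc(root[op[1]])
            pc = op[2]
        elif depth(root[op[1]]) == 0:
            pc = op[2]
        else:
            dec(root[op[1]])
            pc = op[3]
    return depth(root[0]), depth(root[1])

# Example program: move counter 0 into counter 1 (adds c0 to c1).
add = [
    ("jzdec", 0, 2, 1),  # 0: if branch 0 is empty goto 2, else shrink it and goto 1
    ("inc", 1, 0),       # 1: grow branch 1, goto 0
    ("halt",),           # 2
]
print(run(add, 3, 4))  # (0, 7)
```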
What else? The trees are ordered: unranked labeled ordered trees = XML • Ignore order: classical optimization • Respect order: a totally new ball game; bring in tree automata • Order is often painful for optimization • Reconcile
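The two views of order can be contrasted on the simplest possible question, tree equality. In this Python sketch (an assumed encoding of a tree as a (label, tuple of children) pair), ordered equality compares children position by position, while unordered equality compares canonical forms obtained by recursively sorting the children.

```python
def canon(t):
    """Canonical form ignoring sibling order: recursively sort the children."""
    label, children = t
    return (label, tuple(sorted(canon(c) for c in children)))

def equal_ordered(t1, t2):
    # Sibling order is significant: plain structural equality.
    return t1 == t2

def equal_unordered(t1, t2):
    # Sibling order is ignored: compare canonical forms.
    return canon(t1) == canon(t2)

a = ("Car", (("Name", ()), ("Year", ())))
b = ("Car", (("Year", ()), ("Name", ())))
print(equal_ordered(a, b), equal_unordered(a, b))  # False True
```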
The selling argument is the Web… • The move from relations to trees is interesting • But so is the move from centralized to distributed • and it is much less investigated • Where the fun is: • Scale is beyond what we thought was thinkable • Machines are totally autonomous • Schemas replaced by numerous ontologies • True/false logic replaced by inconsistency, probabilities, trust, belief…
And the trees are evolving (aka Active XML) • An old idea from object databases: mix data and computation • [Figure: a Resorts document whose Resort node has State Colorado and Name Aspen, a snowcond subtree (Unit Meter, Depth 1) with the service call !Unisys.com/snow(“Aspen”), and a hotels subtree defined by the call !Yahoo.com/GetHotels(<city name=“Aspen”/>)]
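Mixing data and computation can be sketched as a tree in which some children are service calls rather than data. The Python sketch below is a hypothetical illustration: `Call`, `Elem`, and `materialize` are invented names, and `fake_snow` is a local stand-in for the snow service named in the figure, so no real endpoint is contacted. It simply replaces each call by its result; other policies, such as keeping the call to refresh it later, are possible.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

@dataclass
class Call:
    """An embedded service call: a placeholder whose result is spliced into the tree."""
    service: str
    fn: Callable[[], List["Elem"]]

@dataclass
class Elem:
    label: str
    value: str = ""
    children: List[Union["Elem", Call]] = field(default_factory=list)

def materialize(e):
    """Replace every Call below e by the elements it returns (one round of evaluation)."""
    new_children = []
    for c in e.children:
        if isinstance(c, Call):
            new_children.extend(c.fn())  # results are not themselves re-evaluated
        else:
            materialize(c)
            new_children.append(c)
    e.children = new_children

# Hypothetical stand-in for the snow service; it returns fixed data.
def fake_snow():
    return [Elem("Unit", "Meter"), Elem("Depth", "1")]

resort = Elem("Resort", children=[
    Elem("Name", "Aspen"),
    Elem("snowcond", children=[Call('!Unisys.com/snow("Aspen")', fake_snow)]),
])
materialize(resort)
print([(c.label, c.value) for c in resort.children[1].children])
# [('Unit', 'Meter'), ('Depth', '1')]
```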
And there are cycles • For lack of time, I will not mention the network model [Codasyl 1969] • The language was purely navigational anyway • If I added references to XML, I would get cycles • Lots of models for graph data, e.g., IQL • Some fun results: e.g., a copy-elimination problem when trying to obtain Chandra-Harel completeness for IQL • Similar issue for unordered trees [recent result with Vianu] • [Figure: two Person nodes, Adam and Eve, each with a Name and a Spouse reference to the other] • Paris C. Kanellakis
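Adding references turns trees into graphs, and the Adam/Eve example already contains a cycle. The Python sketch below shows one way to serialize such a graph as a tree, in the spirit of XML ID/IDREF attributes: a revisited object is emitted as a reference instead of being copied again, which a naive unfolding would do forever. Using the person's name as the reference key is a simplifying assumption.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.spouse = None  # a reference, not a subtree

adam, eve = Person("Adam"), Person("Eve")
adam.spouse, eve.spouse = eve, adam  # Adam -> Eve -> Adam: a cycle

def to_tree(p, seen=None):
    """Unfold the graph into a tree, emitting a reference marker for revisited objects.

    Naive unfolding (copying the spouse subtree every time) would never terminate."""
    seen = set() if seen is None else seen
    if id(p) in seen:
        return {"ref": p.name}
    seen.add(id(p))
    node = {"Person": {"Name": p.name}}
    if p.spouse is not None:
        node["Person"]["Spouse"] = to_tree(p.spouse, seen)
    return node

print(to_tree(adam))
# {'Person': {'Name': 'Adam', 'Spouse': {'Person': {'Name': 'Eve', 'Spouse': {'ref': 'Adam'}}}}}
```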
Conclusion • Is this a good time to do research on trees in databases? • The best time to plant a tree was 20 years ago. • The next best time is now. • Chinese Proverb
Advertisement: Book on Web data management, to appear at Cambridge University Press http://webdam.inria.fr/Jorge