What Goes Around Comes Around

What Goes Around Comes Around. 0256806 楊明智. Authors: Michael Stonebraker and Joseph M. Hellerstein. Abstract: a summary of 35 years of data model proposals, grouped into 9 different eras. Later proposals inevitably bear a strong resemblance to certain earlier proposals.

Presentation Transcript


  1. What Goes Around Comes Around 0256806 楊明智

  2. Authors • Michael Stonebraker • Joseph M. Hellerstein

  3. Abstract • Summary of 35 years of data model proposals, grouped into 9 different eras. • Later proposals inevitably bear a strong resemblance to certain earlier proposals. -> What Goes Around Comes Around • Most current researchers have limited (if any) understanding of what was previously learned.

  4. Abstract • Hierarchical (IMS): late 1960’s and 1970’s • Network (CODASYL): 1970’s • Relational: 1970’s and early 1980’s • Entity-Relationship: 1970’s • Extended Relational: 1980’s • Semantic: late 1970’s and 1980’s • Object-oriented: late 1980’s and early 1990’s • Object-relational: late 1980’s and early 1990’s • Semi-structured (XML): late 1990’s to the present

  5. Sample data schema • Supplier (sno, sname, scity, sstate) • Part (pno, pname, psize, pcolor) • Supply (sno, pno, qty, price)

  7. Hierarchical (IMS): • Record type: a collection of named fields with their associated data types. • The record types must be arranged in a tree. • Each instance, other than root instances, has a single parent of the correct record type.

  8. Hierarchical (IMS): • An IMS data base is a collection of instances of record types. Every record in an IMS data base has a hierarchical sequence key (HSK). • The HSK is derived by concatenating the keys of ancestor records and then appending the key of the current record.
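
The HSK rule above can be sketched in a few lines of Python. This is only an illustration, not IMS code: the `Record` type, the integer keys, and the choice of Supplier as the root with Part as its child are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    record_type: str
    key: int
    parent: Optional["Record"] = None  # None marks a root instance


def hsk(record: Record) -> tuple:
    """Concatenate the ancestors' keys (root first), then append this record's key."""
    keys = []
    node: Optional[Record] = record
    while node is not None:
        keys.append(node.key)
        node = node.parent
    return tuple(reversed(keys))


# Hypothetical instances: Supplier is the root record type, Part is its child.
s16 = Record("Supplier", 16)
s24 = Record("Supplier", 24)
p27 = Record("Part", 27, parent=s16)
p42 = Record("Part", 42, parent=s16)
p27b = Record("Part", 27, parent=s24)

# Sorting by HSK places every parent directly before its own children.
ordered = sorted([p27b, s24, p42, s16, p27], key=hsk)
```

In this sketch, sorting by HSK yields a depth-first order: supplier 16, its two parts, then supplier 24 and its part.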

  9. Hierarchical (IMS): • The HSK order facilitates a simple data manipulation language, DL/1. • DL/1 is a “record-at-a-time” language.

  10. Hierarchical (IMS): • Lesson 1: Physical and logical data independence are highly desirable. • Lesson 2: Tree structured data models are very restrictive.

  11. Hierarchical (IMS): • Lesson 3: It is a challenge to provide sophisticated logical reorganizations of tree structured data. • Lesson 4: A record-at-a-time user interface forces the programmer to do manual query optimization, and this is often hard.
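
Lesson 4 can be made concrete with a rough sketch. The dictionary below is a hypothetical in-memory stand-in for a supplier/part hierarchy, not DL/1; the point is only that with record-at-a-time access the programmer, not the system, picks the access path, and the choice changes how many records are touched.

```python
# Hypothetical stand-in for a hierarchy: each supplier owns its child part records.
HIERARCHY = {
    16: [("bolt", "red"), ("nut", "green"), ("screw", "red")],
    24: [("bolt", "red")],
}


def red_parts_via_supplier(sno):
    """Plan A: position on one supplier, then visit its children one at a time."""
    touched = 0
    found = []
    for pname, pcolor in HIERARCHY.get(sno, []):
        touched += 1
        if pcolor == "red":
            found.append(pname)
    return found, touched


def red_parts_via_full_scan(sno):
    """Plan B: visit every part record in hierarchical order and filter."""
    touched = 0
    found = []
    for owner, children in HIERARCHY.items():
        for pname, pcolor in children:
            touched += 1
            if owner == sno and pcolor == "red":
                found.append(pname)
    return found, touched


plan_a = red_parts_via_supplier(16)
plan_b = red_parts_via_full_scan(16)
```

Both plans return the same parts, but plan B touches every part record in the data base. Choosing between them by hand is the manual query optimization the lesson refers to.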

  13. Network (CODASYL) • CODASYL: Conference on Data Systems Languages • This model organized a collection of record types, each with keys, into a network rather than a tree. • A given record instance could have multiple parents, rather than a single one, as in IMS.

  14. Network (CODASYL) • A named arc connecting two record types is called a set.
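
A minimal sketch of the network idea, assuming the sample Supplier/Part/Supply schema. The `Rec` class, the `connect` helper, and the set names `Supplies` and `Supplied_by` are illustrative choices, not CODASYL syntax. A Supply instance is a member of two named sets, so it has two parents, which a tree cannot express.

```python
from dataclasses import dataclass, field


@dataclass
class Rec:
    record_type: str
    key: tuple
    # set name -> owner record (one owner per set this record is a member of)
    owners: dict = field(default_factory=dict, repr=False)
    # set name -> member records (for sets this record owns)
    members: dict = field(default_factory=dict, repr=False)


def connect(set_name, owner, member):
    """Link a member record to its owner through a named set (a named arc)."""
    owner.members.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner


supplier = Rec("Supplier", (16,))
part = Rec("Part", (27,))
supply = Rec("Supply", (16, 27))

connect("Supplies", supplier, supply)   # Supplier owns Supply
connect("Supplied_by", part, supply)    # Part also owns the same Supply

parents = sorted(owner.record_type for owner in supply.owners.values())
```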

  15. Network (CODASYL) • All the data was typically in one large network. • This much larger object had to be bulk-loaded all at once. • Crash recovery tended to be more involved. • The load program tended to be complex, and loading usually entailed many disk seeks.

  16. Network (CODASYL) • Lesson 5: Networks are more flexible than hierarchies but more complex. • Lesson 6: Loading and recovering networks is more complex than loading and recovering hierarchies.

  18. Relational • When logical or physical changes occurred, programmers were spending large amounts of time doing maintenance on IMS applications. • Ted Codd was focused on providing better data independence.

  19. Relational • His early DML proposals were the relational calculus and the relational algebra. -> Set-at-a-time languages. -> Codd was originally a mathematician.

  20. Relational • Lesson 7: Set-at-a-time languages are good, since they offer much improved physical data independence. • Lesson 8: Logical data independence is easier with a simple data model than with a complex one.
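
As a rough illustration of Lesson 7, the sample schema can be loaded into an in-memory SQLite data base using Python's standard `sqlite3` module. The rows are made-up sample data. The query states what is wanted as a set and says nothing about access paths, so the storage layout could change without touching the query text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier (sno INTEGER PRIMARY KEY, sname TEXT, scity TEXT, sstate TEXT);
CREATE TABLE Part (pno INTEGER PRIMARY KEY, pname TEXT, psize INTEGER, pcolor TEXT);
CREATE TABLE Supply (sno INTEGER, pno INTEGER, qty INTEGER, price REAL);
""")

# Made-up sample rows for the three tables.
conn.executemany("INSERT INTO Supplier VALUES (?, ?, ?, ?)",
                 [(16, "General Supply", "Boston", "MA"),
                  (24, "Special Supply", "Detroit", "MI")])
conn.executemany("INSERT INTO Part VALUES (?, ?, ?, ?)",
                 [(27, "bolt", 7, "red"), (42, "nut", 3, "green")])
conn.executemany("INSERT INTO Supply VALUES (?, ?, ?, ?)",
                 [(16, 27, 100, 1.0), (16, 42, 50, 2.0), (24, 42, 10, 2.5)])

# Set-at-a-time: one declarative statement, no record-by-record navigation.
rows = conn.execute("""
    SELECT DISTINCT s.sname
    FROM Supplier s
    JOIN Supply sp ON sp.sno = s.sno
    JOIN Part p ON p.pno = sp.pno
    WHERE p.pcolor = 'red'
    ORDER BY s.sname
""").fetchall()
```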

  21. Relational • Lesson 9: Technical debates are usually settled by the elephants of the marketplace, and often for reasons that have little to do with the technology. -> IBM, DB2 • Lesson 10: Query optimizers can beat all but the best record-at-a-time DBMS application programmers.

  23. Entity-Relationship • A data base can be thought of as a collection of instances of entities. • Entities have attributes. • There can be relationships between entities.

  24. Entity-Relationship • The E-R model has been wildly successful in one area, namely data base (schema) design. -> normalization theory • It was straightforward to convert an E-R diagram into a collection of tables in third normal form.
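
The conversion can be sketched for the simplest case. The function below is a hypothetical helper that handles only many-to-many relationships: each entity becomes a table, and each relationship becomes a table holding the keys of the entities it connects plus its own attributes. Other relationship kinds are deliberately left out of this sketch.

```python
def er_to_tables(entities, relationships):
    """Map a tiny E-R description to table name -> list of column names."""
    # One table per entity: key columns first, then the other attributes.
    tables = {
        name: list(spec["key"]) + list(spec["attrs"])
        for name, spec in entities.items()
    }
    # One table per many-to-many relationship: keys of both sides, then attributes.
    for name, spec in relationships.items():
        columns = []
        for entity_name in spec["between"]:
            columns.extend(entities[entity_name]["key"])
        tables[name] = columns + list(spec["attrs"])
    return tables


entities = {
    "Supplier": {"key": ["sno"], "attrs": ["sname", "scity", "sstate"]},
    "Part": {"key": ["pno"], "attrs": ["pname", "psize", "pcolor"]},
}
relationships = {
    "Supply": {"between": ["Supplier", "Part"], "attrs": ["qty", "price"]},
}

tables = er_to_tables(entities, relationships)
```

Applied to the sample data, this reproduces the three-table schema used throughout the slides.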

  25. Entity-Relationship • DBAs immediately asked “How do I get an initial set of tables?” • Normalization theory was based on the concept of functional dependencies (FDs), and real-world DBAs could not understand this construct. -> “dead in the water”

  26. Entity-Relationship • Lesson 11: Functional dependencies are too difficult for mere mortals to understand. • Another reason for KISS (Keep it simple, stupid).

  28. Extended Relational • In the 1980’s, many applications were investigated, including mechanical CAD and VLSI CAD. • Other areas included text management, time, and computer graphics. • On a relational DBMS, the queries were difficult to write and performed poorly.

  29. Extended Relational • Set-valued attributes: add a data type to the relational model to deal with sets of values. • Aggregation: tuple reference as a data type -> an alternative to foreign keys. Example: data type PT is “tuple in the Part table” and SR is “tuple in the Supplier table”.
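
Both constructs can be mimicked with Python dataclasses. This is a sketch under assumptions: the field `available_colors` is a hypothetical set-valued attribute, and the `SR` and `PT` fields hold references to whole tuples rather than key values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supplier:
    sno: int
    sname: str


@dataclass(frozen=True)
class Part:
    pno: int
    pname: str
    available_colors: frozenset  # set-valued attribute (hypothetical field)


@dataclass(frozen=True)
class Supply:
    SR: Supplier  # "tuple in the Supplier table"
    PT: Part      # "tuple in the Part table"
    qty: int
    price: float


general = Supplier(16, "General Supply")
special = Supplier(24, "Special Supply")
bolt = Part(27, "bolt", frozenset({"red", "black"}))
nut = Part(42, "nut", frozenset({"green"}))

supplies = [
    Supply(general, bolt, 100, 1.0),
    Supply(general, nut, 50, 2.0),
    Supply(special, nut, 10, 2.5),
]

# Following references replaces an explicit join on sno and pno.
red_suppliers = sorted(
    {s.SR.sname for s in supplies if "red" in s.PT.available_colors}
)
```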

  30. Extended Relational • Generalization: Each specialization inherits all of the data attributes in its ancestors.
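
Generalization maps naturally onto class inheritance. In the sketch below, `ElectricPart` and `PlumbingPart` and their extra fields are hypothetical specializations of the sample Part record, chosen only to show that each one carries all of its ancestor's attributes.

```python
from dataclasses import dataclass, fields


@dataclass
class Part:
    pno: int
    pname: str
    psize: int
    pcolor: str


@dataclass
class ElectricPart(Part):  # hypothetical specialization
    voltage: int


@dataclass
class PlumbingPart(Part):  # hypothetical specialization
    diameter_mm: int


# The specialization lists the inherited attributes first, then its own.
electric_columns = [f.name for f in fields(ElectricPart)]
plumbing_columns = [f.name for f in fields(PlumbingPart)]
```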

  31. Extended Relational • Primary-key/foreign-key relationships easily simulate tuple as a data type. -> Very little performance improvement. • Relational vendors were singularly focused on improving transaction performance. • Little technology transfer of Extended Relational ideas into the commercial world.

  32. Extended Relational • Lesson 12: Unless there is a big performance or functionality advantage, new constructs will go nowhere.

  34. Semantic • The relational data model is incapable of easily expressing a class of data. -> “semantically impoverished” • The semantic data model focuses on the notion of classes and multiple inheritance.

  35. Semantic • Semantic models faced the same two problems as the Extended Relational advocates. -> They are easy to simulate on relational systems. -> The established vendors were distracted with performance.

  37. Object-oriented • Relational data bases and programming languages each had their own naming and data type systems. • Binding an application to the data base required a conversion from “programming language speak” to “data base speak” and back.
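
The conversion in both directions can be illustrated with Python's standard `sqlite3` module. The helpers `save_part` and `load_part` are hypothetical names for this sketch. They show the hand-written glue needed to move a value between an application object and a row.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Part:
    pno: int
    pname: str
    psize: int
    pcolor: str


def save_part(conn, part):
    """Programming language speak -> data base speak: object to row."""
    conn.execute("INSERT OR REPLACE INTO Part VALUES (?, ?, ?, ?)", astuple(part))


def load_part(conn, pno):
    """Data base speak -> programming language speak: row to object."""
    row = conn.execute(
        "SELECT pno, pname, psize, pcolor FROM Part WHERE pno = ?", (pno,)
    ).fetchone()
    return None if row is None else Part(*row)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Part (pno INTEGER PRIMARY KEY, pname TEXT, psize INTEGER, pcolor TEXT)"
)

save_part(conn, Part(27, "bolt", 7, "red"))
bolt = load_part(conn, 27)
bolt.psize = 8          # change the object in application memory
save_part(conn, bolt)   # and convert it back explicitly
reloaded = load_part(conn, 27)
```

Every field is named once in the class, once in the table definition, and again in each statement, which is the duplication a persistent programming language aims to remove.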

  38. Object-oriented • Integrate DBMS functionality more closely into a programming language. -> A persistent programming language.

  39. Object-oriented • Requires the compiler for the programming language to be extended with DBMS-oriented functionality. • The extension must be done once per compiler. • Usually difficult to program and error prone.

  40. Object-oriented • No standards: All of the OODB vendor offerings were incompatible. • No programming language Esperanto: If your enterprise had even a single application not written in the language the OODB supported, you could not use one of the OODB products.

  41. Object-oriented • Lesson 13: Packages will not sell to users unless they are in “major pain”. • Lesson 14: Persistent languages will go nowhere without the support of the programming language community.

  43. Object-relational • Geographic information systems (GIS) store the location of a collection of intersections as: • One-dimensional access methods do not do two-dimensional searches efficiently, so there is no way in a relational system for this query to run fast.

  44. Object-relational • User-defined data types, operators, functions, and access methods. • !! (point in rectangle) • ## (box intersects box)
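
A user-defined function for the point-in-rectangle test can be sketched with SQLite, whose Python binding lets an application register its own functions through `Connection.create_function`. The table `Intersections`, its columns, and the function name `point_in_rect` are assumptions for this example, standing in for the `!!` operator.

```python
import sqlite3


def point_in_rect(x, y, x0, y0, x1, y1):
    """Return 1 if (x, y) lies inside the rectangle (x0, y0)-(x1, y1), else 0."""
    return int(x0 <= x <= x1 and y0 <= y <= y1)


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it by name.
conn.create_function("point_in_rect", 6, point_in_rect)

conn.execute("CREATE TABLE Intersections (iid INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO Intersections VALUES (?, ?, ?)",
    [(1, 0.5, 0.5), (2, 3.0, 1.0), (3, 1.0, 0.0)],
)

inside = [
    row[0]
    for row in conn.execute(
        "SELECT iid FROM Intersections "
        "WHERE point_in_rect(x, y, 0, 0, 1, 1) ORDER BY iid"
    )
]
```

The function makes the query easy to write, but in this sketch every row is still examined. Making it fast is the job of a user-defined access method, which the example does not attempt.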

  45. Object-relational • User-defined functions (UDFs) -> stored procedures

  46. Object-relational • Absence of standards. Every vendor has its own way of defining and calling UDFs. -> Most vendors support Java UDFs, but Microsoft does not.

  47. Object-relational • Lesson 15: The major benefits of OR are two-fold: putting code in the data base (and thereby blurring the distinction between code and data) and user-defined access methods. • Lesson 16: Widespread adoption of new technology requires standards and/or an elephant pushing hard.

  49. Semi-structured (XML) • Semantic heterogeneity: Information on a common object does not conform to a common representation. • This makes query processing a big challenge, e.g., “pastimes” vs. “hobbies”, Works_for.
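
The pastimes/hobbies problem can be shown with two made-up records that describe the same kind of fact under different field names. The records and the `who_likes` helper are hypothetical. A query written against one name silently misses data stored under the other.

```python
# Two made-up records with no common representation for the same information.
records = [
    {"person": "Joe", "hobbies": ["skiing", "bicycling"]},
    {"person": "Vanessa", "pastimes": ["skiing", "sewing"]},
]


def who_likes(activity, field_names=("hobbies",)):
    """Return the people whose record lists the activity under any given field name."""
    return [
        r["person"]
        for r in records
        if any(activity in r.get(name, []) for name in field_names)
    ]


naive = who_likes("skiing")                                  # looks at "hobbies" only
reconciled = who_likes("skiing", ("hobbies", "pastimes"))    # names mapped by hand
```

The second call works only because someone already knew that the two field names mean the same thing. Discovering such correspondences is what makes query processing hard here.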

  50. Semi-structured (XML) • Schema-last: it is natural for users to enter their data as free text, perhaps through a word processor (which may annotate the text with some simple metadata about document structure). • It is difficult to think up very many examples of such data, other than resumes and advertisements.
