
Lecture 14: Overview of Post-Relational Development

Oct. 13, 2006. ChengXiang Zhai.


Presentation Transcript


  1. Lecture 14: Overview of Post-Relational Development Oct. 13, 2006 ChengXiang Zhai

  2. New Challenges in Databases
      • Traditional RDBMS Functions
      • Traditional Relational Data
      • Traditional Users
      • New Data/Info Management Functions?
      • New Data Type?
      • New Users?

  3. New Kinds of Data
      • Text data
      • Multimedia data
      • Scientific data
      • Sensor data
      • Log data
      • Personal data
      • Web/Email/Blog
      • ...
      Callouts: Ranking in DB; “Schema Lean/Last” (semi-structured data model); Complex object indexing; Stream data; Data mining; Data integration; Internet computing applications

  4. New Users • Everyone?

  5. New Functions
      • Information integration
      • Navigation
      • Ranking
      • Pattern finding (data mining)
      • Decision support
      Callouts: New/more general data model/architecture? (Object-Oriented); New algorithms; Adding intelligence to DB

  6. New Computing Environment
      • Distributed computing/Networks (Internet)
      • Mobile devices (cell phones, PDAs)
      Callouts: Distributed DB; Peer-to-Peer (P2P) DB; Mobile DB?

  7. Web Changes Everything
      • Observations:
          • Publishing of data is almost free
              • many are simultaneously producer and consumer
          • Web is becoming a huge database
              • of distributed data online (published by everyone)
              • of autonomous databases online
      • Trends:
          • static HTML pages --> dynamic pages presenting DB
          • HTML --> XML for better describing structured data
      Slide from Kevin Chang’s presentation

  8. Web Changes Everything
      • What is needed:
          • Content producers:
              • tools for building huge data stores
          • Content consumers:
              • tools for discovering and querying info. on the web
      Slide from Kevin Chang’s presentation

  9. Database Technology Timeline
      [Timeline figure; the original layout did not survive extraction. Recoverable labels:]
      • Axis: Simple Data Management --> Global Enterprise Management
      • Periods: Early 80s; Late 80s; Early-Mid 90s; Late 90s - 21st C
      • Generations: Pre-relational; Early Relational; Client-server Relational; Enterprise-capable Relational; Internet Computing
      • Other labels, in source order:
          • Packaged & Vertical Applications
          • Data Warehouse & Hi-end OLTP
          • Simple OLTP
          • Active Database
          • Middleware (messaging, queues, events)
          • Java, CORBA, Web interfaces
          • Scalable OLTP, parallel query, partitioning, cluster support, row-level locking, high availability
          • Simple transactions, on-line backup & recovery
          • Support for all types of data, extensibility, objects
          • Stored procedures, triggers
      Slide from Anil Nori’s presentation

  10. Current State of DBMSs
      • OLTP applications
          • Large amounts of data
          • Simple data, simple queries and updates
              • Update statement from debit/credit transaction:
                  UPDATE accounts SET abalance = abalance + :delta
                  WHERE aid = :aid;
          • Typically update intensive
          • Large number of concurrent users (transactions)
      • Data warehousing applications
          • Large amounts of data
          • Simple data but complex querying
          • Typically read intensive
          • Large number of users
      Slide from Anil Nori’s presentation
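The debit/credit update on this slide can be tried out with a minimal sketch. It assumes Python's built-in `sqlite3` module and an in-memory database; the `accounts` table layout, the sample balances, and the helper name `apply_delta` are illustrative assumptions, not part of the slides. Only the `UPDATE` statement itself comes from the slide.

```python
import sqlite3


def apply_delta(conn, aid, delta):
    """Apply one debit/credit update as a single transaction (hypothetical helper)."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid",
            {"delta": delta, "aid": aid},
        )
        if cur.rowcount != 1:
            raise KeyError(f"no such account: {aid}")


# Assumed schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (aid INTEGER PRIMARY KEY, abalance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

apply_delta(conn, 1, -30)  # debit account 1
apply_delta(conn, 2, 30)   # credit account 2
balances = dict(conn.execute("SELECT aid, abalance FROM accounts"))
```

Each call touches one row through a key lookup, which is the "simple data, simple queries and updates" pattern the slide describes. A production OLTP system would run many such transactions concurrently, which this single-connection sketch does not attempt to show.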

  11. Current State of DBMSs
      • These applications require:
          • Large numbers of users/transactions
          • High performance
          • High availability (7x24 operations)
          • Scalability
          • High levels of security
          • Administrative support
          • Good utilities
      Slide from Anil Nori’s presentation

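  12. Internet Applications: Challenges
      [Comparison chart; the original layout did not survive extraction. Recoverable labels, in source order:]
      • Panels: Transaction Processing; Data Warehousing
      • Dimension labels: Larger User Populations; Network; Systems; Systems Management; Usage; Operations; Hours; Importance; Users; Size
      • Adjacent value pairs: Trained / Self-Service; Gigabytes / Terabytes; Independent / Integrated; Batch / Immediate; Simple / Intelligent; Local / Global; Business-Critical / Useful; Analysts / Every Employee
      Slide from Anil Nori’s presentation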

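  13. Internet Applications: Challenges
      [Comparison chart; the original layout did not survive extraction. Recoverable labels, in source order:]
      • Panels: E-commerce/Apps; Information Management; Site Operation
      • Dimension labels: APIs; Type; Applications; Delivery; Access; Content; Management; Availability
      • Adjacent value pairs: Proprietary / Open; Tabular / Heterogeneous; Standalone / Integrated; Generic / Personalized; Read/write / Lots of read-only; Direct / Search; Occasional / 24X7
      • Other: Low TCO, Mission Critical
      Slide from Anil Nori’s presentation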

  14. Internet Challenges
      • Availability
          • Need near 100% availability
          • Must be easy to manage
          • Replication, hot standby, foolproof system?
      • Scalability
          • Number of users is orders of magnitude higher
      • Security
          • Global users
          • Managing millions of users
          • Encryption
      • Performance
          • Internet user expectations
          • Speed vs correctness
              • (e.g. Search engines vs blade/cartridge/extender)
          • Availability vs correctness
      Slide from Anil Nori’s presentation

  15. Selected Current Topics • Text Database and Information Retrieval • Ranking in Databases • Data Integration • P2P Databases • Data Warehousing & OLAP • Data Mining • Stream Data Processing • Web Services • Semi-Structured Data (XML)

  16. Today’s Topic • Evolution of data models • Object-oriented DBs vs. Object relational DBs • XML “revolution”

  17. Nine Historical Epochs
      • Hierarchical (IMS): late 1960’s and 1970’s
      • Network (CODASYL): 1970’s
      • Relational: 1970’s and early 1980’s
      • Entity-relationship: 1970’s
      • Extended relational: early 1980’s
      • Semantic: late 1970’s and 1980’s
      • Object-oriented: late 1980’s and early 1990’s
      • Object-relational: late 1980’s and early 1990’s
      • Semi-structured (XML): late 1990’s to present

  18. Pre-Relational Era
      • IMS (hierarchical data model): Lessons
          • L1: Physical and logical data independence are highly desirable
          • L2: Tree structured data models are very restrictive
          • L3: It is a challenge to provide sophisticated logical reorganization of tree structured data
          • L4: A record-at-a-time user interface forces the programmer to do manual query optimization, and this is often hard
      • CODASYL (network data model): Lessons
          • L5: Networks are more flexible than hierarchies but more complex
          • L6: Loading and recovering networks is more complex than hierarchies

  19. Relational Era
      • The “relational” vs. CODASYL debate was settled by
          • The success of the VAX
          • The non-portability of CODASYL engines
          • The complexity of IMS logical data bases
      • Lessons:
          • L7: Set-at-a-time languages are good, regardless of the data model, since they offer much improved physical data independence
          • L8: Logical data independence is easier with a simple data model than with a complex one
          • L9: Technical debates are usually settled by the elephants of the marketplace, and often for reasons that have little to do with the technology
          • L10: Query optimizers can beat all but the best record-at-a-time DBMS application programmers
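Lessons L4, L7 and L10 contrast record-at-a-time navigation with set-at-a-time queries. The sketch below illustrates the difference using Python's built-in `sqlite3` module. The `supplier` and `supply` tables, their columns, and both function names are hypothetical, chosen only for this example. SQLite merely stands in for the navigational systems of the time, so this shows the programming style rather than a real IMS or CODASYL program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, for illustration only.
conn.executescript("""
CREATE TABLE supplier (sno INTEGER PRIMARY KEY, sname TEXT, city TEXT);
CREATE TABLE supply (sno INTEGER, pno INTEGER, qty INTEGER);
INSERT INTO supplier VALUES (1, 'Acme', 'Boston'), (2, 'Best', 'Urbana');
INSERT INTO supply VALUES (1, 10, 100), (1, 11, 300), (2, 10, 500);
""")


def record_at_a_time(conn, min_qty):
    """Navigational style: the programmer fixes the access order by hand."""
    names = set()
    for sno, sname in conn.execute("SELECT sno, sname FROM supplier"):
        # One lookup per supplier record; choosing this loop order is the
        # "manual query optimization" that lesson L4 refers to.
        for (qty,) in conn.execute(
            "SELECT qty FROM supply WHERE sno = ?", (sno,)
        ):
            if qty >= min_qty:
                names.add(sname)
                break
    return sorted(names)


def set_at_a_time(conn, min_qty):
    """Declarative style: state the result and let the optimizer plan it."""
    rows = conn.execute(
        "SELECT DISTINCT s.sname FROM supplier s "
        "JOIN supply y ON y.sno = s.sno "
        "WHERE y.qty >= ? ORDER BY s.sname",
        (min_qty,),
    )
    return [name for (name,) in rows]
```

Both functions return the same answer, but only the second stays valid unchanged if indexes are added or the storage layout changes, which is the physical data independence argument in L7.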

  20. The Entity-Relationship Era
      • Proposed in mid 1970’s by Peter Chen
      • Never gained acceptance as the underlying data model implemented by a DBMS
          • No query language?
          • Overshadowed by the relational model?
          • Looked too much like a “cleaned up” version of CODASYL?
      • But widely successful for DB schema design
          • DB design using normalization was “dead in the water”
          • It was straightforward to convert an ER diagram into a set of tables in 3rd normal form
      • Lessons:
          • L11: Functional dependencies are too difficult for mere mortals to understand. Another reason for KISS (Keep it simple, stupid).

  21. Extended Relational (R++) Era
      • Beginning in the early 1980’s
      • A sizeable collection of papers following this template:
          • Consider an application, call it X
          • Try to implement X on a relational DBMS
          • Show why the queries are difficult or why poor performance is observed
          • Add a new “feature” to the relational model to correct the problem
      • Valuable contributions
          • Set-valued attributes (e.g., available colors of an item)
          • Aggregation (tuple-reference as a data type, e.g., supply(PT, SR, qty, price), where “PT” and “SR” are pointers to tuples)
          • Generalization (inheritance)
      • Lessons:
          • L12: Unless there is a big performance or functionality advantage, new constructs will go nowhere.

  22. The Semantic Data Model (SDM) Era
      • Early 1980’s
      • Motivation: relational data model is “semantically impoverished” (can’t easily express a class of data of interest)
      • Define more general classes, allowing multiple inheritance
      • Most SDMs are very complex, and were generally paper proposals
      • Have the same problems as the R++ work

  23. Object-Oriented (OO) Era
      • Beginning in the mid 1980’s
      • Motivation: “impedance mismatch” between relational DBs and languages like C++
          • DBs have their own naming systems, data type systems, and conventions for returning data as results
          • Need conversions between DB conventions and programming language conventions
          • Like “gluing an apple onto a pancake”
      • As a result, persistent programming languages attracted much attention
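The conversions behind the “impedance mismatch” can be made concrete with a small sketch. It uses Python and the built-in `sqlite3` module rather than C++, and the `Employee` class, the `emp` table, and the `save`/`load` helpers are all hypothetical names introduced for illustration. The point is only that the program's types (a date, a list) have no direct counterpart in the table, so glue code is needed in both directions.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    # Program-side types: a date object and a list of strings.
    name: str
    hired: date
    skills: list


def save(conn, emp):
    """Convert program-side types into what the table can store."""
    conn.execute(
        "INSERT INTO emp VALUES (?, ?, ?)",
        (emp.name, emp.hired.isoformat(), ",".join(emp.skills)),
    )


def load(conn, name):
    """Convert a result row back into a program-side object."""
    row = conn.execute(
        "SELECT name, hired, skills FROM emp WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    n, hired, skills = row
    return Employee(n, date.fromisoformat(hired), skills.split(",") if skills else [])


conn = sqlite3.connect(":memory:")
# Database-side types: everything is flattened into text columns.
conn.execute("CREATE TABLE emp (name TEXT PRIMARY KEY, hired TEXT, skills TEXT)")

original = Employee("Ada", date(2006, 10, 13), ["sql", "c++"])
save(conn, original)
restored = load(conn, "Ada")
```

Every class that is stored needs a `save`/`load` pair like this one. A persistent programming language aims to remove that layer by letting ordinary variables refer to stored data directly.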

  24. Persistent Programming Language
      • Characteristics
          • Variables can represent disk-based data as well as main memory data
          • DB search criteria = language constructs
      • Early prototypes (late 1970’s): Pascal-R, Rigel, …
          • Cleaner than SQL embedding
          • However, compiler must be extended with DBMS-oriented functionality (not very successful)
          • No technology transfer

  25. Object-Oriented Data Bases
      • In the mid 1980’s, C++ triggered a resurgence of interest in persistent programming languages
          • Research systems: Garden, Exodus
          • Startups: Ontologic, Object Design, Versant
      • General goal: persistent C++
          • Extend C++ as a data model
          • Any C++ structure can be persisted
          • Support “relationships”
      • Application/market domain: engineering DBs
          • Typically, open a large object (e.g., electronic circuit), process it exclusively and close it
          • No need for a declarative query language (only need to reference objects)
          • No fancy transaction management is needed (one-user-at-a-time)
          • Performance has to be competitive with conventional C++

  26. Current Status of OODB
      • Market never got very large (too many vendors competing for a “niche” market)
      • The OODB vendors either have failed or repositioned their companies to offer something else
          • E.g., Object Design is now Excelon and selling XML services
      • Reasons for the failure
          • For their own market: absence of leverage, no standard, relink the world
          • For competing with Relational DBs: lack of transactions, low-level record-at-a-time (with the exception of O2, which embedded a declarative language, i.e., OQL, into a programming language)
      • Lesson:
          • L13: Packages will not sell to users unless they are in “major pain”

  27. The Object-Relational Era
      • Motivated by the need for handling geographic data
      • Question: How to extend a relational DB to handle new data types?
      • The object-relational proposal: add the following to SQL (Postgres):
          • User-defined data types
          • User-defined operators
          • User-defined functions, and
          • User-defined access methods
      • Commercially successful:
          • Postgres --> Illustra (acquired by Informix)
      • Lessons:
          • L14: The major benefits of OR are two-fold: putting code in the database (thereby blurring the distinction between code and data) and user-defined access methods
          • L15: Widespread adoption of new technology requires standards and/or an elephant pushing hard
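The idea of “putting code in the database” from L14 can be illustrated with a user-defined function. This sketch uses SQLite through Python's built-in `sqlite3` module, whose `Connection.create_function` registers a Python callable for use inside SQL. It is not Postgres, and the `landmark` table, its coordinates, and the `distance` function are assumptions made up for the example. SQLite offers no user-defined access methods, so the query below is evaluated by scanning the table.

```python
import math
import sqlite3


def distance(x1, y1, x2, y2):
    """Euclidean distance between two points (hypothetical user-defined function)."""
    return math.hypot(x2 - x1, y2 - y1)


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call distance(...).
conn.create_function("distance", 4, distance)

# Assumed table and data, for illustration only.
conn.execute("CREATE TABLE landmark (name TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO landmark VALUES (?, ?, ?)",
    [("library", 0.0, 0.0), ("stadium", 3.0, 4.0), ("airport", 30.0, 40.0)],
)

# The geographic predicate runs inside the query, not in application code.
nearby = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM landmark "
        "WHERE distance(x, y, 0.0, 0.0) <= 5.0 ORDER BY name"
    )
]
```

A user-defined access method, the second benefit named in L14, would let the system answer the same predicate through a spatial index instead of a scan. That part is outside what this sketch can show.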

  28. Semi-Structured Data
      • Motivation: abundance of semi-structured data, exchange format, …
      • Early system: Lore
      • Current standards: XMLSchema, XQuery
      • Two major points
          • Schema last
          • Complex network-oriented data model

  29. Schema Last
      • Application categories
          1. Rigidly structured data
          2. Rigidly structured data with some text fields
          3. Semi-structured data (need to handle semantic heterogeneity)
          4. Text
      • Very few examples of the 3rd category
      • The 3rd category can be converted to 1 and 2.

  30. XML Data Model
      • XML records
          • Can be hierarchical as in IMS
          • Have “links” as in CODASYL
          • Have set-based attributes as in SDM
          • Inherit from other records as in SDM
          • And others that are known to be hard to implement
      • Possible scenarios:
          • XMLSchema will fail
          • A data-oriented subset of XMLSchema will be proposed
          • Repeat the “great debate”
      • Lessons:
          • L16: Schema-last is probably a niche market
          • L17: XQuery is pretty much OR SQL with a different syntax
          • L18: XML will not solve the semantic heterogeneity problem either inside or outside the enterprise
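Three of the features listed above (hierarchy, links, and set-valued content) can be seen in a small document. The sketch below uses Python's built-in `xml.etree.ElementTree`; the `catalog` document, its element and attribute names, and the variable names are invented for illustration. The path expression uses the limited XPath subset that ElementTree supports, not XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: parts nest their own fields (hierarchy), point at a
# supplier by id (link), and may repeat <color> any number of times (set).
doc = ET.fromstring("""
<catalog>
  <supplier id="s1"><name>Acme</name></supplier>
  <part id="p1" supplier="s1">
    <name>Widget</name>
    <color>red</color>
    <color>blue</color>
  </part>
  <part id="p2" supplier="s1">
    <name>Gadget</name>
    <note>discontinued</note>
  </part>
</catalog>
""")

# Resolve the "link" by hand: map supplier ids to supplier names.
suppliers = {s.get("id"): s.findtext("name") for s in doc.findall("supplier")}

parts = [
    {
        "name": p.findtext("name"),
        "colors": [c.text for c in p.findall("color")],  # zero or more values
        "supplier": suppliers[p.get("supplier")],
    }
    for p in doc.findall("part")
]

# Select parts that have a <color> child whose text is exactly 'red'.
red_parts = [p.findtext("name") for p in doc.findall("part[color='red']")]
```

The two `part` records do not share the same fields, which is the “schema last” flexibility. The cost is that the application has to cope with a missing or unexpected element on every access.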

  31. What You Should Know • New developments in databases are mostly driven by new applications • The impact of a technology highly depends on the market (the right time, right environment, …) • Cycles of data models (complex->simple->complex…)
