
A Sketch of Regres




  1. A Sketch of Regres Mike Carey Joey Hellerstein Michael Stonebraker

  2. Outline • Why we need to rethink everything! • All current DBMSs were architected in the late 1970s • Why the world is different now • A sketch of Regres • A new data base architecture for the millennium

  3. The World is Different • CPU, memory, disk up by 10 ** 6 in the last 20 years • Design point of 1 Tbyte buffer pool in 2005, up from 1 Mbyte in the 1970s • It will NOT be 250 million 4K pages! Need to rethink storage architectures!
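
The page count on this slide can be checked with a quick calculation. The sketch below is only a back-of-the-envelope illustration; the function and constant names are made up for this example, and it shows both the decimal and the binary reading of "1 Tbyte" and "4K".

```python
# Back-of-the-envelope check of the buffer pool figures on the slide.
# All names here are illustrative, not part of any DBMS.

PAGE_BYTES_DECIMAL = 4_000       # "4K" read as 4,000 bytes
PAGE_BYTES_BINARY = 4 * 1024     # "4K" read as 4,096 bytes


def page_count(pool_bytes: int, page_bytes: int) -> int:
    """Number of whole pages that fit in a buffer pool of pool_bytes."""
    return pool_bytes // page_bytes


# 1 Tbyte pool split into 4K pages, under both unit conventions.
decimal_pages = page_count(10**12, PAGE_BYTES_DECIMAL)
binary_pages = page_count(2**40, PAGE_BYTES_BINARY)

# Growth of the design point: 1 Mbyte in the 1970s to 1 Tbyte.
growth = 10**12 // 10**6

print(decimal_pages, binary_pages, growth)
```

Either reading gives roughly a quarter of a billion page frames to manage, which is the scale the slide argues a page-based design should not be carried to.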

  4. The World is Different • Most serious applications use a TP monitor • I.e. a three tier application architecture • data at the bottom in a DBMS • code in middle tier in TP monitor • user interface on the client

  5. Why? • 2-tier doesn’t scale and is too hard to manage • DBMS couldn’t execute code Probably undesirable to decompose function this way! Want code “near” the data it accesses Need to rethink application architecture!

  6. The World is Different • 7 X 24 is a serious requirement almost everywhere! • End-to-end issue • RAID not the (complete) answer • Require wide area network replication Need to design in, not bolt on, this capability!

  7. The World is Different • The web changes everything • not a client-server protocol! • stateless • requirement to deal with HTML, XML, ... Need to have web-centric architecture!

  8. The World is Different • ERP and web applications require scalability unheard of previously • 100,000 ERP seats not uncommon • E-commerce on the web will entail huge transaction rates Need to think at these levels!

  9. The World is Different • Warehousing is a new application area • typical app is data mining • queries run forever Need to design in, not bolt on sampling!

  10. The World is Different • Multiprocessor architectures common • Clusters are here • NUMA is here • MPP is here Need to design in, not bolt on load balance!

  11. The World is Different • The gizmo revolution is coming • mobile clients • disconnected operation Need to design in, not bolt on, disconnected operation!

  12. The World is Different • The gizmo revolution is coming • small footprint servers (coke machine as a data base) Need to scale down as well as up in one system!

  13. The World is Different • SQL-3 is here • components (blades, extenders, OLE, Corba) in the data base • multiple language support required • inheritance required Need to design in, not bolt on method support in a variety of component models!

  14. And…. • DBMSs are currently “bloated” • stored procedures • object-relational features • warehouse features • triggers • standard benchmark hacks • Users have a low tolerance for errors Debugging the next release is getting hard!

  15. Conclusion • Need to rethink DBMS architecture from the ground up • This comment also applies to operating systems • and probably to networks

  16. The Result -- Regres • A mix of some discarded ideas (whose time has re-come) • And some new ideas

  17. Assumptions -- Must Design for a Data and Machine Federation • 7 X 24 operation requires wide area replication • understood by the DBMS • transactionally consistent • fastest mechanism is to move the log Argues for Federated DBMS!

  18. Assumptions -- Must Design for a Data and Machine Federation • Integrating code and data on multiple machines is a better idea than TP monitors • data and code on each machine in a network! Requires a Federated DBMS!

  19. Assumptions -- Must Design for a Data and Machine Federation • Incredible scalability requires more than the biggest single system Federated DBMS a good model!

  20. Advantages of a Federated DBMS • Mimics the enterprise, which is distributed • Naturally supports mergers • Allows “jelly bean” hardware components • Can be incrementally built and extended

  21. Assumptions -- Semantic Heterogeneity a Must! • No systems to be federated have a common schema • salary in US is gross dollars • salary in France is net francs with a lunch allowance Must deal with this!

  22. Assumptions -- Local Autonomy a Must! • Few systems to be federated are in the same “administrative moat”! • Must allow local DBAs to control their own destiny!

  23. Traditional Distributed DBMSs (and all commercial systems) • Do neither of these • Are a non-starter for a future architecture Cannot have a traditional query optimizer!

  24. Mariposa (and Cohera) made a good start • Economic paradigm for federated query processing • each query has a budget • each site is an independent contractor • federator acts like a general contractor, trying to solve query under the budget Agoric systems are starting to get traction!
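
The contractor analogy on this slide can be illustrated with a toy award step. This is a simplified sketch, not the Mariposa or Cohera algorithm: it assumes a single fixed budget per query and picks the cheapest bid for each unit of work. `Bid`, `award`, the site names and the subquery names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Bid:
    site: str        # hypothetical site name (the "independent contractor")
    subquery: str    # unit of work being bid on
    price: float     # what the site charges for the work
    seconds: float   # promised completion time


def award(bids: List[Bid], budget: float) -> Optional[Dict[str, Bid]]:
    """Pick the cheapest bid per subquery (ties broken by promised time).

    Returns None when even the cheapest plan exceeds the query budget.
    """
    winners: Dict[str, Bid] = {}
    for bid in bids:
        best = winners.get(bid.subquery)
        if best is None or (bid.price, bid.seconds) < (best.price, best.seconds):
            winners[bid.subquery] = bid
    total = sum(b.price for b in winners.values())
    return winners if total <= budget else None


bids = [
    Bid("berkeley", "scan_emp", 5.0, 2.0),
    Bid("paris", "scan_emp", 3.0, 4.0),
    Bid("paris", "scan_dept", 2.0, 1.0),
]
plan = award(bids, budget=6.0)
```

With a budget of 6.0 the federator can cover both units of work; with a budget of 4.0 the same bids yield no affordable plan, and `award` returns `None`.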

  25. Mariposa (and Cohera) made a good start • Flexible heterogeneous replication • master-slave or peer-to-peer • bounded out-of-date-ness • Mobile (and disconnected) sites ok • out-of-date replica
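
One way to read "bounded out-of-date-ness" is as a staleness limit checked when a query is routed. The sketch below assumes that reading; `Replica`, `can_serve`, `choose_source` and the site names are hypothetical, and times are plain seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    last_sync: float   # time (seconds) of the last update applied from the master


def can_serve(replica: Replica, now: float, max_staleness: float) -> bool:
    """True if the replica is no more than max_staleness seconds out of date."""
    return now - replica.last_sync <= max_staleness


def choose_source(replicas: List[Replica], now: float,
                  max_staleness: float, master: str = "master") -> str:
    """Return the freshest acceptable replica, or fall back to the master."""
    fresh = [r for r in replicas if can_serve(r, now, max_staleness)]
    if not fresh:
        return master
    return max(fresh, key=lambda r: r.last_sync).name


replicas = [Replica("laptop", 100.0), Replica("branch", 940.0)]
source = choose_source(replicas, now=1000.0, max_staleness=120.0)
```

A disconnected site such as the hypothetical `laptop` simply falls outside the bound until it syncs again; it remains usable for queries that tolerate a looser bound.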

  26. Mariposa (and Cohera) Data Model • A collection of fragments of a SQL-3 table • range partitioning • type conversion of data types when federated • Each “owned” by a local DBA

  27. But there is much room for improvement! • Query decomposition into economic units of work • bottom-up • top-down • heuristic decomposition

  28. But there is much room for improvement! • Change the economic plan midflight if circumstances change • how to tell things have changed • what to do

  29. But there is much room for improvement! • Partial answers are often a good idea • how to integrate Control ideas into an agoric system • can it be done without knowing how much of the answer the user will want?

  30. But there is much room for improvement! • Future data will be imprecise • imagine federating the Michelin and Fodor's restaurant guides • Query processing must become evidence accumulation • built-in, not bolted on • model of “likely sites” required

  31. Local DBMS -- Storage Model • Store segments • I.e. the unit of federation • Also the unit of movement between disk and cache (segmented storage) • Need “split” and “coalesce” to keep variable length segments reasonably sized Shades of the Burroughs B5000!
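
The "split" and "coalesce" operations might look roughly like the following. This is a minimal sketch that models a segment as a Python list of records and uses simple length thresholds; the slides leave the actual policy open, so `min_len` and `max_len` are assumptions made for illustration.

```python
from typing import List


def split(segment: List, max_len: int) -> List[List]:
    """Halve an oversized segment repeatedly until every piece fits max_len."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if len(segment) <= max_len:
        return [segment]
    mid = len(segment) // 2
    return split(segment[:mid], max_len) + split(segment[mid:], max_len)


def coalesce(segments: List[List], min_len: int, max_len: int) -> List[List]:
    """Merge a segment into its left neighbour when either one is undersized
    and the combined segment still fits within max_len."""
    merged: List[List] = []
    for seg in segments:
        undersized = merged and (len(merged[-1]) < min_len or len(seg) < min_len)
        if undersized and len(merged[-1]) + len(seg) <= max_len:
            merged[-1] = merged[-1] + seg
        else:
            merged.append(list(seg))
    return merged


pieces = split(list(range(10)), max_len=4)
packed = coalesce([[1], [2], [3, 4, 5], [6]], min_len=2, max_len=4)
```

When to trigger either operation is exactly the open issue raised on the next slide; the thresholds here only stand in for that decision.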

  32. Storage Model -- Open Issues • When to coalesce and split segments • LRU a bad model for eviction

  33. Local DBMS -- System Services • DBMS provides buffer pool, file system • Can provide file system abstraction easily • Thread management from compiler • Reliable message delivery from network • DBMS is only application running on the machine • no need for a scheduler Very thin OS will do…..

  34. Local DBMS -- No Knobs • Current DBMSs are WAY too hard to use • Not enough talented DBAs to go around • Tuning typically done by vendor’s SE Want to have NO tuning knobs! Only control: go/stop Not clear how to do this!

  35. Protocol • Federation components must communicate with an asynchronous (stateless) protocol • Design challenge for a world where sessions are the norm

  36. Local DBMS -- Attacking Bloat • Basic Problem -- two data representations • the log • the data in the data base • Consistency of these representations on crashes drives a lot of complexity

  37. Idea Number One • One representation -- no log • “No overwrite” versioning storage system (like POSTGRES) for undo • Wide area replication for recovery
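
To make the "no overwrite" idea concrete, here is a toy in-memory store in which updates append versions instead of replacing data, so undo needs no log. It is a deliberately simplified sketch and not the POSTGRES storage manager; the class and method names are invented for this example.

```python
class NoOverwriteStore:
    """Toy versioned store: writes append, nothing is updated in place."""

    def __init__(self):
        self.versions = {}      # key -> list of (txn_id, value), oldest first
        self.committed = set()  # ids of committed transactions

    def write(self, txn_id, key, value):
        # Append a new version; older versions stay untouched.
        self.versions.setdefault(key, []).append((txn_id, value))

    def commit(self, txn_id):
        # Committing only records the transaction id.
        self.committed.add(txn_id)

    def read(self, key):
        # Return the newest version written by a committed transaction.
        for txn_id, value in reversed(self.versions.get(key, [])):
            if txn_id in self.committed:
                return value
        return None


store = NoOverwriteStore()
store.write(1, "x", "a")
store.commit(1)
store.write(2, "x", "b")          # transaction 2 has not committed yet
before_commit = store.read("x")   # still sees transaction 1's version
store.commit(2)
after_commit = store.read("x")
```

An aborted transaction needs no undo work in this model: its versions are never marked committed, so readers skip them.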

  38. Issue • POSTGRES storage system required 4 writes to commit a transaction • too slow to be interesting in OLTP • Can we design a “no overwrite” storage system with high performance?

  39. Idea Number Two • Log is the only storage system • When data is brought into main memory, it is “swizzled” into a high performance format • and “unswizzled” on cache eviction
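
Swizzling can be sketched as converting stored identifiers into direct in-memory references on load, and back again on eviction. The example below assumes a made-up record layout in which each record carries a `payload` and a `next` object id; `Node`, `swizzle` and `unswizzle` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(eq=False)
class Node:
    oid: int
    payload: str
    # Object id in the stored form, direct object reference once swizzled.
    next: Union[int, "Node", None] = None


def swizzle(records: Dict[int, dict]) -> Dict[int, Node]:
    """Build in-memory nodes and replace stored ids with object references."""
    nodes = {oid: Node(oid, rec["payload"], rec["next"])
             for oid, rec in records.items()}
    for node in nodes.values():
        if node.next is not None:
            node.next = nodes[node.next]   # id -> direct reference
    return nodes


def unswizzle(nodes: Dict[int, Node]) -> Dict[int, dict]:
    """Convert references back to ids so the records can be written out."""
    return {
        oid: {"payload": n.payload,
              "next": n.next.oid if n.next is not None else None}
        for oid, n in nodes.items()
    }


records = {1: {"payload": "a", "next": 2}, 2: {"payload": "b", "next": None}}
nodes = swizzle(records)
round_trip = unswizzle(nodes)
```

Both conversions touch every record, which is the overhead the following slide asks about: it pays off only if data stays cached long enough.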

  40. Issue • Can cache residency be made long enough to justify the overhead? • Will “cold data” performance be unacceptably bad?

  41. Semantic Heterogeneity • Lots of approaches • code (Mariposa, Cohera) • Rules (Mergent) • Prolog • Lots of past work • e.g. Multibase Space well picked over!

  42. Regres Focus • Regres must be repository-based • Regres must provide yellow pages for economic model • Regres must provide “schema discovery” tools Focus on the repository and building semantic heterogeneity support into it

  43. Summary • Thin local system; fat Federator • Lots of interesting design challenges • Focus of DBMS seminar this semester
