A Sketch of Regres • Mike Carey, Joey Hellerstein, Michael Stonebraker
Outline • Why we need to rethink everything! • All current DBMSs architected in the late 1970s • why the world is different now • A sketch of Regres • a new data base architecture for the millennium
The World is Different • CPU, memory, disk up by 10 ** 6 in the last 20 years • Design point of 1 Tbyte buffer pool in 2005, up from 1 Mbyte in the 1970s • It will NOT be 250 million 4K pages! Need to rethink storage architectures!
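The page count on this slide can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 Tbyte = 2**40 bytes, a 4K page = 4096 bytes), which the slide does not specify; under that assumption the pool holds about 268 million pages, in line with the slide's rounded 250 million.

```python
# Back-of-the-envelope check of the buffer pool sizes on the slide.
# Assumption (not stated on the slide): binary units throughout.
TBYTE = 2**40      # 1 Tbyte buffer pool (2005 design point)
MBYTE = 2**20      # 1 Mbyte buffer pool (1970s design point)
PAGE = 4 * 2**10   # 4K page

pages_2005 = TBYTE // PAGE    # pages needed to cover the 1 Tbyte pool
pages_1970s = MBYTE // PAGE   # pages needed to cover the 1 Mbyte pool
growth = TBYTE // MBYTE       # growth factor of the pool itself

print(pages_2005)   # 268435456
print(pages_1970s)  # 256
print(growth)       # 1048576, roughly the 10 ** 6 quoted above
```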
The World is Different • Most serious applications use a TP monitor • i.e., a three-tier application architecture • data at the bottom in a DBMS • code in middle tier in TP monitor • user interface on the client
Why? • 2-tier doesn’t scale and is too hard to manage • DBMS couldn’t execute code • Probably undesirable to decompose function this way! • Want code “near” the data it accesses • Need to rethink application architecture!
The World is Different • 7 X 24 is a serious requirement almost everywhere! • End-to-end issue • RAID not the (complete) answer • Require wide area network replication • Need to design in, not bolt on, this capability!
The World is Different • The web changes everything • not a client-server protocol! • stateless • requirement to deal with HTML, XML, ... • Need to have web-centric architecture!
The World is Different • ERP and web applications require scalability unheard of previously • 100,000 ERP seats not uncommon • E-commerce on the web will entail huge transaction rates • Need to think at these levels!
The World is Different • Warehousing is a new application area • typical app is data mining • queries run forever • Need to design in, not bolt on, sampling!
The World is Different • Multiprocessor architectures common • Clusters are here • NUMA is here • MPP is here • Need to design in, not bolt on, load balancing!
The World is Different • The gizmo revolution is coming • mobile clients • disconnected operation • Need to design in, not bolt on, disconnected operation!
The World is Different • The gizmo revolution is coming • small footprint servers (Coke machine as a data base) • Need to scale down as well as up in one system!
The World is Different • SQL-3 is here • components (blades, extenders, OLE, CORBA) in the data base • multiple language support required • inheritance required • Need to design in, not bolt on, method support in a variety of component models!
And… • DBMSs are currently “bloated” • stored procedures • object-relational features • warehouse features • triggers • standard benchmark hacks • Users have a low tolerance for errors • Debugging the next release is getting hard!
Conclusion • Need to rethink DBMS architecture from the ground up • This comment also applies to operating systems • and probably to networks
The Result -- Regres • A mix of some discarded ideas (whose time has re-come) • And some new ideas
Assumptions -- Must Design for a Data and Machine Federation • 7 X 24 operation requires wide area replication • understood by the DBMS • transactionally consistent • fastest mechanism is to move the log • Argues for a Federated DBMS!
Assumptions -- Must Design for a Data and Machine Federation • Integrating code and data on multiple machines is a better idea than TP monitors • data and code on each machine in a network! Requires a Federated DBMS!
Assumptions -- Must Design for a Data and Machine Federation • Incredible scalability requires more than the biggest single system • Federated DBMS a good model!
Advantages of a Federated DBMS • Mimics the enterprise, which is distributed • Naturally supports mergers • Allows “jelly bean” hardware components • Can be incrementally built and extended
Assumptions -- Semantic Heterogeneity a Must! • No systems to be federated have a common schema • salary in US is gross dollars • salary in France is net francs with a lunch allowance • Must deal with this!
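To make the salary example concrete, here is a minimal sketch of a per-site conversion into one shared representation. Everything numeric is made up for illustration: the exchange rate, the net-to-gross factor, and the lunch allowance are hypothetical constants, and the sketch assumes the allowance is included in the French figure. The names `Salary` and `fr_to_canonical` are likewise illustrative, not part of any system named in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Salary:
    amount: float
    currency: str  # "USD" or "FRF"
    basis: str     # "gross" or "net"


# Hypothetical constants, chosen only to keep the arithmetic readable.
FRF_PER_USD = 6.0
FR_NET_TO_GROSS = 1.25
FR_LUNCH_ALLOWANCE_FRF = 1200.0


def us_to_canonical(s: Salary) -> Salary:
    """US salaries are already gross dollars, the assumed canonical form."""
    return s


def fr_to_canonical(s: Salary) -> Salary:
    """Convert a French salary (net francs incl. lunch allowance) to gross dollars."""
    net_frf = s.amount - FR_LUNCH_ALLOWANCE_FRF  # strip the allowance
    gross_frf = net_frf * FR_NET_TO_GROSS        # gross up
    return Salary(gross_frf / FRF_PER_USD, "USD", "gross")


french = Salary(13200.0, "FRF", "net")
print(fr_to_canonical(french))
```

A real federation would need one such mapping per site and per attribute, which is part of why the problem is hard.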
Assumptions -- Local Autonomy a Must! • Few systems to be federated are in the same “administrative moat”! • Must allow local DBAs to control their own destiny!
Traditional Distributed DBMSs (and all commercial systems) • Do neither of these • Are a non-starter for a future architecture • Cannot have a traditional query optimizer!
Mariposa (and Cohera) made a good start • Economic paradigm for federated query processing • each query has a budget • each site is an independent contractor • federator acts like a general contractor, trying to solve the query under the budget • Agoric systems are starting to get traction!
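The budget-and-contractor idea can be illustrated with a toy award step. This is a simplified sketch, not Mariposa's or Cohera's actual bidding protocol: it assumes each site submits a single price per subquery, and the names `Bid` and `award_contracts` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Bid:
    site: str       # the "independent contractor"
    subquery: str   # the unit of work being bid on
    price: float


def award_contracts(
    subqueries: List[str], bids: List[Bid], budget: float
) -> Optional[Dict[str, Bid]]:
    """Pick the cheapest bid per subquery; return None if over budget."""
    plan: Dict[str, Bid] = {}
    for sq in subqueries:
        offers = [b for b in bids if b.subquery == sq]
        if not offers:
            return None  # nobody will do this piece of work
        plan[sq] = min(offers, key=lambda b: b.price)
    total = sum(b.price for b in plan.values())
    return plan if total <= budget else None


bids = [
    Bid("berkeley", "scan_emp", 5.0),
    Bid("paris", "scan_emp", 3.0),
    Bid("paris", "scan_dept", 4.0),
]
plan = award_contracts(["scan_emp", "scan_dept"], bids, budget=10.0)
print(plan)
```

Picking the cheapest bid per subquery ignores response time, which a real federator would also have to trade off against price.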
Mariposa (and Cohera) made a good start • Flexible heterogeneous replication • master-slave or peer-to-peer • bounded out-of-date-ness • Mobile (and disconnected) sites ok • out-of-date replica
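One way to read "bounded out-of-date-ness" is as a staleness limit checked at read time. The sketch below assumes a replica records when it was last refreshed and that the reader supplies the bound; the names `Replica` and `choose_source` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    last_sync: float  # time of the last refresh from the master, in seconds


def choose_source(replica: Replica, now: float, max_staleness: float) -> str:
    """Read from the replica only if it is within the staleness bound."""
    if now - replica.last_sync <= max_staleness:
        return replica.name
    return "master"


laptop = Replica("laptop", last_sync=100.0)
print(choose_source(laptop, now=130.0, max_staleness=60.0))  # laptop
print(choose_source(laptop, now=500.0, max_staleness=60.0))  # master
```

A disconnected site cannot fall back to the master, so in that case the same check would instead flag the answer as out of date.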
Mariposa (and Cohera) Data Model • A collection of fragments of a SQL-3 table • range partitioning • type conversion of data types when federated • Each “owned” by a local DBA
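A range-partitioned table with locally owned fragments can be sketched as a sorted list of key ranges. The code below is an illustration under assumptions: integer keys, half-open ranges, and the hypothetical names `Fragment` and `find_fragment`.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fragment:
    low: int    # inclusive lower bound of the key range
    high: int   # exclusive upper bound of the key range
    owner: str  # site whose local DBA controls this fragment


def find_fragment(fragments: List[Fragment], key: int) -> Optional[Fragment]:
    """Locate the fragment holding `key`; fragments must be sorted by `low`."""
    lows = [f.low for f in fragments]
    i = bisect.bisect_right(lows, key) - 1
    if i >= 0 and fragments[i].low <= key < fragments[i].high:
        return fragments[i]
    return None  # key falls outside every fragment


table = [Fragment(0, 100, "paris"), Fragment(100, 200, "berkeley")]
print(find_fragment(table, 150))
```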
But there is much room for improvement! • Query decomposition into economic units of work • bottom-up • top-down • heuristic decomposition
But there is much room for improvement! • Change the economic plan midflight if circumstances change • how to tell things have changed • what to do
But there is much room for improvement! • Partial answers are often a good idea • how to integrate Control ideas into an agoric system • can it be done without knowing how much of the answer the user will want?
But there is much room for improvement! • Future data will be imprecise • imagine federating the Michelin and Fodor's restaurant guides • Query processing must become evidence accumulation • built in, not bolted on • model of “likely sites” required
Local DBMS -- Storage Model • Store segments • i.e., the unit of federation • Also the unit of movement between disk and cache (segmented storage) • Need “split” and “coalesce” to keep variable length segments reasonably sized • Shades of the Burroughs B5000!
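The "split" and "coalesce" operations can be sketched on segments modeled as plain lists of records. This is a toy under stated assumptions: segment size is measured in record count rather than bytes, and the thresholds `min_size` and `max_size` are hypothetical parameters; when to apply them is exactly the open issue.

```python
from typing import List

Segment = List[int]


def split(segment: Segment, max_size: int) -> List[Segment]:
    """Halve a segment recursively until every piece fits in max_size."""
    if len(segment) <= max_size:
        return [segment]
    mid = len(segment) // 2
    return split(segment[:mid], max_size) + split(segment[mid:], max_size)


def coalesce(segments: List[Segment], min_size: int, max_size: int) -> List[Segment]:
    """Merge an undersized segment with its neighbor when the result fits."""
    out: List[Segment] = []
    for seg in segments:
        small = out and (len(out[-1]) < min_size or len(seg) < min_size)
        if small and len(out[-1]) + len(seg) <= max_size:
            out[-1] = out[-1] + seg
        else:
            out.append(list(seg))
    return out


print(split(list(range(10)), max_size=4))
print(coalesce([[1], [2], [3, 4, 5, 6]], min_size=2, max_size=4))
```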
Storage Model -- Open Issues • When to coalesce and split segments • LRU a bad model for eviction
Local DBMS -- System Services • DBMS provides buffer pool, file system • Can provide file system abstraction easily • Thread management from compiler • Reliable message delivery from network • DBMS is only application running on the machine • no need for a scheduler • Very thin OS will do…
Local DBMS -- No Knobs • Current DBMSs are WAY too hard to use • Not enough talented DBAs to go around • Tuning typically done by vendor’s SE • Want to have NO tuning knobs! • Only control: go/stop • Not clear how to do this!
Protocol • Federation components must communicate with an asynchronous (stateless) protocol • Design challenge for a world where sessions are the norm
Local DBMS -- Attacking Bloat • Basic Problem -- two data representations • the log • the data in the data base • Consistency of these representations on crashes drives a lot of complexity
Idea Number One • One representation -- no log • “No overwrite” versioning storage system (like POSTGRES) for undo • Wide area replication for recovery
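The way a "no overwrite" store gets undo without a log can be shown in a toy form. The sketch below is loosely inspired by the versioning idea and is not the POSTGRES storage system itself: it assumes versions are only ever appended, and that a version becomes visible once its transaction id is recorded as committed. The class and method names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Set, Tuple


class NoOverwriteStore:
    """Toy versioned store: old values are never overwritten."""

    def __init__(self) -> None:
        # key -> append-only list of (transaction id, value)
        self.versions: Dict[str, List[Tuple[int, Any]]] = {}
        self.committed: Set[int] = set()

    def write(self, txid: int, key: str, value: Any) -> None:
        self.versions.setdefault(key, []).append((txid, value))

    def commit(self, txid: int) -> None:
        self.committed.add(txid)

    def read(self, key: str) -> Optional[Any]:
        """Return the newest version written by a committed transaction."""
        for txid, value in reversed(self.versions.get(key, [])):
            if txid in self.committed:
                return value
        return None


store = NoOverwriteStore()
store.write(1, "a", "old")
store.commit(1)
store.write(2, "a", "new")  # transaction 2 never commits
print(store.read("a"))      # old
```

Aborting transaction 2 requires no undo work: its version is simply never visible. Recovery from media failure is a separate concern, which this idea leaves to wide area replication.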
Issue • POSTGRES storage system required 4 writes to commit a transaction • too slow to be interesting in OLTP • Can we design a “no overwrite” storage system with high performance?
Idea Number Two • Log is the only storage system • When data is brought into main memory, it is “swizzled” into a high performance format • and “unswizzled” on cache eviction
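A toy version of the swizzle and unswizzle cycle is sketched below. It assumes the log is an append-only list of JSON-encoded byte strings and that the "high performance format" is simply a Python dict; both are stand-ins, and the sketch ignores commit durability (an object written with `put` reaches the log only on eviction). The class name `LogStore` is hypothetical.

```python
import json
from typing import Any, Dict, List


class LogStore:
    """Toy store whose only persistent structure is an append-only log."""

    def __init__(self) -> None:
        self.log: List[bytes] = []       # serialized records, never rewritten
        self.latest: Dict[str, int] = {} # key -> log position of newest version
        self.cache: Dict[str, Any] = {}  # key -> swizzled in-memory object

    def put(self, key: str, obj: Any) -> None:
        self.cache[key] = obj  # lives in memory until evicted

    def get(self, key: str) -> Any:
        if key not in self.cache:
            raw = self.log[self.latest[key]]
            # "swizzle": decode the log record into the in-memory format
            self.cache[key] = json.loads(raw.decode("utf-8"))["value"]
        return self.cache[key]

    def evict(self, key: str) -> None:
        obj = self.cache.pop(key)
        # "unswizzle": encode the object and append it to the log
        record = json.dumps({"key": key, "value": obj}).encode("utf-8")
        self.log.append(record)
        self.latest[key] = len(self.log) - 1


store = LogStore()
store.put("emp:1", {"name": "Mike", "dept": 7})
store.evict("emp:1")
print(store.get("emp:1"))
```

Every cold read pays the decoding cost in `get`, which is the overhead the next slide asks about.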
Issue • Can cache residency be made long enough to justify the overhead? • Will “cold data” performance be unacceptably bad?
Semantic Heterogeneity • Lots of approaches • code (Mariposa, Cohera) • Rules (Mergent) • Prolog • Lots of past work • e.g. Multibase • Space well picked over!
Regres Focus • Regres must be repository-based • Regres must provide yellow pages for economic model • Regres must provide “schema discovery” tools • Focus on the repository and building semantic heterogeneity support into it
Summary • Thin local system; fat Federator • Lots of interesting design challenges • Focus of DBMS seminar this semester