
C-Store: An Introduction to Berkeley DB

Jianlin Feng, School of Software, Sun Yat-sen University. Mar. 13, 2009.


Presentation Transcript


  1. C-Store: An Introduction to Berkeley DB • Jianlin Feng, School of Software, Sun Yat-sen University • Mar. 13, 2009

  2. Overview of Berkeley DB • Means the Berkeley Database • An open-source, embedded, transactional data management system • A key/value store • Embedded? • Delivered as a library that is linked with an application • Hides data management from the end user • Scales from bytes to petabytes • Runs on everything from cell phones to large servers.

  3. Berkeley DB: Examples of Applications • Google Accounts • Stores all user and service account information and preferences. • Amazon’s user customization • Berkeley DB offers high reliability and high performance.

  4. Berkeley DB: A Brief History (1) • Began life in 1991 as a dynamic linear hashing implementation. • Meant to replace the historic UNIX database libraries: dbm, ndbm and hsearch • Released as a library in 4.4BSD in 1992. • db-1.85 == Hash + B-Tree • The package LIBTP • A transactional implementation of db-1.85 • A research prototype that was never released.

  5. Berkeley DB: A Brief History (2) • In 1996, Seltzer and Bostic started Sleepycat Software. • For use in the Netscape browser • Berkeley DB 2.0, released in 1997 • Transactional implementation • The first commercial release • Berkeley DB 3.0, released in 1999 • Transformed into an object-oriented, handle-and-method style API.

  6. Berkeley DB: A Brief History (3) • Berkeley DB 4.0, released in 2001 • Single-master, multiple-reader replication • High availability • Replicas can take over for a failed master • High scalability • Read-only replicas can reduce master load • Similar ideas are adopted in C-Store. • In Feb. 2006, Oracle acquired Sleepycat.

  7. Sleepycat Public License: a Dual License • The code • Is open source • And may be downloaded and used freely • However, redistribution requires • Either that the package using Berkeley DB be released as open source • Or that the distributors obtain a commercial license from Sleepycat (now Oracle, which acquired Sleepycat in Feb. 2006).

  8. Berkeley DB: Product Family Today • The original Berkeley DB library • Berkeley DB XML • Atop the library • Berkeley DB Java Edition • 100% pure Java implementation

  9. Berkeley DB: Product Family Architecture (architecture diagram; figure not reproduced in the transcript)

  10. Berkeley DB: The Design Philosophy • Provide mechanisms without specifying policies • For example, Berkeley DB is abstracted as a store of <key, value> pairs. • Both keys and values are opaque byte-strings. • i.e., Berkeley DB has no schema, • And the application that embeds Berkeley DB is responsible for imposing its own schema on the data.
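
To make "the application imposes its own schema" concrete, here is a minimal Python sketch. It does not use the Berkeley DB API: a plain `dict` from bytes to bytes stands in for the store, and the record layout `USER_RECORD` and the helpers `encode_user` / `decode_user` are hypothetical names chosen for illustration.

```python
import struct

# Hypothetical application schema: user_id (uint32), age (uint16), name (16 bytes).
# The store never sees this layout; it only holds opaque byte strings.
USER_RECORD = struct.Struct("<IH16s")


def encode_user(user_id: int, age: int, name: str) -> bytes:
    """Serialize one user into an opaque value."""
    return USER_RECORD.pack(user_id, age, name.encode("utf-8"))


def decode_user(value: bytes) -> tuple:
    """Interpret an opaque value using the application's own schema."""
    user_id, age, raw_name = USER_RECORD.unpack(value)
    return user_id, age, raw_name.rstrip(b"\x00").decode("utf-8")


# A plain dict stands in for the key/value store (an assumption, not Berkeley DB).
store = {}
store[b"user:42"] = encode_user(42, 30, "alice")

record = decode_user(store[b"user:42"])
print(record)
```

Only the application knows that the 22 stored bytes contain three fields; a different application could store a completely different layout under the same key space.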

  11. Advantages of <key, value> pairs • An application is free to store data in whatever form is most natural to it. • Objects (like structures in the C language) • Rows in Oracle, SQL Server • Columns in C-Store • Different data formats can be stored in the same database. • As long as the application understands how to interpret the data items.
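
As a rough illustration of mixing data formats in one database, the sketch below stores a row-style item and a column-style item side by side. Again a `dict` stands in for the store; the key prefixes `row:` and `col:` and the helper names are assumptions made for this example only.

```python
import json
import struct

store = {}  # stand-in for a single key/value database

# Row-style item: one whole record per key, encoded as JSON.
store[b"row:1"] = json.dumps({"name": "alice", "age": 30}).encode("utf-8")

# Column-style item: every value of one attribute under a single key,
# packed as little-endian 16-bit unsigned integers.
ages = [30, 25, 41]
store[b"col:age"] = struct.pack(f"<{len(ages)}H", *ages)


def read_row(key: bytes) -> dict:
    """Decode a value that the application knows to be a JSON row."""
    return json.loads(store[key].decode("utf-8"))


def read_column(key: bytes) -> list:
    """Decode a value that the application knows to be a packed column."""
    raw = store[key]
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))
```

The store treats both values identically; the key naming convention is what tells the application which decoder to apply.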

  12. Indexing Key Values • Indexing methods • B-Tree • Hash • Queue • Recno: a record-number-based index implemented atop B-Tree • Data manipulation • Put: store key/value pairs • Get: retrieve key/value pairs • Delete: remove key/value pairs
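
The three data manipulation operations can be sketched with a toy in-memory store. This is not Berkeley DB's implementation or API: the class `OrderedStore` is hypothetical, and a sorted key list only imitates the key ordering that a B-Tree index provides (a hash index would give no such ordering).

```python
import bisect


class OrderedStore:
    """Toy store that keeps keys sorted, imitating a B-Tree's key order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        """Store a pair, overwriting any existing value for the key."""
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        """Return the value for the key, or None if it is absent."""
        return self._values.get(key)

    def delete(self, key: bytes) -> bool:
        """Remove a pair; return False if the key was not present."""
        if key not in self._values:
            return False
        del self._values[key]
        self._keys.pop(bisect.bisect_left(self._keys, key))
        return True

    def keys(self) -> list:
        """Return all keys in sorted order."""
        return list(self._keys)


db = OrderedStore()
db.put(b"banana", b"2")
db.put(b"apple", b"1")
db.put(b"cherry", b"3")
db.delete(b"banana")
```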

  13. How Do Applications Access Key/Value Pairs? • Through handles on databases • Similar to relational tables • Or through cursor handles • Representing a specific place within a database • Used for iteration, i.e., fetching one key/value pair at a time. • Databases are implemented atop the OS file system. • A file may contain one or more databases.
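
A cursor, as described above, is a position within a database that can be moved one pair at a time. The following sketch is a simplified stand-in, not the Berkeley DB cursor API: the class `Cursor` and its methods `first`, `next`, and `seek` are hypothetical, and the cursor iterates over a sorted snapshot of a `dict`.

```python
import bisect


class Cursor:
    """Toy cursor over a sorted snapshot of key/value pairs."""

    def __init__(self, items: dict):
        self._pairs = sorted(items.items())
        self._pos = -1  # positioned before the first pair

    def _current(self):
        if 0 <= self._pos < len(self._pairs):
            return self._pairs[self._pos]
        return None  # cursor has moved past the end

    def first(self):
        """Move to the first pair and return it."""
        self._pos = 0
        return self._current()

    def next(self):
        """Advance by one pair and return it, or None at the end."""
        self._pos += 1
        return self._current()

    def seek(self, key: bytes):
        """Move to the first pair whose key is >= the given key."""
        self._pos = bisect.bisect_left(self._pairs, (key,))
        return self._current()


db = {b"cherry": b"3", b"apple": b"1", b"banana": b"2"}
cursor = Cursor(db)

seen = []
pair = cursor.first()
while pair is not None:  # fetch one key/value pair per step
    seen.append(pair[0])
    pair = cursor.next()
```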

  14. Berkeley DB Replication: A Log-Shipping System • A Replication Group • A single Master • One or more Read-Only Replicas. • All write operations must be processed transactionally by the Master • The Master sends log records to each of the Replicas. • The Replicas apply log records only when they receive a transaction commit record.
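
The log-shipping rule above (replicas buffer log records and apply them only on commit) can be sketched in a few lines. This is a simplified model under stated assumptions, not Berkeley DB's replication code: `LogRecord`, `Replica`, and `Master` are hypothetical classes, messages are delivered by direct method calls, and failures and elections are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    txn_id: int
    kind: str          # "put" or "commit"
    key: bytes = b""
    value: bytes = b""


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # txn_id -> buffered puts

    def receive(self, rec: LogRecord) -> None:
        if rec.kind == "put":
            # Buffer the update; it is not visible to readers yet.
            self.pending.setdefault(rec.txn_id, []).append(rec)
        elif rec.kind == "commit":
            # Apply the whole transaction only when its commit record arrives.
            for put in self.pending.pop(rec.txn_id, []):
                self.data[put.key] = put.value


@dataclass
class Master:
    replicas: list
    data: dict = field(default_factory=dict)
    next_txn: int = 1

    def _ship(self, rec: LogRecord) -> None:
        for replica in self.replicas:
            replica.receive(rec)

    def write_transaction(self, updates: dict, commit: bool = True) -> int:
        """Ship one log record per update, then a commit record if requested."""
        txn_id = self.next_txn
        self.next_txn += 1
        for key, value in updates.items():
            self._ship(LogRecord(txn_id, "put", key, value))
        if commit:
            self.data.update(updates)
            self._ship(LogRecord(txn_id, "commit"))
        return txn_id


replica = Replica()
master = Master(replicas=[replica])
master.write_transaction({b"x": b"1"}, commit=False)  # never committed
master.write_transaction({b"a": b"1", b"b": b"2"})    # committed
```

After these two calls the replica holds only the committed pairs; the uncommitted update stays in its `pending` buffer and is never visible.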

  15. Berkeley DB: Configuration Flexibility • Configuration flexibility is critical • Due to a wide range of applications • Three ways • Compile Time Configuration • Feature Set Selection • Runtime Configuration

  16. Compile Time Configuration • Option 1: small footprint build • --enable-smallbuild • For use in a cell phone • The compiled library contains only the B-Tree index • Omits replication, cryptography, statistics collection, etc. • The library is about 0.5 MB. • Option 2: higher concurrency locking • --enable-fine-grained-lock-manager • For use in a data center • Lock-based concurrency control

  17. Feature Set Selection • The Data Store (DS) feature set • Most similar to the original db-1.85 library • Good for temporary data storage • The Concurrent Data Store (CDS) feature set • Acquires a single lock per API invocation • Good for read-mostly applications • The Transactional Data Store (TDS) feature set • Currently the most widely used feature set • Acquires a single lock per page • The High Availability (HA) feature set • Can continue running even after a site fails.

  18. Runtime Configuration • Index Selection and Tuning • Applications can select the page size in an index • Trading off Durability and Performance • No-force log write • Extreme case: applications can run completely in memory • Trading off Two-Phase Locking and Multiversion Concurrency Control. • Note: C-Store adopts similar ideas for high performance.
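
The durability versus performance trade-off can be illustrated with a toy log that either forces every commit to stable storage or leaves it in a memory buffer. This is a conceptual sketch only: `WriteAheadLog`, its `sync_on_commit` flag, and the simulated `crash` are hypothetical and do not correspond to Berkeley DB's actual configuration interface.

```python
class WriteAheadLog:
    """Toy log: 'durable' models disk, 'buffer' models unflushed memory."""

    def __init__(self, sync_on_commit: bool):
        self.sync_on_commit = sync_on_commit
        self.buffer = []      # committed but not yet flushed
        self.durable = []     # survives a crash
        self.flush_count = 0  # proxy for the I/O cost paid so far

    def commit(self, record: str) -> None:
        self.buffer.append(record)
        if self.sync_on_commit:
            self.flush()  # force the log write at every commit

    def flush(self) -> None:
        if self.buffer:
            self.durable.extend(self.buffer)
            self.buffer.clear()
            self.flush_count += 1

    def crash(self) -> list:
        """Simulate a crash: everything still in the buffer is lost."""
        lost = list(self.buffer)
        self.buffer.clear()
        return lost


safe = WriteAheadLog(sync_on_commit=True)
fast = WriteAheadLog(sync_on_commit=False)
for i in range(3):
    safe.commit(f"txn-{i}")
    fast.commit(f"txn-{i}")

lost_safe = safe.crash()
lost_fast = fast.crash()
```

In this model the forcing log pays one flush per commit and loses nothing, while the non-forcing log pays no flushes but loses every commit still in memory at the time of the crash.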

  19. Challenges of Berkeley DB’s Flexibility • Demands flexibility from Berkeley DB’s designers • Demands flexibility from application developers

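  20. Any Dream? Any Idea? • iGoogle China University Student Innovation Design Contest • The 4th Software Innovation Design Contest of the School of Software, Sun Yat-sen University • Some Research with Me?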

  21. References • M. Seltzer. Berkeley DB: A Retrospective. IEEE Data Engineering Bulletin, Volume 30, Number 3, pp. 21-28, September 2007. • M. A. Olson, K. Bostic, M. Seltzer. Berkeley DB. USENIX Annual Technical Conference, pp. 183-192, June 6-11, 1999, Monterey, California, USA. • Oracle Berkeley DB site. http://www.oracle.com/technology/products/berkeley-db
