Main memory DB PDT Ján GENČI
Obsah • Motivation • DRDBMS • MMDBMS • DRDBMS versus MMDBMS • Commit processing • Support in commercial systems
Motivation • For a long time, hard disks were the only technology thatcould store enough information to hold a database andoffered random access at the same time. • Therefore, conventionaldatabase management systems were tuned totake this technology to the maximum. • But in recent years,main memories have become cheaper and grown to a pointthat for some fields of application it allows one to keep thewhole information of a database in main memory andtherefore speed up operation.
Disk based DB • Disk storage is persistent • Disk accesses exhibit a high fixed cost per access due toseek time. • To achieve good performance, accessesshould be: • sequential • transfer large amounts ofneeded data
Main memory based DB • The access time of main memory is of orders ofmagnitude smaller than for disk storage, • The access time of main memory is less dependent on thelocation(fragmentation)
DRDBMS vers. MMDBMS (1) • Both „approaches“ process data in mainmemory, and both keep a (backup) copy on disk • The key difference is thatin an MMDBMS the primary copy of the database livespermanently in main memory and this has important implications for the algorithmsand data structures used.
DRDBMS vers. MMDBMS (2) • Even if all DB of DRDBMS is cached in main memory, it will notprovide best performance since a DRDBMS is not tunedfor this case (buffer manager). • Index structures in DRDBMSs aredesigned for disk access and may trade computing powerand storage efficiency for a lower number of disk accessessince disk access is the processing-time dominating factorin DRDBMSs. • This overhead incurs even if all data iscached in main memory.
DRDBMS vers. MMDBMS (3) • In MMDBMSs data is guaranteed to stay present in mainmemory • Index structures and all the other parts of thedatabase do not need to consider disk access • Algorithms can betuned for low computational cost.
DRDBMS • The most important in optimizations: • to reduce the number of disk accesses, • to prefer sequential access, • to keep the processor busy while waiting for I/O. • To reduce the number of disk accesses: • caching is used, • special index structures (B+ tree) were developed. • data is grouped on disk to be able to access it using one single sequential read. • A high degree of concurrency is employed to keep the processor busy while other transactions are waiting for I/O, and therefore a small locking granularity down to record level locking is used. • Each access has to go to the buffer manager to bring data into main memory or to make sure that data is already in main memory
MMDBMS • Pointers can be used to address data (using pointers reduce the size of the index and complexity of dealing with long or variable length fields). • Since a new access time gap between on-processor cache and main memory data layout becomes more critical again (column data store) • Index structures can be tuned for a low consumption of main memory and processing power (no disk accesses; T-trees) • Performance is only determined by the efficiency of the algorithms used (SSEx, GPGPU).
Commit Processing – recovery • A commit must be able to guarantee persistence • Since data in main memory is volatile, some kind of mechanism must exist that protects data in case of a failure: • logging to disk (bottleneck) • Solid-state discs • Backed-up DRAMS • Logging over a network
Commit Processing – backup • Main memory is volatile and is lost in case of a power failure - backup copy of the database must exist. • Obvious solution is to backup data to disk(s). • Backup must be kept up-to-date - commonly used mechanism is checkpointing • Some problems not found in DRDBMSs: • How and where should data be stored? • How should pointers be treated?
ORACLE - TimesTen • http://www.oracle.com/database/timesten.html
DB2 – SolidDB • http://www.informationweek.com/news/software/database/showArticle.jhtml?articleID=208402402 • SolidDB is an in-memory relational database that can serve as a high-performance front end to either DB2 or Informix Dynamic Server