110 likes | 225 Vues
The paper discusses the challenges of managing growing petabyte-scale multidimensional data, emphasizing the use of hierarchical storage systems for optimal data retrieval. It introduces the RasDaMan multidimensional array DBMS which effectively utilizes multi-tiered storage strategies, allowing for efficient data access and minimizing expensive tape operations. The implementation of the R+ tree indexing enables quick subsets retrieval while leveraging metadata storage in traditional DBMS, ultimately facilitating the management of vast quantities of data through innovative techniques such as tiling, clustering, and parallelization.
E N D
HEAVENA Hierarchical Storage and Archive Environment for Multidimensional Array-DBMS Bernd Reiner reiner@forwiss.tu-muenchen.de
set of multidimensional tiles • tile = subarray • tiles stored in relational DBMS BLOBS • multidimensional index (R+ tree) Index Access to subsets of MDDs Multidimensional query language RasQL Array DBMS • Multidimensional object (MDD)
Motivation • Increasing amount of data (up to Petabyte) • Hard disks too small/expensive to hold hundreds of Terabytes • Typically data stored as files on Hierarchical Storage Management Systems (HSM-System, e.g. Tapes) • DBMS only used for Metadata • With the multidimensional array DBMS RasDaMan only subsets must be transferred instead of whole MDDs Include archived data in DBMS data access
Client RasQL Hierarchical StorageManagement System RasDaManServer Tertiary Storage Manager File Storage Manager SQL DBMS Oracle / DB2 Offline Storage HSM HSM migrate migrate export export import import DBMS (on HDD) Cache Cache stage stage Nearline Online Offline System Architecture MultidimensionalArray DBMS
Optimization • Minimization of tape access operations • Tiling, Object-Framing, Caching • Minimization of media exchange operations • Clustering, ordered Query-Queue, “lazy eject” • Minimization of positioning time • Clustering, ordered Query-Queue • Parallelization • Inter, intra object parallelization Publications: VLDB 2002, DEXA 2002, DEXA 2003
One Tile Super-Tile algorithm export Tile 1 Tile 2 Tile 3 Tile 4 ST-4 ST-1 ST-2 ST-3 Magnetic Tape Export to Tertiary Media Preserves multidim. clusteringon Tape
compute Super-Tiles RasDaMan viewer ImportSuper-Tiles Import from Tertiary Media
Object-Framing • Reducing tape access
Data Retrieval from HSM Partitioning of data random Object: mpim4d (1,35 GByte) Positioning Time (sec.) Read data DLT4000average accesstime 68s Super-Tile-No. (48 MByte)
Data Retrieval from HSM Partitioning of data Super-Tile clustering Object: mpim4d (1,35 GByte) Positioning Time (sec.) Read data DLT4000average accesstime 68s Super-Tile-No. (48 MByte)
Clustering vs. Random order Time (sec.)