In this update on Damasc, we discuss the evolution from the earlier DICE framework to a more focused data management system. The goal of Damasc is to let applications express their internal data structures to the storage system, enabling more intelligent storage layout and richer storage functionality. Key topics include characterizing access patterns over scientific data, implementing NetCDF Operators as MapReduce programs, and ingesting NetCDF into key-value stores. We also cover tracing of scientific application data access and outline future work on declarative queries and incremental XML parsing.
Update on Damasc • Joe Buck October 19th, 2010
A year later • Last year: we outlined our vision • Next year: Carlos and Alkis covered that • Today: Where we’re at
What’s in a name? • Last year I presented on DICE (Data Intensive Computation Environment) • We’ve changed the name to Damasc, which incorporates parts of DICE but is more focused on data management
Goal of Damasc • To allow applications to express their internal data structure to the storage system • Enable more intelligent storage layout which leads to increased functionality in the storage system
Application data-element alignment in parallel FS • Created traces for common access patterns over scientific data • Mapped those traces onto a theoretical parallel file system configuration • Analyzed traces to quantify IO savings from aligning data to application data element boundaries
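The savings analysis above can be sketched as follows. This is a minimal, hypothetical model, not Damasc's actual trace analysis: the stripe size, element size, and element count are invented for illustration, and "IO savings" is reduced to counting how many parallel-FS stripe units each data-element read touches with and without alignment padding.

```python
# Hypothetical model of the trace analysis: count stripe-unit accesses for
# reads of fixed-size application data elements, unaligned vs. aligned to
# stripe boundaries. All sizes below are illustrative assumptions.

STRIPE = 64 * 1024    # parallel-FS stripe unit size in bytes (assumed)
ELEMENT = 40 * 1024   # size of one application data element (assumed)

def stripes_touched(offset, length, stripe=STRIPE):
    """Number of stripe units the byte range [offset, offset + length) spans."""
    first = offset // stripe
    last = (offset + length - 1) // stripe
    return last - first + 1

def total_stripe_ios(n_elements, aligned):
    """Sum stripe-unit accesses for n contiguous element reads.

    aligned=True pads each element so the next one starts on a stripe boundary.
    """
    total, offset = 0, 0
    for _ in range(n_elements):
        total += stripes_touched(offset, ELEMENT)
        step = ELEMENT
        if aligned:
            # Round the next element's start up to a stripe boundary.
            step = -(-ELEMENT // STRIPE) * STRIPE
        offset += step
    return total

unaligned = total_stripe_ios(1000, aligned=False)  # some reads span 2 stripes
aligned = total_stripe_ios(1000, aligned=True)     # every read stays in 1 stripe
```

With these (assumed) sizes, alignment keeps every 40 KB element inside a single 64 KB stripe unit, while the unaligned layout regularly straddles stripe boundaries and pays an extra object access for those reads.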
Application data-element alignment in parallel FS - cont. [Figures: unaligned data layout vs. data-element-aligned layout]
MapReduce over scientific data • Goal was to implement NetCDF Operators (NCO) as MapReduce programs • Base NetCDF file decomposed via C++ application. Constituent parts stored in HDFS • Currently being worked on
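The NCO-as-MapReduce idea can be sketched in miniature. This is a hedged illustration, not the project's implementation: the real pipeline decomposes NetCDF files with a C++ tool and stores the parts in HDFS, whereas here the "parts" are plain Python dicts, and the operator shown is a record average in the spirit of NCO's `ncra`.

```python
# Hypothetical sketch of an NCO-style record average expressed as map and
# reduce functions over a NetCDF file already decomposed into parts.
# Part/record layout is invented for illustration only.

def map_part(part):
    """Map: emit (variable, (partial_sum, count)) for each record in a part."""
    for var, records in part.items():
        for rec in records:
            yield var, (sum(rec), len(rec))

def reduce_var(var, partials):
    """Reduce: combine partial sums and counts into a per-variable mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return var, total / count

# Two "parts" of a decomposed file, each holding records of one variable.
parts = [
    {"temperature": [[10.0, 12.0], [14.0, 16.0]]},
    {"temperature": [[18.0, 20.0]]},
]

# Shuffle phase: group map outputs by key (variable name).
partials = {}
for part in parts:
    for var, sc in map_part(part):
        partials.setdefault(var, []).append(sc)

results = dict(reduce_var(v, ps) for v, ps in partials.items())
# results["temperature"] → 15.0, the mean over all records in all parts
```

The point of the decomposition step is exactly this shape: once constituent parts live in HDFS, each part can be mapped independently and the reduce phase only sees small partial aggregates.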
MapReduce over scientific data - continued [Figures: current NetCDF processing path vs. MapReduce-based alternatives]
Tracing of scientific application data access • Created a tracing layer for ParaView that logged data access from the application’s perspective • Noah will talk more about tracing
Scientific data in a key-value store • Project to enable NetCDF ingestion into HBase
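One way such an ingestion could key the data is sketched below. The row-key scheme is an illustrative assumption, not Damasc's actual HBase design: each record of a variable gets its own row, with a zero-padded record index in the key so that HBase-style lexicographic range scans return records in order.

```python
# Hypothetical sketch of keying NetCDF variable records for an HBase-like
# key-value store. A plain dict stands in for the store; the key scheme
# ("<variable>/<zero-padded record index>") is an invented assumption.

def make_row_key(variable, record_index, width=8):
    """Build a scan-friendly row key that sorts by variable, then record."""
    return f"{variable}/{record_index:0{width}d}"

def ingest(store, variable, records):
    """Write each record of a NetCDF variable under its own row key."""
    for i, rec in enumerate(records):
        store[make_row_key(variable, i)] = rec

store = {}
ingest(store, "pressure", [[1.0, 2.0], [3.0, 4.0]])
# sorted(store) yields keys in variable/record order, mimicking an HBase scan
```

Zero-padding matters because HBase orders rows lexicographically: without it, record 10 would sort before record 2.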
Declarative queries over NetCDF • Integration of NetCDF format into Zorba query engine • Enabling XML queries over NetCDF • Incremental parsing to avoid loading entire file • Future work: NetCDF methods in XML Query
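The incremental-parsing bullet above can be illustrated with a small sketch. This is not the Zorba integration itself: the fixed-size binary record format below is an invented stand-in for the real NetCDF layout, and a Python generator stands in for the query engine's streamed evaluation, but it shows the key property, that a predicate can be evaluated with only one record in memory at a time.

```python
# Hypothetical sketch of incremental parsing: expose a NetCDF-like file as a
# lazy stream of records so a query engine can evaluate predicates without
# loading the entire file. The record format here is an invented stand-in.

import io
import struct

RECORD = struct.Struct(">d")  # one big-endian double per record (assumed)

def records(stream):
    """Yield records one at a time; only one record is in memory at once."""
    while chunk := stream.read(RECORD.size):
        yield RECORD.unpack(chunk)[0]

def query(stream, predicate):
    """Lazily filter the record stream, mimicking a streamed query predicate."""
    return [r for r in records(stream) if predicate(r)]

data = io.BytesIO(b"".join(RECORD.pack(v) for v in [1.5, 42.0, 3.25]))
query(data, lambda r: r > 2.0)  # → [42.0, 3.25]
```

The same generator shape is what makes "avoid loading the entire file" possible: the parser pulls bytes on demand as the query consumes records, rather than materializing the document up front.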
Conclusion • Last year was about exploring the problem space • Applying lessons learned, moving forward
Questions • Thank you for your time • buck@soe.ucsc.edu • srl.ucsc.edu/projects/damasc