HDF Update • Mike Folk, National Center for Supercomputing Applications • HDF and HDF-EOS Workshop IX, December 1, 2005
Outline • Organizational info • HDF Software Update • Other Activities of Interest
The HDF Team • Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Fang Guo, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird, Raymond Lu, John Mainzer, Pedro Nunes, Elena Pourmal, Binh-minh Ribler, Eric Shapiro, Rishi Sinha, Arash Termehchy, Kent Yang • And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.
The HDF Group is Moving
THG • Why spin off from U of Illinois? • Creating a sustainable organization • We do more than R&D • THG already exists
How will THG be different from the NCSA HDF Group? • Business model • Location • Staff • THG – NCSA – UIUC relations • Effect on NASA and other affiliations • Intellectual property
Major software milestones since Oct. 2004 (timeline, Nov 2004 to Nov 2005) • HDF Java 2.1 • HDF Web browser plug-in • HDF 4.2r1 • HDF5 1.6.4 • HDF4-to-HDF5 conversion tools 1.2 • HDF Java 2.2 • HDF5 1.6.5
HDF 4.2r1 – February 2005 • Szip compression fixed • Windows • hdiff and hrepack added • Config, build, testing procedures improved • h4fc utility fixed
HDF 4.2r1 – new compilers and platforms • Mac OS X Fortran: IBM xlf v. 8.1, Absoft f95 v. 8.2 • AMD Opteron • Cray TS IEEE • Linux 2.4: Absoft Fortran f95 v. 9.0, PGI C and Fortran, Intel C and Fortran
HDF5-1.6.4 – March 2005 • High-Level (HL) library • Some new C APIs added • Fortran APIs added • HL library now built and installed by default • Library built and tested with SZIP 2.0. • Many changes to improve library performance • Especially for variable length types and metadata cache • H5jam – a new utility • Allows a text file to be added to the "user block" at the beginning of an HDF5 file
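The user-block idea behind H5jam can be sketched in a few lines of Python. This is only an illustration of the file layout, not the H5jam tool: it assumes the HDF5 rule that a user block is a power of two of at least 512 bytes, and that the HDF5 signature is then searched for at offsets 0, 512, 1024, and so on. The function names `user_block_size`, `jam`, and `unjam` are hypothetical, and the "file" here is just a byte string.

```python
# Conceptual sketch of an HDF5 user block; not the h5jam implementation.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature that opens the HDF5 superblock


def user_block_size(n_bytes):
    """Smallest user block size (power of two, at least 512) that holds n_bytes."""
    size = 512
    while size < n_bytes:
        size *= 2
    return size


def jam(text, hdf5_bytes):
    """Return text zero-padded to a valid user block size, followed by the HDF5 bytes."""
    return text.ljust(user_block_size(len(text)), b"\0") + hdf5_bytes


def unjam(jammed):
    """Split a jammed byte string into (user block text, HDF5 bytes)."""
    offset = 0
    while offset < len(jammed):
        if jammed[offset:offset + 8] == HDF5_SIGNATURE:
            return jammed[:offset].rstrip(b"\0"), jammed[offset:]
        # Candidate superblock offsets are 0, 512, 1024, 2048, ...
        offset = 512 if offset == 0 else offset * 2
    raise ValueError("no HDF5 signature found")
```

For example, `jam(b"notes", data)` places `data` at offset 512, and `unjam` recovers both parts by locating the signature.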
Platforms to be dropped in future releases • Operating systems: Solaris 2.8, HPUX B.11.00, Crays T3E and T90, Linux RH 7.* and 8.*, Windows 2000 • Compilers: we use the latest versions of vendors' compilers as they become available and drop the previous ones
Platforms to be added • Systems: Solaris 2.10, Cray X1, Cray XT3, NEC SX6, HP 64-bit (HPUX 11.23), Mac OS 10.4 • Compilers: gcc 4.* • HDF5 Fortran: Lahey, NAG, G95 • MPI-2
Coming next: Major release HDF5 1.8 • Windows MPICH support: prototype • Integer to float conversions • Will support integer to float conversions during I/O • http://hdf.ncsa.uiuc.edu/RFC/dtype_conv_overflow/Overflow.html • New error-handling API • Dimension scales • Similar to dimension scales in HDF4 • http://hdf.ncsa.uiuc.edu/RFC/H5DimScales/H5dimscale_Specification_1_0-5.pdf
N-bit compression filter • Compact storage for user-defined datatypes. • http://hdf.ncsa.uiuc.edu/RFC/NBitPacking/NBitPacking.html • Offset+size storage filter • Performs a scale and/or offset operation on each data value, truncating the resulting value to a lesser number of bits before storing it. • http://hdf.ncsa.uiuc.edu/RFC/ScaleOffsetCompress/ScaleOffsetCompress.html
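The idea shared by the two filters above, storing each value in fewer bits than its native type, can be illustrated with a small Python sketch. It is not the HDF5 filter code: it handles only non-empty lists of integers, uses the minimum value as the offset, applies no scale factor, and is lossless. The names `pack_offset` and `unpack_offset` are hypothetical.

```python
# Conceptual sketch of offset-plus-reduced-bit-width storage for integers.
def pack_offset(values):
    """Subtract the minimum, then pack each residual into the fewest bits that fit."""
    offset = min(values)
    nbits = max(1, (max(values) - offset).bit_length())
    packed = 0
    for v in values:
        packed = (packed << nbits) | (v - offset)
    n_bytes = (nbits * len(values) + 7) // 8
    return offset, nbits, len(values), packed.to_bytes(n_bytes, "big")


def unpack_offset(offset, nbits, count, payload):
    """Reverse pack_offset: extract each nbits-wide field and add the offset back."""
    packed = int.from_bytes(payload, "big")
    mask = (1 << nbits) - 1
    out = []
    for i in range(count):
        shift = nbits * (count - 1 - i)
        out.append(((packed >> shift) & mask) + offset)
    return out
```

With the values 1000, 1003, 1001, 1007 the residuals fit in 3 bits each, so the four values occupy 2 bytes instead of four native integers.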
Group revisions • Option to access objects according to creation order • Improved performance for groups containing a large number of objects. • http://hdf.ncsa.uiuc.edu/RFC/ReviseGroups/ • Improved metadata cache • New metadata cache improves performance and memory usage in the library. • Apps that access files with a large number of objects should see significant performance improvement and should use less memory.
Data transformation filter • Performs data transformation during I/O operations. • Transform expressed by algebraic formula (e.g. a*x + b) • http://hdf.ncsa.uiuc.edu/HDF5/doc_dev_snapshot/H5_dev/html/RM_H5P.html#Property-SetDataTransform • Ph5diff – parallel h5diff • Compares two files in an MPI parallel environment. • Compares multiple datasets simultaneously. • http://hdf.ncsa.uiuc.edu/RFC/PH5DIFF/
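A rough Python sketch of the data transformation idea follows: an algebraic formula in `x` is given as text and applied to every element as the data passes through an I/O call. It does not use or reproduce the HDF5 API; `compile_transform` and `write_with_transform` are hypothetical names, and only numbers, the variable `x`, and the operators `+ - * /` are accepted.

```python
# Conceptual sketch: apply a formula such as "2*x + 1" to each value during "I/O".
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def compile_transform(expr):
    """Turn a formula in x into a function, rejecting anything but simple arithmetic."""
    tree = ast.parse(expr, mode="eval")

    def evaluate(node, x):
        if isinstance(node, ast.Expression):
            return evaluate(node.body, x)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left, x), evaluate(node.right, x))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand, x)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "x":
            return x
        raise ValueError("unsupported element in transform: " + ast.dump(node))

    return lambda x: evaluate(tree, x)


def write_with_transform(buffer, expr):
    """Stand-in for a write call: return the values that would be stored."""
    transform = compile_transform(expr)
    return [transform(x) for x in buffer]
```

Parsing the formula instead of calling `eval` keeps the sketch from executing arbitrary code; a symbolic form such as `a*x + b` must have its coefficients filled in with numbers first.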
HDFpacket API • Data collected in “packets” • “Horizontal” view, per time step • Efficient access to fixed- and variable-length records • http://hdf.ncsa.uiuc.edu/RFC/HDF5Packet/Tech_reprt_HDF5Packet.pdf • Possible: HDFtime_history API • Archival, viewing, analysis • “Vertical” view, per parameter
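The difference between the two views can be sketched in plain Python. This is an illustration of the data layout only, not the HDFpacket or HDFtime_history API; the packet fields (`time`, `alt`, `speed`) and the function `to_time_history` are hypothetical.

```python
# Conceptual sketch: pivot "horizontal" packets (one per time step)
# into a "vertical" time history (one series per parameter).
from collections import defaultdict


def to_time_history(packets):
    """Return {parameter: [(time, value), ...]} built from per-time-step packets."""
    history = defaultdict(list)
    for packet in packets:
        t = packet["time"]
        for name, value in packet.items():
            if name != "time":
                history[name].append((t, value))
    return dict(history)


# Variable-length records: the packet at time 1 carries no "speed" reading.
packets = [
    {"time": 0, "alt": 100, "speed": 250},
    {"time": 1, "alt": 110},
    {"time": 2, "alt": 120, "speed": 255},
]
history = to_time_history(packets)
```

Ingest naturally produces the packet form, while archival analysis and viewing usually want one parameter at a time, which is what the pivoted form provides.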
SZIP integration with HDF4 and HDF5 • Development and integration completed • Includes libraries and tools • SZIP documentation web page • http://hdf.ncsa.uiuc.edu/doc_resource/SZIP/ • Examples and performance studies for HDF5
Parallel I/O and chunking • Collective I/O – key to improving performance for parallel HDF5 • Current versions only allow collective I/O for regular selections in contiguous storage • Expanding use of collective I/O in HDF5 • For regular selections in chunked storage • For irregular selections in both chunked and contiguous storage
Tools development • HDF4 • hrepack and hdiff performance improved • H4 to H5 Conversion Tools • Updated to HDF4.2r1, HDF5-1.6.4 • H5jam • New tools to add/remove user block in front of file • H5dump • Faster for files with large numbers of objects • Can dump contents of the boot block • Can dump dataset filters, storage layout, fill value • Parallel h5diff • Enables h5diff to run in parallel
HDFView changes • Support for Storage Resource Broker (SRB) • HDF5 object level access to remote files • Display HDF5 compound datatypes with arrays • Create/display HDF5 named datatypes • Create links in HDF5 • Improve ability to manipulate palette • Select row/column for xy plot in the table view
New Functions in Java API • Request an individual object without loading entire structure of file • Send client request to SRB server and receive result from server • Create HDF5 indexing table • Query for HDF5 datasets
HDF Web-browser Plug-in • Extends browser to display HDF4/5 files • A “lite” version of HDFView • Analogous to PDF reader • Fewer browsing features • No editing features • Windows only
HDF Web-browser Plug-in • Not an applet • It is downloaded and installed once • An applet is downloaded with each invocation • http://hdf.ncsa.uiuc.edu/plugins/
HDF-EOS module for HDFView • Developed by HDF-EOS team • Optional module for HDF-EOS files • Reads, displays HDF-EOS grid, swath, etc. • (Generic modules show native HDF5 objects) • Tested with HDFView 2.3 • To do -- get permission to release with HDFView
Future work for Java • Add OPeNDAP client support to HDFView • Seamlessly retrieve data from any OPeNDAP server • Support HDF5 Dimension Scales • Recognize geospatial coordinates • Support for HDF5 Indexing • Create indexing table and query HDF5 datasets • H5Gen • Generate HDF5 file from XML file
DOE/ASC* “ASC provides the integrating simulation and modeling capabilities and technologies needed …for future design assessment and certification of nuclear weapons and their components” • Massively parallel computing and I/O • Complex data models and big data • HDF5 a standard format for ASC apps * “Advanced Simulation and Computing Program”
Boeing: HDF5 for flight test data • Commercial (Boeing 787) and military planes • 787 active archive • HDFtime_history • 10 TB per flight-test day • Also post-testing data • Must handle raw, real-time data • Variable-length datatypes/records • High speed ingest • HDFpacket API
Boeing High Level APIs • HDFpacket (see above) • HDFtime_history • Structured records for archive, analysis, viewing • “Vertical” view, per parameter
Object encryption to support access control • For Boeing • Investigated the role of encryption in developing access control • Developed a prototype, now being tested
Projection Indexes in HDF5 • Standardize indexing in HDF5 • Make indexes portable • Just a prototype • See Rishi Sinha’s talk
Product data exchange – STEP • STEP is an ISO data transfer standard. • Defines characteristics of a product throughout its life cycle. • Widely used in design and manufacturing. • Uses the EXPRESS data modeling language to describe data.
STEP Limitations • Currently text-based format • Requires all the objects to be in memory • Apps starting to produce very large data volumes • EU looking for a binary equivalent for STEP
HDF5 as binary format for STEP • EU identified HDF5 as best candidate • Prototype in the works • EXPRESS-to-HDF5 mappings • Convert sample data collections • Workshop at U of Illinois next week • National Archives also funding HDF study
DNA sequencing workflows • Diverse formats, some proprietary • Highly redundant data • Repeated file processing • Disconnected programs • In-core processing models • Lack of persistence
Multiple Levels of Information (figure labels) • SNP Score • Contig Summaries • Discrepancies • Contig Qualities • Coverage Depth • Trace • Reads • Aligned bases • Read quality • Contig • Percent match