
Introduction to HDF5 Tools


shima

Presentation Transcript


  1. Introduction to HDF5 Tools Tutorial Part II LCI Tutorial

  2. Outline
     • Overview of HDF5 tools
     • Using tools for troubleshooting problems

  3. HDF5 command-line tools
     • Readers
        • h5dump, h5diff, h5ls
        • 1.8 tools: h5check, h5stat
     • Writers
        • h5repack, h5repart, h5import, h5jam/h5unjam
        • 1.8 tools: h5copy, h5mkgrp
     • Converters
        • h4toh5, h5toh4, gif2h5, h52gif

  4. h5dump
     • Dumps the content of an HDF5 file to standard output and optionally to the following types of files
        • ASCII text file
        • XML file
        • Binary file
     • Flags to remember
        • -H to print header information
        • -p to print objects’ properties
        • -b to export data in a binary form
        • -o to export data to a file (text by default)
        • -y to skip printing indices
        • -w to specify line width

  5. h5dump -H SDS.h5
     HDF5 "SDS.h5" {
     GROUP "/" {
        GROUP "Floats" {
           DATASET "FloatArray" {
              DATATYPE H5T_IEEE_F32LE
              DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
           }
        }
        DATASET "IntArray" {
           DATATYPE H5T_STD_I32LE
           DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
        }
     }
     }

  6. h5dump -d /Floats/FloatArray SDS.h5
     HDF5 "SDS.h5" {
     DATASET "/Floats/FloatArray" {
        DATATYPE H5T_IEEE_F32LE
        DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
        DATA {
        (0,0): 0.01, 0.02, 0.03,
        (1,0): 0.1, 0.2, 0.3,
        (2,0): 1, 2, 3,
        (3,0): 10, 20, 30
        }
     }
     }

  7. h5dump -x SDS.h5
     [Figure: XML output of h5dump for SDS.h5]

  8. h5dump binary output
     -b F, --binary=F
     The form of the binary output (F):
     • MEMORY: the memory type
        • Data in the output file will have the same datatype as in memory
     • FILE: the disk file type
        • Data in the output file will have the same datatype as the corresponding dataset in the HDF5 file
     • LE: a predefined little-endian type
        • H5T_IEEE_F64LE
     • BE: a predefined big-endian type
        • H5T_STD_I32BE

  9. h5dump -d /IntArray -o out_le.bin -b LE SDS.h5
     od --width=24 -t x4 out_le.bin
     0000000 00000000 00000001 00000002 00000003 00000004 00000005
     0000030 0000000a 0000000b 0000000c 0000000d 0000000e 0000000f
     0000060 00000014 00000015 00000016 00000017 00000018 00000019
     0000110 0000001e 0000001f 00000020 00000021 00000022 00000023
     0000140 00000028 00000029 0000002a 0000002b 0000002c 0000002d
     Dumps a 32-bit integer dataset, IntArray, from SDS.h5 to a little-endian binary file, out_le.bin
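
The byte layout in the od listing above can be reproduced and decoded with a short Python sketch. It is only an illustration: the bytes are rebuilt in memory from the values visible in the listing (an assumption standing in for the real out_le.bin), and the variable names are hypothetical.

```python
import struct

# Rebuild the 5 x 6 IntArray values visible in the od listing
# (0..5, 10..15, 20..25, ...). This stands in for the real out_le.bin.
rows, cols = 5, 6
values = [10 * r + c for r in range(rows) for c in range(cols)]

# "<" forces little-endian byte order and "i" is a 32-bit signed integer,
# which matches a 32-bit integer dataset exported with -b LE.
fmt = "<%di" % (rows * cols)
raw = struct.pack(fmt, *values)

# Decode the bytes back into rows, as a reader of the binary file would.
flat = struct.unpack(fmt, raw)
table = [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

for row in table:
    print(row)
```

To decode an actual export, `raw` would instead come from reading the file in binary mode, for example `open("out_le.bin", "rb").read()`.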

  10. h5diff
      Using h5diff, you can
      • compare two objects in the same file
      • compare two objects between two files
      • compare all objects between two files

  11. h5diff SDS.h5 SDS2.h5
      Dataset: </IntArray> and </IntArray>
      5 differences found

  12. h5diff SDS.h5 SDS2.h5 -r /IntArray
      Dataset: </IntArray> and </IntArray>
      position     IntArray     IntArray     difference
      ------------------------------------------------------------
      [ 0 0 ]      0            10           10
      [ 1 0 ]      10           100          90
      [ 2 0 ]      20           200          180
      [ 3 0 ]      30           300          270
      [ 4 0 ]      40           400          360
      5 differences found
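
The difference column in the report above is plain arithmetic, and a few lines of Python reproduce it. This is a sketch, not h5diff's implementation: it assumes the column is the absolute difference of the two values, and the lists simply copy the first column of IntArray from each file as printed on the slide.

```python
# First column of /IntArray in SDS.h5 and SDS2.h5, copied from the report.
col0_a = [0, 10, 20, 30, 40]
col0_b = [10, 100, 200, 300, 400]

# One entry per differing element: (position, value in file 1,
# value in file 2, absolute difference).
report = [
    ((row, 0), a, b, abs(a - b))
    for row, (a, b) in enumerate(zip(col0_a, col0_b))
    if a != b
]

for position, a, b, diff in report:
    print(position, a, b, diff)
print(len(report), "differences found")
```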

  13. h5repack
      • Copies an HDF5 file to a new file with/without compression/chunking
         • Remove unused space
         • Apply compression filter
         • Apply layout

  14. h5repack: Applying filters
      -f FILTER
      • GZIP, to apply GZIP compression
      • SZIP, to apply SZIP compression
      • SHUF, to apply the HDF5 shuffle filter
      • FLET, to apply the HDF5 checksum filter
      • NBIT, to apply NBIT compression
      • SOFF, to apply the HDF5 Scale/Offset filter
      • NONE, to remove all filters
      For example
      h5repack -i SDS2.h5 -o SDS2_compressed.h5 -f /IntArray:GZIP=9
      Remember that if your data is smaller than 1K, compression will not be applied; see the -m flag

  15. h5repack: Data layout
      -l LAYOUT
      • CHUNK, to apply chunking layout
      • COMPA, to apply compact layout
      • CONTI, to apply contiguous layout
      For example
      h5repack -i SDS.h5 -o SDS_chunk.h5 -l /Floats/FloatArray,/IntArray:CHUNK=2x3
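
To get a feel for what CHUNK=2x3 means for the two datasets in the example, the number of chunks needed to cover each one can be worked out by hand. The Python sketch below does that arithmetic only; `chunk_count` is a hypothetical helper, not part of any HDF5 tool, and the shapes are the ones shown earlier by h5dump -H.

```python
import math

def chunk_count(shape, chunk):
    """Number of chunks needed to cover a dataset of the given shape.

    Each dimension is rounded up, because a partially filled chunk at the
    edge of the dataset still counts as a whole chunk.
    """
    return math.prod(math.ceil(d / c) for d, c in zip(shape, chunk))

float_chunks = chunk_count((4, 3), (2, 3))  # /Floats/FloatArray
int_chunks = chunk_count((5, 6), (2, 3))    # /IntArray

print(float_chunks, int_chunks)
```

For IntArray the 5 rows do not divide evenly by 2, so the last row of chunks is only half used; chunk shapes that divide the dataset dimensions evenly avoid that.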

  16. h5repart
      Repartitions a file or family of files
      For example
      h5repart -m 200m int16kx16k.h5 part200m%d.h5
      The 977 MB file int16kx16k.h5 is split into a family of files:
      part200m0.h5     200 MB
      part200m1.h5     200 MB
      part200m2.h5     200 MB
      part200m3.h5     200 MB
      part200m4.h5     177 MB
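
The member sizes in this example follow from simple division of the source size by the member size. The Python sketch below reproduces that split; it is an illustration under stated assumptions (sizes are the rounded MB values from the slide, and `family_members` is a hypothetical helper), not a description of how h5repart works internally.

```python
def family_members(total_mb, member_mb, pattern):
    """List (file name, size in MB) pairs for a split into a family."""
    members = []
    index = 0
    remaining = total_mb
    while remaining > 0:
        size = min(member_mb, remaining)
        # The %d in the pattern is replaced by the member index, from 0.
        members.append((pattern % index, size))
        remaining -= size
        index += 1
    return members

members = family_members(977, 200, "part200m%d.h5")
for name, size in members:
    print(name, size, "MB")
```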

  17. h5import
      Imports binary/ASCII data into an HDF5 file
      h5import infile -c config_file [infile -c config_file2 ...] -outfile outfile
      Example:
      h5import float5x4x2.txt -c First_set.conf -o First_set.h5
      Configuration file:
      PATH work/First-set
      INPUT-CLASS TEXTFP
      RANK 3
      DIMENSION-SIZES 5 2 4
      OUTPUT-CLASS FP
      OUTPUT-SIZE 64
      OUTPUT-ARCHITECTURE IEEE
      OUTPUT-BYTE-ORDER LE
      CHUNKED-DIMENSION-SIZES 2 2 2
      MAXIMUM-DIMENSIONS 8 8 -1
      Resulting file:
      GROUP "/" {
         GROUP "work" {
            DATASET "First-set" {
               DATATYPE H5T_IEEE_F64LE
               DATASPACE SIMPLE { ( 5, 2, 4 ) / ( 8, 8, H5S_UNLIMITED ) }
               DATA {
               (0,0,0): 1.01, 1.02, 1.03, 1.04,
               (0,1,0): 1.11, 1.12, 1.13, 1.14,
               (1,0,0): 1.21, 1.22, 1.23, 1.24,
               (1,1,0): 1.31, 1.32, 1.33, 1.34,
               (2,0,0): 1.41, 1.42, 1.43, 1.44,
               (2,1,0): 1.51, 1.52, 1.53, 1.54,
               (3,0,0): 2.01, 2.02, 2.03, 2.04,
               (3,1,0): 2.11, 2.12, 2.13, 2.14,
               (4,0,0): 2.21, 2.22, 2.23, 2.24,
               (4,1,0): 2.31, 2.32, 2.33, 2.34
               }
            }
         }
      }}
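
The dump above suggests how the flat list of numbers in the text file maps onto the 5 x 2 x 4 dataset: the last dimension varies fastest. The Python sketch below shows that mapping under this row-major assumption; `unravel` is a hypothetical helper written for illustration and is not part of h5import.

```python
def unravel(n, dims):
    """Convert a flat, zero-based value position into dataset indices,
    assuming row-major order (last dimension varies fastest)."""
    indices = []
    for d in reversed(dims):
        indices.append(n % d)
        n //= d
    return tuple(reversed(indices))

dims = (5, 2, 4)  # DIMENSION-SIZES from the configuration file

# The 5th value in the text file (position 4) starts the second row of the
# first plane, which the dump labels (0,1,0).
print(unravel(4, dims))
print(unravel(8, dims))
print(unravel(39, dims))
```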

  18. h5jam/h5unjam
      • Adds/removes a file at the beginning of an HDF5 file
      • Example:
         • h5jam: adds text to the User Block
           h5jam -u test_ub.txt -i test_ub.h5
         • h5unjam: removes text from the User Block
           h5unjam -i test_ub.h5 -u out_ub.txt -o out_ub.h5

  19. h5ls
      • Lists selected information about file objects in the specified format
      Example:
      h5ls -r SDS2.h5
      /Floats                  Group
      /Floats/DoubleArray      Dataset {10, 5}
      /Floats/FloatArray       Dataset {4, 3}
      /Floats/subs             Group
      /IntArray                Dataset {5, 6}

  20. gif2h5 / h52gif
      • gif2h5: converts a GIF file into HDF5
        gif2h5 apollo17_earth.gif apollo17_earth.h5
      • h52gif: converts an HDF5 file into GIF
        h52gif apollo17_earth.h5 apollo17_earth2.gif -i /apollo17_earth.gif/Image0 -p "/apollo17_earth.gif/Global Palette"

  21. h5copy
      • Copies an object from one location to another location within a file or across files
      • Available in 1.8.0 and later
      [Figure: two file hierarchies; one with /Floats/FloatArray and /IntArray, the other with /FloatArray]

  22. h5copy
      usage: h5copy [OPTIONS] [OBJECTS...]
      • -i, --input          input file name
      • -o, --output         output file name
      • -s, --source         source object name
      • -d, --destination    destination object name
      • -f, --flag <value>
           shallow    Copy only immediate members for groups
           soft       Expand soft links into new objects
           ext        Expand external links into new objects
           ref        Copy objects that are pointed to by references
           noattr     Copy object without copying attributes

  23. h5copy
      Example
      h5copy -i SDS.h5 -o SDS_cp.h5 -s /Floats/FloatArray -d /FloatArray
      [Figure: SDS.h5 with /Floats/FloatArray and /IntArray; SDS_cp.h5 with /FloatArray]

  24. h5copy -f shallow
      [Figure: hierarchy diagrams for copying the group floats with -f shallow; node labels: /, floats, integers, 64-bit, f32, f1, f2, i1, i2]

  25. h5copy -f soft
      [Figure: hierarchy diagrams for copying with -f soft; node labels: /, f1, dset_SL (soft link to /f1)]

  26. h5copy -f ref
      [Figure: hierarchy diagrams for copying with -f ref; node labels: /, d1, d2, dset_ref]

  27. h5stat
      • Prints various statistics about an HDF5 file
      • Helps
         • To troubleshoot size overhead in HDF5 files
         • To choose specific objects’ properties and storage strategies
      • Available in 1.8.0 and later

  28. h5check
      • Verifies whether an HDF5 file is encoded according to the HDF5 File Format Specification
      • Does not use the HDF5 library
      • Serves as a watchdog that the HDF5 library implementation is compliant with the HDF5 File Format Specification
      • The tool is NOT a part of the HDF5 source code distribution

  29. How to use it?
      h5check [-vn] <filename>
      -vn   verboseness mode
            n=0   Terse: only prints whether the file is compliant or not
            n=1   Default: prints its progress and all errors found
            n=2   Verbose: prints everything it knows, usually for debugging

  30. Example: a compliant file
      % h5check example1.h5
      VALIDATING example1.h5
      FOUND super block signature
      VALIDATING the super block at 0...
      VALIDATING the object header at 928...
      VALIDATING the btree at 384...
      FOUND btree signature.
      VALIDATING the local heap at 96...
      FOUND local heap signature.
      …
      Result: File is in compliance.

  31. Example: a non-compliant file
      h5check invalid2.h5
      FOUND super block signature
      VALIDATING the super block at 0...
      VALIDATING the object header at 928...
      VALIDATING the btree at 384...
      FOUND btree signature.
      VALIDATING the SNOD at 1248...
      FOUND SNOD signature.
      VALIDATING the object header at 976...
      check_sym(at 1248): Errors from check_obj_header()
      decode_validate_messages(): Failure in type->decode().
      H5O_sdspace_decode(): Bad version number in simple dataspace message.
      VALIDATING the local heap at 96...
      FOUND local heap signature.
      Main(): Errors from check_obj_header().
      decode_validate_messages(): Failure in type->decode().
      H5O_attr_decode(): Can't decode attribute dataspace.
      H5O_sdspace_decode(): Bad version number in simple dataspace message.
      …
      Result: File is not in compliance.
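
When many files need checking, the final "Result:" line of the two reports above is the part a script would look at. The Python sketch below classifies a captured report; it assumes the result appears on its own line with exactly the wording shown on the slides, and `is_compliant` is a hypothetical helper, not something h5check provides.

```python
def is_compliant(report):
    """Return True/False from the 'Result:' line of an h5check report,
    or None if no such line is present."""
    for line in reversed(report.splitlines()):
        line = line.strip()
        if line.startswith("Result:"):
            return line == "Result: File is in compliance."
    return None

# Shortened copies of the two reports shown on the slides.
good_report = """VALIDATING example1.h5
FOUND super block signature
Result: File is in compliance."""

bad_report = """FOUND super block signature
H5O_sdspace_decode(): Bad version number in simple dataspace message.
Result: File is not in compliance."""

print(is_compliant(good_report), is_compliant(bad_report))
```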

  32. Using HDF5 Tools for Performance Tuning and Troubleshooting

  33. Introduction
      • HDF5 tools may be very useful for performance tuning and troubleshooting
         • Discover objects and their properties in HDF5 files: h5dump -p
         • Get file size overhead information: h5stat
         • Get locations of the objects in a file: h5ls
         • Discover differences: h5diff, h5ls
         • Location of raw data: h5ls -var

  34. h5stat
      • Prints various statistics about an HDF5 file
      • Helps
         • To troubleshoot size overhead in HDF5 files
         • To choose specific objects’ properties and storage strategies
      • To use
         • h5stat --help
         • h5stat file.h5
      • Full spec can be found at http://www.hdfgroup.uiuc.edu/RFC/HDF5/h5stat/
      • Let us know if you need some “special” type of statistics

  35. h5stat
      • Reports two types of statistics:
         • High-level information about objects (examples):
            • Number of different objects (groups, datasets, datatypes) in a file
            • Number of unique datatypes
            • Size of raw data in a file
         • Information about objects’ structural metadata
            • Sizes of structural metadata (total/free)
               • Object headers, local and global heaps
               • Sizes of B-trees
            • Object header fragmentation

  36. h5stat
      • Examples of high-level information:
      File information
         # of unique groups: 10008
         # of unique datasets: 30
         # of unique named datatypes: 0
      ……………………
         Max. # of links to object: 1
         Max. depth of hierarchy: 4
         Max. # of objects in group: 19
      ……………………
      Group bins:
         # of groups of size 0: 10000
         # of groups of size 1 - 9: 7
         # of groups of size 10 - 99: 1
      ……………………
      Max. dimension size of 1-D datasets: 1643
      ……………………
      Dataset filters information:
         Number of datasets with
         ………………
         SZIP filter: 2
         ………………
         NBIT filter: 10
         USER-DEFINED filter: 1

  37. h5stat
      • Conclusion:
         • There are a lot of empty groups in the file; it is a good candidate for the compact group feature (h5repack -l ….)
         • Some datasets use “user-defined” filters and may not be readable by the HDF5 library
         • SZIP compression is needed to read some datasets
         • Oh… my application uses buffers of size 1024 to read data…
            • No wonder it crashes on reading…
            • Do I have all filters needed to read the data?
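
The buffer remark above comes down to a length mismatch: the largest 1-D dataset has 1643 elements, while the application's buffer holds 1024. One possible remedy, which the slides do not prescribe, is to read such a dataset in buffer-sized pieces. The Python sketch below only computes the (start, count) pairs such a loop would use; `read_blocks` is a hypothetical helper and no HDF5 call is made.

```python
def read_blocks(length, buffer_size):
    """Split a 1-D extent into (start, count) pairs that each fit the buffer."""
    return [
        (start, min(buffer_size, length - start))
        for start in range(0, length, buffer_size)
    ]

# Largest 1-D dataset reported by h5stat versus the application's buffer.
blocks = read_blocks(1643, 1024)
print(blocks)
```

Each pair could then drive one partial read, so no single read asks for more elements than the buffer can hold.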

  38. h5stat
      • Examples of structural metadata information:
      Object header size: (total/unused)
         Groups: 1808/72
         Datasets: 15792/832
      ………
      Dataset storage information:
         Total raw data size: 6140688
      ………
      Dataset datatype #3:
         Count (total/named) = (2/0)
         Size (desc./elmt) = (10/65535)
      Dataset datatype #4:
         Count (total/named) = (1/0)
         Size (desc./elmt) = (10/32000)

  39. h5stat
      • Conclusions
         • File size: 6228197
         • 1.5% overhead (not bad at all!)
         • There are some elements of size 65535 and 32000
            • Oh… Is it really what I want?
            • Should I use another datatype and take advantage of compression?
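
The overhead figure can be checked from the two numbers h5stat reported: the file size and the total raw data size. The short Python calculation below is a sketch that treats everything that is not raw data as overhead (an assumption about how the slide's percentage was derived); it comes out at roughly 1.4%, in line with the slide's rounded figure.

```python
file_size = 6228197      # bytes, from the conclusions above
raw_data_size = 6140688  # bytes, "Total raw data size" from h5stat

# Everything in the file that is not raw data is counted as overhead here.
overhead_bytes = file_size - raw_data_size
overhead_percent = 100.0 * overhead_bytes / file_size

print(overhead_bytes, round(overhead_percent, 2))
```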

  40. Case study: Using HDF5 tools to debug a problem
      • My application creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. h5dump reads both files OK and there are no differences. What am I doing wrong?
      • h5diff good.h5 bad.h5
           Datatype: </Definitions/timespec> and </Definitions/timespec>
           1 differences found
      • h5ls -var good.h5
           /Definitions/timespec Type
           Location: 0:1:0:900
      • h5debug good.h5 900
           Message Information:
           Type class: compound
           Size: 8 bytes
      • h5debug bad.h5 900
           Message Information:
           Type class: compound
           Size: 16 bytes

  41. Case study: Using HDF5 tools to debug a problem
      • Conclusions
         • Compound datatype “timespec” requires a different number of bytes on VS2005 (16 bytes; 2 x 8 bytes) and on VS2003 (8 bytes; 2 x 4 bytes)
         • Oh… How do I read my data back?
         • I assumed that my struct would need only 8 bytes for each element, but it needs 16 bytes on VS2005. I need the H5Tget_native_type function to find the type of my data in memory
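
The size mismatch in this case study can be illustrated without HDF5 at all. The Python sketch below uses the standard struct module and assumes, based on the "2 x 4 bytes" and "2 x 8 bytes" notes above, that the compound type has two integer members; the format strings and sample values are illustrative, not taken from the original application.

```python
import struct

# Two 4-byte integer members (the 8-byte layout) versus two 8-byte
# integer members (the 16-byte layout).
size_small = struct.calcsize("<2i")
size_large = struct.calcsize("<2q")
print(size_small, size_large)

# An element written with the 16-byte layout...
record = struct.pack("<2q", 1200000000, 500)

# ...cannot be decoded by a reader that assumes the 8-byte layout.
try:
    struct.unpack("<2i", record)
    decoded_with_wrong_layout = True
except struct.error:
    decoded_with_wrong_layout = False

# Decoding with the layout that was actually stored works.
seconds, fraction = struct.unpack("<2q", record)
print(decoded_with_wrong_layout, seconds, fraction)
```

This is the same idea as asking the library for the stored type before reading: the reader has to match the layout that was written rather than the layout it assumed.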

  42. Questions? End of Part II
