This presentation discusses approaches to compressing astronomical data, addressing the challenges posed by the growing data volumes generated by new instruments and survey programs. It presents the limitations of traditional compression methods and explores the advantages of the FITS tile compression convention and tools such as fpack. It highlights how fast, lossless compression preserves scientific integrity while reducing storage, bandwidth, and latency costs, and emphasizes that compressed data remain accessible and that the choice of compression algorithm is not fixed.
Astronomical Tiled Image Compression: How & Why
Authors: • Rob Seaman, NOAO • Bill Pence, NASA/GSFC • Rick White, STScI • Mark Dickinson, NOAO • Frank Valdes, NOAO • Nelson Zárate, NOAO
Statement of the problem
• No one compression method is always best
• New instruments and survey programs will dwarf the data sets that have come before
• Observatories' data storage costs
• Transport latency and bandwidth challenge not just budgets, but also technology and human patience
• The bottom line is data-handling throughput, not static storage
Host-level compression
• Per-file gzip compression
• Contents of file are opaque
• Speed of compression
• Speed of decompression
• Size of output
• Limited support for on-the-fly decompression
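The trade-offs listed for host-level compression (compression speed, decompression speed, output size) are easy to measure for any candidate codec. A minimal sketch, assuming only the Python standard library: it times three whole-buffer codecs on a hypothetical synthetic 16-bit frame (flat sky plus Gaussian noise, not real observatory data) and confirms each round trip is lossless. Like per-file gzip, these codecs treat the data as opaque bytes and see no FITS structure.

```python
import bz2
import lzma
import random
import struct
import time
import zlib
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Whole-buffer ("host-level") codecs from the standard library.
CODECS: Dict[str, Codec] = {
    "deflate (gzip)": (lambda b: zlib.compress(b, 6), zlib.decompress),
    "bzip2": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "xz (lzma)": (lzma.compress, lzma.decompress),
}


def synthetic_image(width: int, height: int, sky: float = 1000.0,
                    sigma: float = 10.0, seed: int = 42) -> bytes:
    """Hypothetical 16-bit frame: flat sky plus Gaussian noise (not real data)."""
    rng = random.Random(seed)
    pixels = [max(-32768, min(32767, round(rng.gauss(sky, sigma))))
              for _ in range(width * height)]
    # FITS stores integers big-endian; ">h" is a signed 16-bit big-endian word.
    return struct.pack(f">{len(pixels)}h", *pixels)


def benchmark(raw: bytes) -> Dict[str, Dict[str, float]]:
    """Compression ratio and wall-clock times for each codec on one buffer."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(raw)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != raw:  # every codec here should be lossless
            raise RuntimeError(f"{name} round trip failed")
        results[name] = {
            "ratio": len(raw) / len(packed),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results


if __name__ == "__main__":
    for name, r in benchmark(synthetic_image(512, 512)).items():
        print(f"{name:15s} ratio={r['ratio']:.2f}  "
              f"compress={r['compress_s'] * 1e3:.1f} ms  "
              f"decompress={r['decompress_s'] * 1e3:.1f} ms")
```

The printed numbers depend on the machine and on the data, which is the point of the first problem statement: no one method wins on every axis.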
How
• FITS tile compression convention
• Provides a general framework
• Supports any compression algorithm that can operate on multidimensional image sections
• FITS headers remain readable
• Access to individual FITS HDUs
• Files are still FITS
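The core idea behind tiling can be sketched in a few lines. This is a hypothetical illustration, not the convention itself: the real convention stores each compressed tile as a row of a FITS binary table, described by keywords such as ZCMPTYPE and ZTILEn, and supports several algorithms, whereas this sketch uses zlib and an in-memory list. It shows why tiling matters: any row can be read by decompressing only the tile that holds it. The default of one image row per tile mirrors fpack's "[default=row]" tiling.

```python
import struct
import zlib
from typing import List, Tuple


def tile_compress(image: List[List[int]], tile_rows: int = 1) -> Tuple[int, List[bytes]]:
    """Split a 2-D image of 16-bit integers into bands of `tile_rows` rows
    and compress each band independently. Returns (width, compressed tiles)."""
    width = len(image[0])
    tiles = []
    for start in range(0, len(image), tile_rows):
        flat = [v for row in image[start:start + tile_rows] for v in row]
        raw = struct.pack(f">{len(flat)}h", *flat)  # big-endian, as in FITS
        tiles.append(zlib.compress(raw))
    return width, tiles


def read_row(width: int, tiles: List[bytes], row: int, tile_rows: int = 1) -> List[int]:
    """Random access: decompress only the tile that holds `row`."""
    raw = zlib.decompress(tiles[row // tile_rows])
    values = struct.unpack(f">{len(raw) // 2}h", raw)
    offset = (row % tile_rows) * width
    return list(values[offset:offset + width])


if __name__ == "__main__":
    image = [[100 * r + c for c in range(8)] for r in range(5)]
    width, tiles = tile_compress(image, tile_rows=2)
    print(len(tiles))                               # 3 tiles: rows 0-1, 2-3 and 4
    print(read_row(width, tiles, 4, tile_rows=2))   # row 4, from the last tile only
```

Smaller tiles give finer-grained random access at some cost in compression ratio, since each tile starts its compressor from scratch.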
Limitations
• Only partially supported by IRAF
• Supported by CFITSIO, but with caveats:
  • Not idempotent: even a losslessly compressed file would suffer keyword changes
  • The original convention covered only per-HDU issues; e.g., compressing a SIF (single-image FITS file) produced the same binary table as compressing an MEF (multi-extension FITS) original
  • The only application was the limited imcopy example program
  • Unsupported algorithms
Improvements
• fpack compression tool
• Compresses images in place
• Multi-image archives for efficiency
• Idempotent
• Supports FITS checksums
• Applications layered on CFITSIO access compressed files and file archives transparently
• Support for Hcompress
• General-purpose option for adaptively scaling input data
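Checksum support matters because it lets a round trip be verified. A minimal sketch of the data-unit half of the FITS checksum convention: the DATASUM keyword records, as a decimal string, the 32-bit ones' complement sum of the data unit read as big-endian 32-bit words. The companion CHECKSUM keyword, an ASCII-encoded value covering the whole HDU, is not shown. The function name is hypothetical.

```python
import struct


def fits_datasum(data: bytes) -> int:
    """32-bit ones' complement sum of the big-endian words in a FITS data unit,
    the value recorded (as a decimal string) in the DATASUM keyword."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 (FITS records are 2880 bytes)")
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
    # Ones' complement addition: fold any carry out of bit 31 back into bit 0.
    while total >> 32:
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


if __name__ == "__main__":
    record = struct.pack(">2I", 0xFFFFFFFF, 2) + bytes(2872)  # one 2880-byte record
    print(fits_datasum(record))  # 0xFFFFFFFF + 2 wraps around to 2
```

Comparing the DATASUM of an original data unit with that of its decompressed copy is one simple way to confirm that a lossless round trip really was lossless.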
fpack / funpack
fpack, a FITS tile-compression engine. Version 0.8.2 (25 September 2006)
usage: fpack [-r|-p|-g|-h] [-w|-t <axes>] [-n <bits>] [-v] [-Etc] <FITS>
Flags must appear (separately) before filenames:
  -r          Rice compression [default], or
  -p          PLIO compression, or
  -g          GZIP (per-tile) compression
  -h          Hcompress compression
  -w          override tile size to be whole image, or
  -t <axes>   comma separated list of tile sizes [default=row]
  -n <bits>   noise bits to preserve for real pixels [default=4]
  -v          verbose
  -F          clobber output [default overwrites input in-place]
  -K          keep (don't delete, overwrite or change) input files
  -A <file>   write (append or clobber) output to single file, or
  -P <pre>    prepend <pre> to create separate output filenames
  -L          list and validate contents, files unchanged
  -H          print this message
  -V          print version number
  <FITS>      FITS files or extensions to pack
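Rice compression is fpack's default (-r). A minimal sketch of the core idea, Golomb-Rice coding of pixel-to-pixel differences with a fixed parameter k. This is not CFITSIO's implementation: production Rice coders typically choose k adaptively for each small block of pixels and pack the bits into bytes, whereas this sketch uses one k for the whole run and returns a readable '0'/'1' string. The zigzag mapping of signed differences and all function names are illustrative assumptions.

```python
from typing import List, Tuple


def zigzag(n: int) -> int:
    """Map signed differences to non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(u: int) -> int:
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(pixels: List[int], k: int) -> Tuple[int, str]:
    """Golomb-Rice code the first differences of a non-empty pixel run.
    Returns (first pixel, bit string); the first pixel is kept verbatim."""
    first = prev = pixels[0]
    bits = []
    for p in pixels[1:]:
        u = zigzag(p - prev)
        prev = p
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")              # quotient in unary
        if k:
            bits.append(format(r, f"0{k}b"))    # remainder in k binary digits
    return first, "".join(bits)


def rice_decode(first: int, bits: str, k: int, count: int) -> List[int]:
    """Invert rice_encode for a run of `count` pixels."""
    out, prev, i = [first], first, 0
    for _ in range(count - 1):
        q = 0
        while bits[i] == "1":                   # read the unary quotient
            q += 1
            i += 1
        i += 1                                  # skip the terminating '0'
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        prev += unzigzag((q << k) | r)
        out.append(prev)
    return out


if __name__ == "__main__":
    row = [1000, 1003, 998, 998, 1010]
    first, bits = rice_encode(row, k=2)
    print(len(bits), "bits for", len(row) - 1, "differences")  # 21 bits for 4 differences
    assert rice_decode(first, bits, 2, len(row)) == row
```

Smooth astronomical images yield small differences, so a short unary quotient plus a few remainder bits usually suffices; too small a k makes the unary part grow, while too large a k wastes remainder bits. That sensitivity is why adaptive, per-block choice of k pays off.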
… & Why
• Preserve the scientific integrity of processed astronomical data sets
• Native integer data products permit lossless compression techniques with a neutral effect on the data, or
• may benefit from lossy compression for high compression factors
• Processing, whether pipeline or hands-on, often creates floating-point data; then either:
  • choose lossy compression, or
  • scale the data into integers
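The second route for floating-point data, scaling into integers, can be sketched as follows. The quantization step is tied to the measured noise so that only a few bits of noise survive, echoing fpack's "-n <bits>" option ("noise bits to preserve for real pixels [default=4]"). Assumptions: the noise estimator (a median absolute deviation of neighbouring-pixel differences) and the mapping step = sigma / 2**noise_bits are illustrative choices, not necessarily what fpack or CFITSIO does, and the function names are hypothetical. The result is lossy but bounded: every pixel is reproduced to within half a step, and the integers can then be compressed losslessly.

```python
import random
import statistics
from typing import List, Tuple


def estimate_noise(values: List[float]) -> float:
    """Rough background-noise sigma from neighbouring-pixel differences.
    MAD * 1.4826 approximates sigma for Gaussian noise; dividing by sqrt(2)
    accounts for a difference of two pixels carrying sqrt(2) times the noise."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    center = statistics.median(diffs)
    mad = statistics.median([abs(d - center) for d in diffs])
    return 1.4826 * mad / 2 ** 0.5


def quantize(values: List[float], noise_bits: int = 4) -> Tuple[List[int], float, float]:
    """Scale floats into integers with a step of sigma / 2**noise_bits (assumed mapping)."""
    step = estimate_noise(values) / 2 ** noise_bits
    if step <= 0:
        raise ValueError("no measurable noise; store these pixels losslessly instead")
    zero = min(values)  # arbitrary offset; step and zero act like BSCALE and BZERO
    return [round((v - zero) / step) for v in values], step, zero


def dequantize(ints: List[int], step: float, zero: float) -> List[float]:
    return [zero + i * step for i in ints]


def demo(n: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """Quantize a hypothetical noisy frame; return (max abs error, step)."""
    rng = random.Random(seed)
    frame = [100.0 + rng.gauss(0.0, 5.0) for _ in range(n)]
    ints, step, zero = quantize(frame)
    restored = dequantize(ints, step, zero)
    return max(abs(a - b) for a, b in zip(frame, restored)), step


if __name__ == "__main__":
    err, step = demo()
    print(f"step={step:.4f}  max error={err:.4f} (rounding error is at most step/2)")
```

Keeping more noise bits shrinks the step and the error but leaves more incompressible noise in the integers; fewer bits compress harder at a greater loss.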
Compression statistics
The additional cost of gzip-compressed floating-point output from the pipeline is $2.86 per image, compared with Rice-compressed integers.
Benefits
• Reduced:
  • Disk space
  • Bandwidth
  • Latency
• No need to decompress
• Pack multiple files for efficient transport
• Headers remain readable
• Individual HDUs are accessible
• Choice of algorithm isn't fixed
DMS architecture
• Benefits NSA, NHPP, NVO portal
• No need for ASCII header files
• Smaller footprint
• Faster replication
• Files remain FITS throughout
• Extends upstream into the domes
• Extends downstream to users
• Compression can be free or better than free