Processing 2.5 Terapixels of the Sky in 2 Days

George Fekete, JHU


Presentation Transcript


  1. Processing 2.5 Terapixels of the Sky in 2 Days. George Fekete, JHU

  2. DR7 Visual Images

  3. DR7 Visual Images • 1,393 rows • 1,984 columns • 427,853 fields • 1,184,462,470,336 pixels

  4. DR7 Visual Images • 1,393 rows • 1,984 columns • 427,853 fields • 1,184,462,470,336 pixels • 3 bands of FITS pixels: 3,553,387,411,008

  5. Goals • Pretty images • in the eyes of the beholder • Ease of manipulation • store pixels in DB • on-demand cutout and mosaic initiated inside the DB • store one entire colour image in < ¾ MB • uncompressed TGA is 8 MB, good jpeg is 2 ¼ MB • Important preconditions for good compressibility • background should have little or no salt-and-pepper noise • choose a good despeckler

  6. Without And With Despeckling

  7. Two Distinct Despecklers • Which seems better?

  8. Same Two (Laplacian) • What about now?

  9. Same Two (Laplacian) • WINNER! • Magick • Photoshop

  10. Process Raw Color Images • Despeckle • better visual experience • better compressibility • Photoshop (!) • has the best despeckling filter we found • can do the jpeg2000 codec • can do all other necessary tasks • jpeg2000? • compresses better than jpeg • produces fewer undesirable visual artifacts • a j2k file is 28% the size of the jpeg or 8% the size of the TGA
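The size figures quoted on the slides can be cross-checked with a little arithmetic. The sketch below assumes the 8 MB TGA and 2 ¼ MB jpeg sizes from the Goals slide and treats the 28% and 8% ratios as exact; the variable names are illustrative and not from the talk.

```python
# Sizes quoted on the Goals slide, in megabytes.
TGA_MB = 8.0
JPEG_MB = 2.25
TARGET_MB = 0.75  # goal: one entire colour image in under 3/4 MB

# Ratios quoted for jpeg2000 (assumed exact for this estimate).
J2K_VS_JPEG = 0.28
J2K_VS_TGA = 0.08

j2k_from_jpeg = JPEG_MB * J2K_VS_JPEG  # 0.63 MB
j2k_from_tga = TGA_MB * J2K_VS_TGA     # 0.64 MB

print(f"j2k estimate from the jpeg ratio: {j2k_from_jpeg:.2f} MB")
print(f"j2k estimate from the TGA ratio:  {j2k_from_tga:.2f} MB")
print("under the 3/4 MB goal:",
      j2k_from_jpeg < TARGET_MB and j2k_from_tga < TARGET_MB)
```

Both routes land at roughly 0.63 to 0.64 MB, so the two quoted ratios agree with each other and with the ¾ MB target.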

  11. What's The Big Deal? • 500,000 images in 24 hours? Doesn't seem like a lot, especially if you can use a thousand-processor cluster. • 2-step process • FITS to TGA (formerly fits2jpeg) • been there, done that • about 2 s per field (without optimization) • Use Photoshop • (cont...)

  12. What's The Big Deal? • Tasks for Photoshop • open a TGA • add a little noise cleaning • apply despeckle filter • save as jpeg2000 • reduce size by ½ to make ½ size image • save as jpeg2000 • reduce again to make ¼ size image • save as jpeg2000 • reduce again to make ⅛ size image • adjust contrast and brightness for small thumbnail • save as jpeg2000 • delete TGA • release all resources • Do this about 500,000 times, robustly

  13. Unsupervised Photoshopping • NECESSARY • Photoshop runs under Windows XP • Windows XP runs under qemu (virtual PC thing) • qemu runs on the Linux cluster (HHPC) • Photoshop can be controlled by a custom .NET application • Therefore ... Photoshop runs on the Linux cluster • SUFFICIENT • qemu/WinXP can see the file system • qemu/WinXP/Photoshop can run without a physical display • Therefore it is doable

  14. Flow • FITS → FITS to TGA → TGA → TGA to j2k → j2k

  15. Two Steps Decoupled • FITS → FITS to TGA → TGA • TGA → TGA to j2k → j2k • Runs asynchronously. Available resources can be added or removed at any time

  16. FITS to TGA • diagram labels: jobtable, skydev/skyfits WS, FITS, FITS to TGA, TGA

  17. FITS to TGA • diagram labels: jobtable, skydev/skyfits WS, FITS, FITS to TGA, TGA

  18. TGA to jpeg2000 • diagram labels: jobtable, skydev/skyfits WS, TGA, TGA to j2k, j2k

  19. Image generation workflow • diagram labels: jobtable, skydev/skyfits WS, TGA, TGAPoller, Photoshop, j2k • .NET app controls Photoshop through exposed methods

  20. Image generation workflow • diagram labels: skydev/skyfits WS, jobtable, edge node(s), TGAServer, TGA, TGAPoller, Photoshop, work nodes, j2k, shared file system

  21. "Scheduler" is a DB • jobtable: jobid, run, rerun, camcol, field, status (ready, working, done), TGA path, output directory, nodeid, grabbed (timestamp), finished (timestamp)
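A database-backed job table like this one can be sketched as follows. This is a minimal illustration only: it uses SQLite from the Python standard library as a stand-in (the slides do not say which database was used), the column types are guesses, and `claim_job` and `finish_job` are hypothetical helper names, not part of the original system.

```python
import sqlite3

# Column names follow the slide; types and the CHECK constraint are assumptions.
SCHEMA = """
CREATE TABLE jobtable (
    jobid      INTEGER PRIMARY KEY,
    run        INTEGER, rerun INTEGER, camcol INTEGER, field INTEGER,
    status     TEXT NOT NULL DEFAULT 'ready'
               CHECK (status IN ('ready', 'working', 'done')),
    tga_path   TEXT,
    output_dir TEXT,
    nodeid     TEXT,
    grabbed    TEXT,
    finished   TEXT
)
"""

def claim_job(conn, nodeid):
    """Try to move one 'ready' job to 'working'; return its jobid or None."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT jobid FROM jobtable WHERE status = 'ready' "
            "ORDER BY jobid LIMIT 1").fetchone()
        if row is None:
            return None
        # The status guard means only one worker can win a given job.
        cur = conn.execute(
            "UPDATE jobtable SET status = 'working', nodeid = ?, "
            "grabbed = datetime('now') WHERE jobid = ? AND status = 'ready'",
            (nodeid, row[0]))
        return row[0] if cur.rowcount == 1 else None

def finish_job(conn, jobid):
    """Mark a job as done and record when it finished."""
    with conn:
        conn.execute(
            "UPDATE jobtable SET status = 'done', finished = datetime('now') "
            "WHERE jobid = ?", (jobid,))

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Placeholder row; the run/rerun/camcol/field values are made up.
    conn.execute("INSERT INTO jobtable (run, rerun, camcol, field, tga_path) "
                 "VALUES (1, 1, 1, 1, 'example.tga')")
    job = claim_job(conn, "node-a")
    finish_job(conn, job)
    print(conn.execute("SELECT jobid, status FROM jobtable").fetchall())
```

Keeping the state in the table is what lets workers come and go: a worker that dies leaves a row in `working` with a `grabbed` timestamp, which can be spotted and reset to `ready`.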

  22. Framework • HHPC Cluster • 154 nodes, 1232 processors • PBS job submission • Linux • Windows + Photoshop is run as a qemu job • One time: make a C: disk image, install qemu • All processors use the same C: disk image • Each instance of qemu runs in snapshot mode • C: read-only • incremental change to disk image cached locally • can kill qemu instead of graceful shutdown (PBS proof) • qemu runs without a display window; pixels are in /dev/null
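A headless, snapshot-mode guest of this kind might be launched along these lines. The sketch only builds the command line; the binary name, memory size, and image path are assumptions, and the options shown (`-snapshot`, `-display none`, `-m`, `-hda`) are those of current QEMU releases, which may differ from the version used for this work.

```python
def qemu_command(disk_image, memory_mb=1024, binary="qemu-system-i386"):
    """Build an argv list for a headless, throwaway QEMU guest.

    All defaults here are illustrative, not taken from the talk.
    """
    return [
        binary,
        "-snapshot",           # write changes to temporary files, not the image
        "-display", "none",    # no display window
        "-m", str(memory_mb),  # guest memory in MB
        "-hda", disk_image,    # the shared C: disk image
    ]

if __name__ == "__main__":
    # Hypothetical path; a real job would pass this list to subprocess.run().
    print(" ".join(qemu_command("/shared/winxp-c.img")))
```

Because snapshot mode never writes back to the shared image, killing the process loses nothing that matters, which fits the "PBS proof" remark above.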

  23. Performance for DR7 images • 427,853 fields/jobs • 140 seconds total per job (measured) • FITS to TGA 2 s • TGA to j2k 136 s • 59,899,420 s = 693 days (one processor) • 0.56 day (1232 processors + leap of faith) • add 60% fudge factor penalty • Still does it in a day, with two hours to spare
