
NPP Atmosphere PEATE

NPP Atmosphere PEATE. Climate Data Processing Made Easy Scott Mindock. Atmosphere PEATE Team Space Science and Engineering Center University of Wisconsin-Madison 10 July 2008.



Presentation Transcript


  1. NPP Atmosphere PEATE: Climate Data Processing Made Easy. Scott Mindock, Atmosphere PEATE Team, Space Science and Engineering Center, University of Wisconsin-Madison, 10 July 2008

  2. The NPP Atmosphere PEATE is implemented within the framework and facilities of the Space Science and Engineering Center (SSEC) at the University of Wisconsin-Madison. SSEC has been successfully supporting operational, satellite-based remote-sensing missions since 1967, and its capabilities continue to evolve and expand to meet the demands and challenges of future missions. Space Science and Engineering Center (SSEC): 1. Employs ~250 scientists, engineers, programmers, administrators and IT support staff. 2. Satellite missions currently supported: GEO: GOES 10/11/12/R; Meteosat 7/9; MTSAT-1R; FY 2C/2D; Kalpana. LEO: NOAA 15/16/17/18, Terra, Aqua, NPP, NPOESS, FY 3, MetOp

  3. Funding and Related Work • Atmosphere PEATE is funded under NASA Grant NNG05GN47A • Award Date: 10/07/2005 • Grant Period: 08/15/2005 to 8/14/2008 (renewal in progress) • Related Work at SSEC: • CrIS SDR Cal/Val and Characterization (Revercomb, IPO) • VIIRS SDR and Cloud Cal/Val (Menzel, IPO) • VIIRS Algorithm Assessment (Heidinger, IPO) • International Polar Orbiter Processing Package (Huang, IPO) • VIIRS Instrument Characterization (Moeller, NASA)

  4. Creating Climate Data Records (CDRs) is hard! • Products track global trends • Calibration must be accurate (no calibration artifacts) • Algorithms must be fully verified with global data (no regional artifacts) • Data sets are large and hard to manage • Developing the CDRs is an iterative process • Large processing clusters are required • Programming requires a different skill set • Distributed systems are hard to test • Ongoing process • Requirements change • Technology changes • Staff changes

  5. The process requires multiple computing systems. A single machine can be used for initial development, but cluster computing is needed to verify performance over the full globe.

  6. CDR development is an iterative process. Initial development occurs on a single machine. Product verification requires data sets of increasing size, and increasing data set size increases computation time.

  7. Strategies for processing simplification • Reduce or remove the “Move to Cluster” step • Make execution environments similar • Make data access patterns similar • Results in faster iterations

  8. Strategies for managing processing system • Use well defined interfaces between subsystems • Decouples systems which reduces learning curve • Allows evolution of subsystems • Simplifies test and verification of software • Create configuration driven subsystems • Simplifies deployment of subsystems • Allows operations to modify system behavior • Leverage automated testing technologies • Reduces learning curve • Provides continuous test coverage • Captures requirements in executable form

  9. The system: Atmosphere PEATE • Ingest: ING • Brings data into the Atmosphere PEATE • Supports FTP, HTTP and RSYNC • Data Management System: DMS • Stores data in the form of files • Provides a Web Service to locate, store and retrieve files • Computational Resource Grid: CRG • Provides a Web Service to locate, store and retrieve jobs • Algorithm: ALG • Consumes jobs • Runs algorithms in the form of binaries • Algorithm Rule Manager: ARM • Combines data with algorithms to produce jobs • Provides a Web Service interface to locate, store and retrieve rules

  10. ING: Ingest, brings data into the system • Configuration file • Allows operations to add new sites • Allows operations to maintain existing sites • Customization allowed in the form of scripts (Bash, Python) • QC • Quick look • Metadata extraction • Notices missing or late data
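The configuration-driven site table described on this slide can be sketched as follows. This is a hypothetical illustration, not the actual PEATE config format: the site names, hosts, and the `fetch_command` helper are invented, but the idea matches the slide, operations add or edit a site entry and the right transfer tool (FTP/HTTP/RSYNC) is chosen without code changes.

```python
# Hypothetical ingest site table: each entry names a protocol so
# operations can add or maintain sites by editing configuration only.
SITES = {
    "modis_archive": {"protocol": "ftp",   "host": "ftp.example.gov",    "path": "/allData"},
    "ancillary":     {"protocol": "http",  "host": "www.example.gov",    "path": "/ancillary"},
    "mirror":        {"protocol": "rsync", "host": "mirror.example.edu", "path": "/npp"},
}

def fetch_command(site_name):
    """Build the transfer command line for a configured site."""
    site = SITES[site_name]
    url = "%s://%s%s" % (site["protocol"], site["host"], site["path"])
    if site["protocol"] in ("ftp", "http"):
        return ["wget", "--mirror", url]
    if site["protocol"] == "rsync":
        return ["rsync", "-av", "%s:%s" % (site["host"], site["path"]), "."]
    raise ValueError("unsupported protocol: " + site["protocol"])
```

A per-site Bash or Python hook, as the slide mentions, would slot in after the transfer step for customization.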

  11. DMS: Stores data and products • Relieves scientists of having to manage data • Simple put and get functionality • Configuration file • Specifies fileservers and directories • Operations can add/remove fileservers • File system - holds files • Database - holds file information • Public Access - DMS interface • Worker - manages file system
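The put/get idea on this slide can be illustrated with a minimal in-memory sketch. The real DMS is a web service backed by fileservers and a database; the class and method names below are illustrative only, chosen to mirror the "files plus file information" split so scientists can locate data without managing directories.

```python
# Minimal sketch of the DMS put/get idea: stored bytes stand in for
# the file system, an attribute dict stands in for the database.
class MiniDMS:
    def __init__(self):
        self._files = {}     # file_id -> bytes (the "file system")
        self._metadata = {}  # file_id -> attributes (the "database")

    def put(self, file_id, data, **attrs):
        """Store a file and record its metadata."""
        self._files[file_id] = data
        self._metadata[file_id] = attrs

    def get(self, file_id):
        """Retrieve a stored file by its identifier."""
        return self._files[file_id]

    def locate(self, **attrs):
        """Return ids of files whose metadata matches all given attributes."""
        return [fid for fid, meta in self._metadata.items()
                if all(meta.get(k) == v for k, v in attrs.items())]
```

Usage follows the slide's simple pattern: `put` a granule with its attributes, then `locate(instrument="VIIRS")` and `get` the result, with no knowledge of which fileserver holds it.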

  12. CRG: Provides nodes with jobs • Provides a well-defined interface deployed as a web service • Accepts job requests • Provides job status • Monitors job state • Allows processing nodes to be added or removed from the system

  13. AlgHost: Runs software that produces products • Recreates the development environment • Retrieves data from the DMS • Retrieves and runs software packages • Saves results to the DMS, including products, stdout and stderr

  14. Algorithm Script Structure • Cluster executes a bash script • Script is passed arguments • Software package directory • Working / output directory • Static ancillary directory • Dynamic ancillary directory • Input files • Output files • Software package is called from the script • Results are stored by the process that started the script.
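The argument structure listed above can be made concrete with a small sketch of how a host process might assemble the script's command line. The argument order and the helper name are assumptions for illustration, not the documented PEATE convention.

```python
# Hypothetical assembly of the bash script invocation from the six
# argument groups named on the slide.
def build_script_args(script, package_dir, work_dir,
                      static_anc, dynamic_anc, inputs, outputs):
    """Return the command line the cluster would execute for one job."""
    return (["bash", script,
             package_dir,        # software package directory
             work_dir,           # working / output directory
             static_anc,         # static ancillary directory
             dynamic_anc]        # dynamic ancillary directory
            + inputs             # input files
            + outputs)           # output files
```

Keeping every path an explicit argument is what lets the same script run unchanged on a developer's machine and on the cluster, which is the simplification strategy from slide 7.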

  15. ARM: Binds data to software packages • Provides a well-defined interface deployed as a web service • Assigns jobs to the CRG • Monitors data in the DMS • Monitors the status of jobs in the CRG • Production rules can be added or removed dynamically by operations • Volatile logic lives here
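The rule-driven binding of data to software packages can be sketched as follows. The rule format (a simple substring pattern) and all names are invented for illustration; the point is the shape of the logic: rules are data that operations can add or remove at run time, and each match yields a job for the CRG.

```python
# Sketch of the ARM idea: production rules watch for matching data
# and, when inputs are present, bind them to a package as a job spec.
class MiniARM:
    def __init__(self):
        self.rules = []   # rules can be added or removed dynamically

    def add_rule(self, name, input_pattern, package):
        """Register a production rule binding an input pattern to a package."""
        self.rules.append({"name": name,
                           "pattern": input_pattern,
                           "package": package})

    def evaluate(self, available_files):
        """Produce a job spec for every rule whose pattern matches a file."""
        jobs = []
        for rule in self.rules:
            matches = [f for f in available_files if rule["pattern"] in f]
            if matches:
                jobs.append({"package": rule["package"], "inputs": matches})
        return jobs
```

Keeping this "volatile logic" in rules rather than code is what lets operations change production behavior without a software release.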

  16. Strategies for managing processing system (revisited) • Use well defined interfaces between subsystems • Decouples systems which reduces learning curve • Allows evolution of subsystems • Simplifies test and verification of software • Create configuration driven subsystems • Simplifies deployment of subsystems • Allows operations to modify system behavior • Leverage automated testing technologies • Reduces learning curve • Provides continuous test coverage • Captures requirements in executable form

  17. Development Process: Spiral method. Design, Implement, Test, Deploy; a Build = Deploy to Operations.

  18. Testing Strategy • Employ standard software industry practices • Automate with ANT, a Make-like, XML-based build tool • Test with JUnit, the Java unit-testing framework • Increases system quality • Tests are reproducible • Tests are run more often than they would be if they were manual • Tests are improved over time • Tests are configurable • We don’t just build; the process includes testing and verification

  19. Nightly Build Builds system Tests subsystems Tests scenarios Updates repositories Logs results Scenarios demonstrate requirements

  20. Unit and Regression Testing • May use internal knowledge of interfaces for testing • Tests exercise and stress the public interfaces • Tests evolve to verify bug fixes • Fixed defects have specific tests added • Tests run in the nightly build • Tests verify each release • Layered approach to testing • Everything tested, every night
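The deck's tooling is ANT and JUnit, but the practice of giving every fixed defect its own permanent test looks the same in Python's unittest, shown here as an analogue. The defect and the helper function are invented for illustration.

```python
# Analogue of a defect-specific regression test (the deck uses JUnit;
# this sketch uses Python's unittest). The defect is hypothetical.
import unittest

def day_of_year_tag(day):
    """Format a day-of-year as the zero-padded three-digit filename field."""
    # Defect fix: an earlier version emitted "5" instead of "005".
    return "%03d" % day

class RegressionTests(unittest.TestCase):
    def test_day_tag_keeps_leading_zeros(self):
        # Added when the leading-zero defect was fixed; runs every night.
        self.assertEqual(day_of_year_tag(5), "005")
        self.assertEqual(day_of_year_tag(192), "192")
```

Because the test lives in the nightly build, the defect can never silently return, which is the "tests evolve to verify bug fixes" point above.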

  21. Testing Scenarios (1 of 2) Test ingest function Test forward and redo functions Reflect CDR development process

  22. Test Scenarios (2 of 2) • Documents • 3600-0003.080402.doc - Level 4 requirements • 3600-0004.060911.doc - Operations Concepts • Test plans are implemented as scenario tests • Tests correspond to Use Cases outlined in OpsCon • At least one test for each requirement set • Successful completion of test verifies requirements by demonstration • Factors that determine success • Generation of expected products • Ability to track product heritage • Ability to reproduce results • Ability to uniquely identify products

  23. Conclusion: Climate Data Processing Is Easy • The ingest system makes it easy to add and manage data sources • Operators can control the system • Operators can monitor the system • The DMS makes it easy to maintain large data sets • Scientists can find data • Operators can add and remove servers • Operators can add and remove sites • The CRG and AlgHost make it easy to transfer CDR production from the development environment to the cluster environment • You still have to get the product correct!
