
Developing JWST Pipelines at STScI


Presentation Transcript


  1. Developing JWST Pipelines at STScI
     Robert Jedrzejewski

  2. Who we are
     • The Science Software Branch at STScI
     • 16 members
     • Most have an astronomy background
     • 6 have PhDs
     • Combined experience in group: 125 years
     • Combined experience at STScI: 200 years

  3. What we do
     • Develop HST calibration pipelines
     • STSDAS/TABLES
     • PyRAF, PyFITS, STScI_Python
     • HST Exposure Time Calculators
     • Other smaller projects (Gemini/GOODS/Hubble Legacy Archive/GoogleSky/JWST Backplane Stability…)

  4. Development Experience
     • Python
     • Java
     • C/C++
     • Fortran
     • spp/cl
     • IDL
     • (Perl/Assembly/Tcl…)

  5. Our preferred development model
     • Python!
     • We find we can be extremely productive writing in Python
     • Speed is occasionally an issue, so we use C extensions when necessary
     • Very little pipeline code requires performance optimization

  6. Development style
     • Use version control (Subversion)
     • Use regression tests + nightly builds + web reporting tools
     • Trac for problem tracking; a wiki for information dissemination
     • Unit/doc tests (see the sketch below)
     • Multiple platforms (Linux/Mac/Solaris/Windows)
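
A minimal sketch of the unit/doc-test style; the function and its behavior are hypothetical, not actual pipeline code:

    def subtract_bias(pixels, bias_level):
        """Subtract a constant bias level from a list of pixel values.

        >>> subtract_bias([10.0, 12.0], 2.0)
        [8.0, 10.0]
        """
        return [p - bias_level for p in pixels]

    if __name__ == "__main__":
        import doctest
        doctest.testmod()  # the docstring examples double as regression tests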

  7. How we did HST pipelines
     • Calfoc, calfos, calhrs, calwfpc, calwp2
        • First-generation pipelines, written in spp; read GEIS files
     • Calstis, calnic(a/b)
        • Second generation, written in C using hstio (which wraps the IRAF imio libraries) to read multiple-extension FITS files
     • Calacs
        • Borrowed much code from calstis imaging
     • Calwfc3
        • Borrowed much code from calacs and calnic
     • Calcos
        • Third generation, written in Python (+ C where needed)
     • Later pipelines were more likely to be used by IDTs for calibrating ground-test data

  8. More on HST pipelines
     • Pipeline operation is data-driven (see the sketch below)
     • Calibration steps as header keywords:
        • FLATCORR=PERFORM/OMIT/COMPLETE/SKIPPED
     • Reference file names as header keywords:
        • FLATFILE=oref$g2342212_flt.fits
     • This decouples some of the intelligence from the code
        • No need to rebuild code if a step or reference file changes
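
A sketch of how such a keyword-driven step might look in Python; the function and its logic are illustrative (PyFITS is the FITS library named on slide 3):

    import pyfits

    def flatfield_if_requested(filename):
        """Run the flatfield step only if the header asks for it."""
        header = pyfits.getheader(filename)
        if header.get('FLATCORR', 'OMIT') != 'PERFORM':
            return  # OMIT, COMPLETE, or SKIPPED: nothing to do
        flatfile = header['FLATFILE']  # e.g. 'oref$g2342212_flt.fits'
        # ... resolve the 'oref$' prefix, load the flat, divide it out,
        # then set FLATCORR=COMPLETE so the step is never repeated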

  9. Multidrizzle
     • Multidrizzle is used by the ACS and WFPC2 pipelines to combine images with small position offsets (dithered), removing cosmic rays
     • It is a Python application that can be used with ACS, STIS, WFPC2, NICMOS and WFC3 data
     • This breaks from our ‘tradition’ of having one calibration pipeline program per instrument

  10. How we see the JWST Pipelines
     • A series of calibration steps:
       input stage → [calibration step + reference file] → output stage

  11. Early design ideas
     • No need to have separate pipeline programs for each JWST instrument
     • Many calibration steps depend on the detector, and JWST instruments use detectors of the same type
        • We can use the same code, instead of having to replicate it (and maintain it) in more than one place
     • Some calibration steps will probably be identical for all JWST data (e.g. the MASKCORR step, where a static mask from a reference file is applied to the DQ array of the data; see the sketch below)
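
A sketch of how a fully shared MASKCORR step might look; the array names are illustrative:

    import numpy as np

    def maskcorr(dq, static_mask):
        """OR a static bad-pixel mask from a reference file into the
        data-quality (DQ) array; the same code serves every instrument."""
        return np.bitwise_or(dq, static_mask)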

  12. Try not to make the mistakes we made with HST
     • Use the same keywords for the same quantities
     • Use the same file/association structure
     • Use the same algorithms to do the same calibration
        • Unless a team shows that a given algorithm does not work for their instrument
        • Even then, try to keep as much code common as possible, breaking out only the code that is different
        • Sometimes it is possible to encapsulate the differences in the reference files, keeping the code the same

  13. JWST Pipelines (continued…)
     • Python gives us object-oriented capabilities
     • ‘input_stage’ and ‘output_stage’ are objects that encapsulate information on their state and on how to calibrate themselves
        • For example, they might be NIRSpec IFU data objects or MIRI imaging data objects
     • When executing a given step, they may use their own custom method, or else defer to a method that they inherit from a more ‘generic’ datatype
        • E.g. MIRI imaging data and NIRCam imaging data may both use the flatfield() method of the JWSTImagingData class, from which they both inherit (see the sketch below)
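
A sketch of that hierarchy; the class and method names follow the slide, everything else is illustrative:

    class JWSTData(object):
        """Steps common to all JWST data live in the generic base class."""
        def maskcorr(self, static_mask):
            self.dq |= static_mask  # e.g. the shared MASKCORR step

    class JWSTImagingData(JWSTData):
        """Behavior shared by all imaging modes."""
        def flatfield(self, flat):
            self.data /= flat  # generic imaging flatfield

    class MIRIImagingData(JWSTImagingData):
        pass  # inherits the generic flatfield()

    class NIRCamImagingData(JWSTImagingData):
        pass  # likewise

    class NIRSpecIFUData(JWSTData):
        def flatfield(self, flat):
            """IFU data can override with its own custom method."""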

  14. JWST Pipelines (continued…)
     • The inheritance hierarchy encapsulates information about what is the same and what is different about JWST data types
     • We can mix in behaviors from different types of object, as necessary (see the sketch below)
     • But, to the extent that is possible, we try to keep as much the same as possible
     • The people who inherit this project will thank us
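
One way that mixing-in could look; the mixin, its attributes, and the subclass are all hypothetical:

    class SubarrayMixin(object):
        """Hypothetical mixin: subarray handling for any data type."""
        def trim_reference(self, ref):
            # cut a full-frame reference file down to the subarray region
            return ref[self.ystart:self.ystop, self.xstart:self.xstop]

    class JWSTImagingData(object):
        """Stand-in for the imaging base class from the previous sketch."""

    class MIRISubarrayImagingData(SubarrayMixin, JWSTImagingData):
        pass  # imaging behavior plus subarray handling, nothing duplicated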

  15. What goes in?
     • IDTs and instrument teams at STScI will figure out:
        • Which steps are needed, and their ordering
        • Which instruments/modes use the steps
        • What each step does
        • What calibration reference data are needed
        • What tests the code needs to pass

  16. Facilitating the process
     • Calibration data will be in a “public” repository
     • This will include:
        • Code
        • Test data
        • Documentation

  17. Facilitating…
     • We will encourage everyone to try out our algorithms as we develop them
     • And we encourage everyone to contribute their own algorithms
     • We’ll keep teams synchronized by versioning and by providing different builds
        • E.g. Team A may still be testing build X when Team B needs to test the next stage of functionality in build X.1
        • When Team B is ready to test the functionality in build X.1, there may already be a build X.2 (which includes the functionality of build X.1 as well as new functionality)
     • In the end, all the teams will test the same code

  18. Facilitating
     • How do we know that the code does the ‘right’ thing?
     • Teams provide test data with test results
        • Then we know that the result is correct because it reproduces team-supplied answers
     • Test results could be actual data, e.g. FITS files (see the sketch below)
        • Pixels in pipeline-calibrated data should be identical within +/-
     • Or results of analysis
        • Aperture photometry should be the same to within +/-
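
A sketch of such a pixel check; the tolerance argument is a placeholder for whatever +/- a team specifies:

    import numpy as np
    import pyfits

    def pixels_match(result_file, truth_file, atol):
        """True if the pipeline output reproduces the team-supplied
        answer to within the agreed tolerance."""
        result = pyfits.getdata(result_file)
        truth = pyfits.getdata(truth_file)
        return np.allclose(result, truth, atol=atol)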

  19. Interfacing with other languages
     • If teams develop code that does a lot of fancy processing, we can try to include it by wrapping
     • Python talks to C/C++ using C extensions
        • An existing C function can be wrapped so that Python objects can be passed to C/C++, and C objects passed back to Python
     • We can wrap relatively simple C functions (see the sketch below):
        • Arguments are arrays or primitive datatypes (integer/float/string…)
        • No objects as arguments
        • Structs are OK, as long as they are simple (flat)
        • Play nice with memory
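
The slide is about hand-written C extensions; as an illustration of the same array-in, array-out constraint, here is a sketch using ctypes instead, with a hypothetical shared library and C function:

    import ctypes
    import numpy as np
    from numpy.ctypeslib import ndpointer

    # hypothetical library built from a team's C code:
    #   void apply_flat(double *data, const double *flat, long n);
    lib = ctypes.CDLL("./libcalib.so")
    lib.apply_flat.restype = None
    lib.apply_flat.argtypes = [
        ndpointer(ctypes.c_double, flags="C_CONTIGUOUS"),  # arrays and
        ndpointer(ctypes.c_double, flags="C_CONTIGUOUS"),  # primitives only;
        ctypes.c_long,                                     # no objects
    ]

    data = np.ones(1024)
    flat = np.full(1024, 2.0)
    lib.apply_flat(data, flat, data.size)  # Python keeps ownership of the memory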

  20. Wishlists
     • We don’t need to feel constrained by HST
        • What are the biggest deficiencies in HST?
     • The best reference files and best calibration steps can be determined by querying a service
        • No need to rely on the HST archive to find these out
     • Reference files can be downloaded as needed
     • Even calibration code can be updated as needed (no need to wait six months for the next STSDAS release)

  21. Wishlists
     • Tell us what you want!
        • The earlier the better
        • Some aspects of the overall architecture are still flexible
     • And not just pipeline calibration code
        • We are going to need tools for data analysis, evaluation, interpretation, visualization
        • Reference file generation
