Observational procedures and data reduction Lecture 4: Data reduction process

XVII Canary Islands Winter School of Astrophysics: ‘3D Spectroscopy’ • Tenerife, Nov-Dec 2005 • James E.H. Turner, Gemini Observatory

Overview • The last lecture gave an overview of the different data reduction stages and what is involved at each step • This lecture briefly discusses the reduction as a process: • Error and data quality propagation • File formats • A couple of example reduction sequences

Error propagation • At the end of the reduction, it’s important to have a good estimate of the errors in data values • For faint sources, we can estimate the statistical significance of detection • Want a measure of the reliability of ages, metallicities etc. derived from line strength indices (Cardiel et al., 1997) or the intrinsic random errors in velocity measurement • etc. • The raw data have quite well-defined errors due to photon statistics and read noise • After numerous processing stages, it is difficult at best (impossible at worst) to estimate errors directly from the data values

Error propagation • Solution • Keep track of the errors in data values throughout the processing • For each detector pixel, store an error value in a separate error image, alongside the main science data array • During each processing step, process the error array in parallel with the science image, to reflect how the errors have changed • For example, when adding two science images, add the corresponding error images in quadrature

Error propagation • Poisson statistics • A process where discrete values vary statistically around a well-defined mean, e.g. counting photons, is described by a Poisson distribution: $P(n) = \dfrac{m^{n} e^{-m}}{n!}$ • with a mean (expected number of photons) of ⟨n⟩ = m • The standard deviation from the mean is simply σ = √m • So when counting photons (really electrons), the statistical error is the square root of the expected number of photons (electrons) • In practice, estimate the error as the square root of the measured number of electrons, since that is what we know • For large m, the Poisson distribution approaches a Gaussian distribution with mean m and σ = √m
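As a quick numerical check of the σ = √m rule above, the following sketch draws simulated Poisson counts with NumPy and compares their scatter with √m. The function name, sample size and seed are illustrative choices, not part of the lecture.

```python
import numpy as np


def poisson_scatter(mean_electrons, n_samples=200_000, seed=0):
    """Return (sample mean, sample std, sqrt(mean)) for simulated Poisson counts."""
    rng = np.random.default_rng(seed)
    samples = rng.poisson(lam=mean_electrons, size=n_samples)
    return samples.mean(), samples.std(), np.sqrt(mean_electrons)


if __name__ == "__main__":
    # The measured scatter should track sqrt(m) for small and large means alike.
    for m in (4, 100, 10_000):
        mean, sigma, expected = poisson_scatter(m)
        print(f"m={m:6d}  mean={mean:9.2f}  sigma={sigma:7.2f}  sqrt(m)={expected:7.2f}")
```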

Error propagation • Random sources of measurement error (noise) in the data • Detector read noise • Poisson noise from the science target and sky (estimated from the pixel values) • Poisson noise from detector dark current (estimated from the pixel values) • Also have systematic errors introduced during processing • E.g. due to inaccuracies in flat fielding • Usually present at the level of a few percent; difficult to reduce to zero • These effects can be more difficult to account for, but typically the statistical errors are dominant • If we get enough signal-to-noise with an IFU to worry about errors of a few percent, we’re usually going to be pretty happy!

Error propagation • Detectors don’t usually report exactly 1 count per stored electron • Poisson statistics apply to electrons, rather than detector counts (ADUs) • The detector ‘gain’ equals the number of electrons per measured count • Really an inverse gain, but that’s what it’s called! • Controls how much light can be measured before saturating • Typical gains are a few e-/ADU (CCDs), up to >10 e-/ADU (NIR) • To estimate Poisson noise in electrons, multiply the counts by the gain and take the square root • When adding values, their errors add in quadrature (sum of squares) • Therefore when propagating errors, we use the error array to store variance (σ²) values, rather than the actual noise (σ)

Error propagation • Error propagation procedure • Start by creating a variance array containing the square of the detector read noise, σ_read², which affects every pixel independently of the counts • Read noise is counted in electrons, so if we are storing science data values as detector counts, the variance should be (σ_read/gain)² • Alternatively, multiply the science array through by the gain to begin with • Estimate the statistical variance in the measured counts, n, for each pixel and add to the array of read noise values • If working in electrons, the statistical variance to add is just n × gain • In ADUs, the number is n / gain
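The procedure above can be sketched in NumPy as follows. The function name is hypothetical, and two assumptions go beyond the slide: the counts are taken to be bias-subtracted, and negative counts are clipped to zero so that they contribute no Poisson term.

```python
import numpy as np


def initial_variance(counts_adu, gain, read_noise_e):
    """Initial variance array in ADU**2.

    counts_adu   : science counts in ADU (assumed bias-subtracted)
    gain         : detector gain in e-/ADU
    read_noise_e : read noise in electrons (rms)
    """
    counts_adu = np.asarray(counts_adu, dtype=float)
    # Read noise is specified in electrons, so convert it to ADU before squaring.
    read_var = (read_noise_e / gain) ** 2
    # Poisson variance in ADU**2 is n / gain. Clipping negatives is an
    # assumption made here, not something stated in the lecture.
    poisson_var = np.clip(counts_adu, 0.0, None) / gain
    return read_var + poisson_var


if __name__ == "__main__":
    counts = np.array([[0.0, 100.0], [2500.0, 40000.0]])
    var = initial_variance(counts, gain=2.0, read_noise_e=4.0)
    print(np.sqrt(var))  # noise per pixel, in ADU
```

Working in electrons instead gives the same answer scaled by gain², since (n × gain) + σ_read² = gain² × (n / gain + (σ_read/gain)²).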

Error propagation • At each subsequent reduction step, manipulate the variance array according to the operation being performed on the science array: • When adding or subtracting images, their errors add in quadrature • Simply add the variance arrays for each image • When scaling an image (multiplying or dividing by a number), the error is scaled accordingly • Multiply the variance by the square of the scaling factor • When multiplying/dividing images, their fractional errors add in quadrature • Divide each input variance array by the square of the corresponding image, add the results together and multiply by the square of the final science image to get the final variance image
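A minimal NumPy sketch of these rules, with each function returning the new science array together with its variance array. The function names are illustrative. For multiplication and division the expressions are written in a form that is algebraically equivalent to the recipe above but avoids dividing by a science image that may contain zeros in the numerator.

```python
import numpy as np


def add_images(sci_a, var_a, sci_b, var_b):
    """Sum of two images: variances add."""
    return sci_a + sci_b, var_a + var_b


def subtract_images(sci_a, var_a, sci_b, var_b):
    """Difference of two images: variances still add."""
    return sci_a - sci_b, var_a + var_b


def scale_image(sci, var, factor):
    """Multiply by a constant: variance scales by factor**2."""
    return sci * factor, var * factor ** 2


def multiply_images(sci_a, var_a, sci_b, var_b):
    """Product: fractional errors add in quadrature.

    (a*b)**2 * (var_a/a**2 + var_b/b**2) == b**2 * var_a + a**2 * var_b
    """
    return sci_a * sci_b, sci_b ** 2 * var_a + sci_a ** 2 * var_b


def divide_images(sci_a, var_a, sci_b, var_b):
    """Quotient: fractional errors add in quadrature.

    (a/b)**2 * (var_a/a**2 + var_b/b**2) == var_a/b**2 + a**2 * var_b / b**4
    """
    return sci_a / sci_b, var_a / sci_b ** 2 + sci_a ** 2 * var_b / sci_b ** 4


if __name__ == "__main__":
    a, va = np.array([10.0]), np.array([4.0])
    b, vb = np.array([5.0]), np.array([1.0])
    print(divide_images(a, va, b, vb))
```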

Error propagation • For more complicated operations on the science data, i.e. some arbitrary function, f(n) • Take the first derivative of f(n), to estimate how the output values vary with small changes in the input values • The noise (σ) scales by |df/dn|, so multiply the variance by (df/dn)² at the appropriate value of n • At the end of the data reduction process, can take the square root of the variance array to get to the final noise values • Resampling • In the raw data, each pixel has an independent statistical error • If resampling causes smoothing, the errors in different pixels may become correlated
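For an arbitrary function, first-order propagation scales the noise by |df/dn| and hence the variance by (df/dn)². The sketch below assumes the caller supplies the derivative explicitly; the function name and the square-root example are illustrative only.

```python
import numpy as np


def propagate_function(f, dfdn, n, var_n):
    """Apply f to the science values and propagate the variance to first order.

    f     : function applied to the science array
    dfdn  : its first derivative, evaluated at the same values
    n     : science array
    var_n : variance array for n
    """
    n = np.asarray(n, dtype=float)
    var_n = np.asarray(var_n, dtype=float)
    return f(n), np.asarray(dfdn(n)) ** 2 * var_n


if __name__ == "__main__":
    # Example: the square root of Poisson-distributed counts (variance = n)
    # has a variance close to 1/4, whatever the value of n.
    counts = np.array([100.0, 400.0, 10_000.0])
    out, var_out = propagate_function(np.sqrt, lambda x: 0.5 / np.sqrt(x), counts, counts)
    print(out, var_out)
```

The first-order approximation is only reliable when the noise is small compared with the scale over which f(n) changes slope.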

Error propagation • One could attempt to propagate a separate covariance matrix • Covariance = expected value of the product of deviations from the means • Usually software doesn’t track covariance, but it’s important to be aware that the variance numbers may not be exactly correct after resampling • E.g. linear interpolation at the midpoint between 2 samples is an average • The error on the result is therefore reduced by a factor of √2 • The number of pixels hasn’t changed, but each pixel has higher S/N! • However, summing 2 of the resampled pixels does not reduce the error by a further factor of √2 because the errors are no longer independent
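The midpoint-interpolation example can be made concrete by writing the resampling as a weight matrix W and computing the output covariance as σ² W Wᵀ, which holds for independent input pixels of equal variance. This is a sketch for illustration; the matrix helper is hypothetical and not taken from any reduction package.

```python
import numpy as np


def midpoint_weights(n_in):
    """Weight matrix W such that out = W @ inp averages each pair of neighbours."""
    w = np.zeros((n_in - 1, n_in))
    idx = np.arange(n_in - 1)
    w[idx, idx] = 0.5
    w[idx, idx + 1] = 0.5
    return w


sigma = 1.0                      # noise of each independent input pixel
w = midpoint_weights(4)
cov = sigma ** 2 * (w @ w.T)     # covariance matrix of the resampled pixels

var_single = cov[0, 0]                   # variance of one resampled pixel
var_sum_naive = cov[0, 0] + cov[1, 1]    # sum of two pixels, ignoring covariance
var_sum_true = cov[:2, :2].sum()         # sum of two pixels, including covariance

if __name__ == "__main__":
    print("single resampled pixel variance:", var_single)
    print("sum of two, variance-only estimate:", var_sum_naive)
    print("sum of two, with covariance:", var_sum_true)
```

The off-diagonal terms arise because neighbouring output pixels share an input pixel, so a variance-only estimate understates the error on their sum.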

Data quality • As well as storing variance values alongside each science image, it is useful to store data quality information • Use an integer valued array to flag which pixels are good, bad, noisy etc. in the main science array • Each bit of the integer represents yes/no for a particular defect, allowing more than one problem to be recorded for a particular pixel • Different pixel values indicate, for example: • Good pixel • Cosmic ray • Saturated pixel • Hot pixel (etc…) • The convention for the values depends on the processing software • Useful for masking out values appropriately at each reduction stage
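A minimal sketch of bit-flagged data quality handling with NumPy. The bit assignments below are hypothetical, since the convention depends on the processing software; only the defect names come from the list above.

```python
import numpy as np

# Hypothetical bit assignments (each defect gets its own bit).
DQ_GOOD = 0
DQ_COSMIC_RAY = 1 << 0   # 1
DQ_SATURATED = 1 << 1    # 2
DQ_HOT = 1 << 2          # 4


def combine_dq(dq_a, dq_b):
    """When combining two images, a pixel inherits the flags of both inputs."""
    return np.bitwise_or(dq_a, dq_b)


def masked_science(sci, dq, reject=DQ_COSMIC_RAY | DQ_SATURATED | DQ_HOT):
    """Return the science array with pixels masked where any rejected bit is set."""
    bad = (np.asarray(dq) & reject) != 0
    return np.ma.masked_array(np.asarray(sci, dtype=float), mask=bad)


if __name__ == "__main__":
    sci = np.array([1.0, 2.0, 3.0, 4.0])
    dq = np.array([DQ_GOOD, DQ_COSMIC_RAY, DQ_COSMIC_RAY | DQ_SATURATED, DQ_HOT],
                  dtype=np.int16)
    print(masked_science(sci, dq))                        # all flagged pixels masked
    print(masked_science(sci, dq, reject=DQ_SATURATED))   # only saturated pixels masked
```

Because each defect has its own bit, a step that only cares about saturation can ignore the other flags without losing them.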

File storage format • Data are typically stored in FITS files • Flexible Image Transport System, overseen by a NASA technical panel • Standard definition document available at http://fits.gsfc.nasa.gov/ • Each single FITS file can contain • One or more N-dimensional image arrays • ASCII header information, using keyword = value pairs • Header keywords can have values of different data types • Eg. OBJECT = ‘NGC1068’ or EXPTIME = 120 • One or more binary tables • Using named columns (eg. XCOORD, YCOORD) and mixed data types, rather than a simple array of numbers • Other, less common formats of data
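To illustrate the keyword = value layout, here is a simplified parser for a single 80-character header card, using only the Python standard library. It is a sketch, not a full implementation of the standard: it handles quoted strings, T/F logicals, integers and floats, and ignores comments, CONTINUE and HIERARCH cards and other special cases. The comment text in the example cards is made up.

```python
def parse_card(card):
    """Parse one FITS header card into (keyword, value).

    The keyword occupies columns 1-8; a value is present only when
    columns 9-10 contain '= '. Anything after '/' is a comment.
    """
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        return keyword, None              # COMMENT, HISTORY, END, blank cards
    body = card[10:].strip()
    if body.startswith("'"):
        # String value: find the closing quote, treating '' as an escaped quote.
        end = 1
        while True:
            end = body.index("'", end)
            if body[end:end + 2] == "''":
                end += 2
                continue
            break
        return keyword, body[1:end].replace("''", "'").rstrip()
    value = body.split("/", 1)[0].strip()
    if not value:
        return keyword, None
    if value in ("T", "F"):
        return keyword, value == "T"
    try:
        return keyword, int(value)
    except ValueError:
        return keyword, float(value)


if __name__ == "__main__":
    cards = [
        "OBJECT  = 'NGC1068 '           / target name".ljust(80),
        "EXPTIME =                  120 / exposure time".ljust(80),
        "END".ljust(80),
    ]
    for card in cards:
        print(parse_card(card))
```

In practice one would use an existing FITS library rather than parsing cards by hand; the point here is only how the header stores typed values as fixed-width text.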

File storage format • Within a FITS file, data can be divided into separate extensions • The primary header contains keywords relevant to the whole file • E.g. object name, telescope pointing, airmass, filter, central wavelength • Each image, binary table etc. has its own numbered/named extension • Contains both the data and any extra header keywords that are only relevant to that dataset • Example FITS file structure during processing:

EXT#  EXTTYPE   EXTNAME  EXTVE  DIMENS    BITPI  INH  OBJECT
0     trnS20040409S0166_s                 16          Galaxy
1     BINTABLE  MDF             32x21     8
2     IMAGE     SCI      1      32x1022   -32    F    Galaxy
3     IMAGE     VAR      1      32x1022   -32    F    Variance
4     IMAGE     DQ       1      32x1022   32     F    DQ
5     IMAGE     SCI      2      32x1022   -32    F    Galaxy
6     IMAGE     VAR      2      32x1022   -32    F    Variance
7     IMAGE     DQ       2      32x1022   32     F    DQ
[ … etc … ]
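A file with this kind of SCI/VAR/DQ layout can be sketched with the `astropy.io.fits` module. Note that astropy is a substitution made here for illustration: the lecture itself works with the Gemini IRAF package. The function names are hypothetical, the arrays are filled with zeros, and the MDF binary table is omitted for brevity.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits


def write_example_mef(path, n_versions=2, shape=(1022, 32)):
    """Write a multi-extension FITS file with SCI, VAR and DQ extensions.

    shape is in NumPy (row, column) order, i.e. 32x1022 in FITS axis order.
    """
    primary = fits.PrimaryHDU()
    primary.header["OBJECT"] = "Galaxy"      # keyword relevant to the whole file
    hdus = [primary]
    for ver in range(1, n_versions + 1):
        hdus.append(fits.ImageHDU(np.zeros(shape, dtype=np.float32), name="SCI", ver=ver))
        hdus.append(fits.ImageHDU(np.zeros(shape, dtype=np.float32), name="VAR", ver=ver))
        hdus.append(fits.ImageHDU(np.zeros(shape, dtype=np.int32), name="DQ", ver=ver))
    fits.HDUList(hdus).writeto(path, overwrite=True)


def list_extensions(path):
    """Return (EXTNAME, EXTVER) for every extension after the primary HDU."""
    with fits.open(path) as hdul:
        return [(hdu.name, hdu.ver) for hdu in hdul[1:]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        example_path = os.path.join(tmpdir, "example.fits")
        write_example_mef(example_path)
        print(list_extensions(example_path))
        with fits.open(example_path) as hdul:
            # Extensions can be addressed by (name, version) rather than by number.
            print(hdul["VAR", 2].data.shape)
```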

Reduced data formats • Row-stacked spectra (etc.) • One option is just to work with extracted spectra in 2D • Limited to spectral analysis (eg. velocity measurement, not imaging) • Still have to create a 2D spatial map from the results afterwards • Datacube • A 3D image array, with two spatial axes and one wavelength axis • Easy to read and manipulate in IRAF, IDL, Python etc. • Usually requires resampling the processed IFU data onto a 3D grid • Except for IFUs that have a square lens grid to begin with • If we want to oversample after interpolating, to produce ‘smoother’ images (good for visualization etc), the file sizes can become quite large
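For an IFU whose elements already lie on a regular rectangular grid, going from row-stacked spectra to a datacube needs no resampling, only a reshape. The sketch below assumes the spectra are stored row by row across the field and uses a (wavelength, y, x) axis order; both are illustrative assumptions, as are the function names.

```python
import numpy as np


def rss_to_cube(rss, ny, nx):
    """Reshape row-stacked spectra (n_spectra, n_wave) into a cube (n_wave, ny, nx).

    Assumes spectrum index = y * nx + x, i.e. a regular grid stored row by row.
    """
    n_spec, n_wave = rss.shape
    if n_spec != ny * nx:
        raise ValueError("number of spectra does not match the ny x nx grid")
    return rss.reshape(ny, nx, n_wave).transpose(2, 0, 1)


def white_light_image(cube):
    """Collapse the cube along the wavelength axis to get a 2D image."""
    return cube.sum(axis=0)


def spectrum_at(cube, y, x):
    """Extract the 1D spectrum of a single spatial element."""
    return cube[:, y, x]


if __name__ == "__main__":
    rss = np.arange(6 * 4, dtype=float).reshape(6, 4)   # 6 spectra, 4 wavelength samples
    cube = rss_to_cube(rss, ny=2, nx=3)
    print(cube.shape)
    print(white_light_image(cube))
    print(spectrum_at(cube, 1, 2))
```

For fibre or lenslet arrays that do not sample a rectangular grid, this reshape is replaced by the interpolation step described above.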

Reduced data formats • Euro3D format • Both image data (1D spectra) and information describing the spectra are stored in a binary table • Native format for the ‘E3D’ visualization tool (can also read datacubes) • Closer to the raw data than a cube—attempts to avoid resampling until it is necessary, during visualization or analysis • Minimal file size, like row-stacked spectra, since there is no interpolation until it is needed • Requires having special software/libraries to work with the format

Example reduction sequence—optical • GMOS IFU (optical fibre) data, using the Gemini IRAF package

Example reduction sequence—infrared • GNIRS IFU (image slicer) data, using the Gemini IRAF package

Summary • The data reduction process is typically based on FITS files, with one or more image extensions • Propagating error and data quality arrays through the process is helpful for understanding how accurate the results are • The final data format for analysis depends on the application, software, user preference etc. • Euro3D format, datacubes or in some cases just row-stacked spectra • The example reduction sequences for optical fibre data and NIR image slicer data give an idea of how the steps are ordered for science data
