Optimizing Retrieval Algorithms for Information Content Maximization

Information Content
Tristan L’Ecuyer

Degrees of Freedom
• Using the expression for the state vector that minimizes the cost function, it is relatively straightforward to show that
  $$d_s = \mathrm{Tr}(\mathbf{A}), \qquad d_n = \mathrm{Tr}(\mathbf{I}_m - \mathbf{A})$$
  where d_s and d_n are the degrees of freedom for signal and for noise, I_m is the m x m identity matrix, and A is the averaging kernel.
• NOTE: Even if the number of retrieval parameters is equal to or less than the number of measurements, a retrieval can still be under-constrained if noise and redundancy are such that the number of degrees of freedom for signal is less than the number of parameters to be retrieved.
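The NOTE above can be checked numerically. The following is a minimal sketch, assuming the standard optimal-estimation form of the averaging kernel, A = (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 K, and d_s = Tr(A). The function names and the 2 x 2 test matrices are illustrative assumptions, not values from the slides.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv2(m):
    """Inverse of a 2 x 2 matrix (enough for this two-parameter sketch)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def dof_signal(K, S_a, S_y):
    """d_s = Tr(A), with A = (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 K (assumed form)."""
    fisher = matmul(matmul(transpose(K), inv2(S_y)), K)   # K^T S_y^-1 K
    S_hat = inv2(add(fisher, inv2(S_a)))                  # posterior covariance
    A = matmul(S_hat, fisher)                             # averaging kernel
    return A[0][0] + A[1][1]

identity = [[1.0, 0.0], [0.0, 1.0]]

# Two measurements, two parameters, each channel sensing a different parameter.
ds_independent = dof_signal([[3.0, 0.0], [0.0, 3.0]], identity, identity)

# Two measurements, two parameters, but both channels have identical sensitivities.
ds_redundant = dof_signal([[3.0, 3.0], [3.0, 3.0]], identity, identity)

print(ds_independent, ds_redundant)
```

In the redundant case d_s falls below 1 even though there are as many measurements as parameters, which is the under-constrained situation described in the NOTE.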

Entropy-based Information Content
• The Gibbs entropy is the logarithm of the number of discrete internal states of a thermodynamic system:
  $$S(P) = -k \sum_i p_i \ln p_i$$
  where p_i is the probability of the system being in state i and k is the Boltzmann constant.
• The information theory analogue has k=1, with the p_i representing the probabilities of all possible combinations of retrieval parameters.
• More generally, for a continuous distribution (e.g. Gaussian):
  $$S(P) = -\int P(x) \log_2 P(x)\, dx$$
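As a small illustration of the discrete case, the sketch below evaluates the entropy with k=1. It assumes base-2 logarithms, so the result is in bits, and the eight-state example is made up for illustration.

```python
import math

def entropy_bits(probabilities):
    """Discrete entropy with k = 1 and base-2 logs (assumption): -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# Hypothetical prior: eight equally likely combinations of retrieval parameters.
prior_entropy = entropy_bits([1.0 / 8.0] * 8)

# Hypothetical posterior: the measurement rules out all but two of them.
posterior_entropy = entropy_bits([0.5, 0.5])

print(prior_entropy, posterior_entropy, prior_entropy - posterior_entropy)
```

Narrowing eight equally likely states to two reduces the entropy by two bits, i.e. the number of admissible states shrinks by a factor of 2^2.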

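Entropy of a Gaussian Distribution
• For the Gaussian distributions typically used in optimal estimation we have:
  $$P(x) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[-\frac{(x-\bar{x})^2}{2\sigma^2}\right], \qquad S(P) = \log_2\left[\sigma (2\pi e)^{1/2}\right]$$
• For an m-variable Gaussian distribution with covariance matrix S_x:
  $$S(P) = m \log_2 (2\pi e)^{1/2} + \tfrac{1}{2} \log_2 |\mathbf{S}_x|$$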

Information Content of a Retrieval
• The information content of an observing system is defined as the difference in entropy between an a priori set of possible solutions, S(P1), and the subset of these solutions that also satisfy the measurements, S(P2):
  $$H = S(P_1) - S(P_2)$$
• If Gaussian distributions are assumed for the prior and posterior state spaces, as in the optimal estimation (O.E.) approach, this can be written:
  $$H = \tfrac{1}{2} \log_2 \left| \mathbf{S}_a \hat{\mathbf{S}}^{-1} \right|$$
  since, after minimizing the cost function, the covariance of the posterior state space is:
  $$\hat{\mathbf{S}} = \left( \mathbf{K}^T \mathbf{S}_y^{-1} \mathbf{K} + \mathbf{S}_a^{-1} \right)^{-1}$$
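A minimal numerical sketch of this definition follows. It assumes Gaussian statistics, base-2 logarithms, H = (1/2) log2(|S_a| / |S_hat|), and the usual optimal-estimation posterior covariance S_hat = (K^T S_y^-1 K + S_a^-1)^-1. The names `S_a`, `S_y`, `K` and the 2 x 2 values are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def information_content(K, S_a, S_y):
    """H in bits: half the log2 of the ratio of prior to posterior covariance determinants."""
    fisher = matmul(matmul(transpose(K), inv2(S_y)), K)
    S_a_inv = inv2(S_a)
    S_hat_inv = [[f + p for f, p in zip(rf, rp)] for rf, rp in zip(fisher, S_a_inv)]
    S_hat = inv2(S_hat_inv)
    return 0.5 * math.log2(det2(S_a) / det2(S_hat))

# Hypothetical case: prior variance 4 on each parameter, unit-noise direct measurements.
S_a = [[4.0, 0.0], [0.0, 4.0]]
S_y = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]

H = information_content(K, S_a, S_y)
print(H)
```

Here each posterior variance drops from 4 to 0.8, so the determinant ratio is 25 and H = log2(5), a little over two bits.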

Interpretation
• Qualitatively, information content describes the factor by which knowledge of a quantity is improved by making a measurement.
• Using Gaussian statistics we see that the information content provides a measure of how much the ‘volume of uncertainty’ represented by the a priori state space is reduced after measurements are made.
• Essentially this is a generalization of the scalar concept of ‘signal-to-noise’ ratio.
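The link to signal-to-noise ratio can be made explicit with a scalar worked example. This is a sketch under assumed notation: a single parameter with prior standard deviation \sigma_a, one direct measurement with noise standard deviation \sigma_y, posterior standard deviation \hat{\sigma}, and base-2 logarithms.

```latex
% Posterior variance for one direct measurement of one parameter
\hat{\sigma}^2 = \left( \frac{1}{\sigma_a^2} + \frac{1}{\sigma_y^2} \right)^{-1}

% Information content: log of the factor by which the uncertainty shrinks
H = \tfrac{1}{2} \log_2 \frac{\sigma_a^2}{\hat{\sigma}^2}
  = \log_2 \frac{\sigma_a}{\hat{\sigma}}
  = \tfrac{1}{2} \log_2 \left( 1 + \frac{\sigma_a^2}{\sigma_y^2} \right)

% Multivariate analogue: ratio of prior to posterior uncertainty volumes
2^{H} = \left( \frac{|\mathbf{S}_a|}{|\hat{\mathbf{S}}|} \right)^{1/2}
```

Under these assumptions one bit of information corresponds to halving the uncertainty, and a measurement whose noise is comparable to the prior spread (\sigma_y = \sigma_a) provides only half a bit.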

Liquid Cloud Retrievals
[Figure: state spaces plotted on LWP (gm-3) and Re (μm) axes. Panels: Prior State Space; 0.64 μm (H=1.20); 0.64 & 2.13 μm (H=2.51); 17 Channels (H=3.53)]
• Blue: a priori state space
• Green: state space that also matches the MODIS visible channel (0.64 μm)
• Red: state space that matches both the 0.64 and 2.13 μm channels
• Yellow: state space that matches all 17 MODIS channels

Measurement Redundancy
• Using multiple channels with similar sensitivities to the parameters of interest merely adds redundant information to the retrieval.
• While this can have the benefit of reducing random noise, it cannot remove biases introduced by forward model assumptions, which often affect both channels in similar ways.
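The cost of redundancy can be illustrated with a deliberately simplified sketch. It assumes a diagonal prior, independent channel noise, and channels that each sense exactly one parameter with the same squared signal-to-noise ratio `snr` (prior variance times squared sensitivity, divided by noise variance). Under those assumptions the information for a parameter observed by n such channels is (1/2) log2(1 + n * snr). The function name and numbers are hypothetical.

```python
import math

def total_bits(channels_per_param, snr):
    """Total information in bits for the simplified diagonal case described above.

    channels_per_param[i] is the number of identical channels sensing parameter i;
    snr is the squared signal-to-noise ratio of a single channel (assumption).
    """
    return sum(0.5 * math.log2(1.0 + n * snr) for n in channels_per_param)

snr = 15.0

# Two channels that both sense parameter 1 and carry nothing about parameter 2.
redundant = total_bits([2, 0], snr)

# Two channels with complementary sensitivities, one per parameter.
complementary = total_bits([1, 1], snr)

print(redundant, complementary)
```

The second redundant channel only averages down random noise and adds roughly half a bit, whereas a channel sensitive to the other parameter adds two full bits in this example. The sketch covers random noise only; a forward-model bias shared by both channels is not represented here and would not be reduced by averaging.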

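Channel Selection
• The information content of individual channels in an observing system can be assessed via:
  $$H_j = \tfrac{1}{2} \log_2 \left( 1 + \mathbf{k}_j^T \mathbf{S}_a \mathbf{k}_j \right)$$
  where k_j is the row of K corresponding to channel j, written as a column vector and scaled by the measurement noise (i.e. a row of S_y^{-1/2} K).
• The channels providing the greatest amount of information can then be sequentially selected by adjusting the covariance matrix via:
  $$\mathbf{S}_{i+1}^{-1} = \mathbf{S}_i^{-1} + \mathbf{k}_l \mathbf{k}_l^T$$
  where l is the channel selected at step i and S_0 = S_a.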

Method
• Evaluate S_y
• Compute K
• Establish prior information
• Evaluate the information content of each channel, H_j, with respect to the a priori, S_a
• Select the channel that provides the most information and update the covariance matrix using the appropriate row of K
• Recompute the information content of all remaining channels with respect to this new error covariance, S_1
• Select the channel that provides the most additional information
• Repeat this procedure until the signal-to-noise ratio of all remaining channels is less than 1, i.e. for every remaining channel j, with k_j scaled by the measurement noise:
  $$\mathbf{k}_j^T \mathbf{S}_i \mathbf{k}_j < 1$$
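The steps above can be sketched as a greedy loop. This is an illustrative sketch, not the author's code, and it rests on several assumptions: each row of `K_tilde` is a Jacobian row already divided by that channel's noise standard deviation, the per-channel information is (1/2) log2(1 + k^T S k), the covariance is updated with the rank-one (Sherman-Morrison) form of S^-1 + k k^T, and the stopping test is k^T S k < 1. All names and the four-channel example are hypothetical.

```python
import math

def quad_form(k, S):
    """Return k^T S k for a vector k and a symmetric matrix S (list of rows)."""
    Sk = [sum(s * x for s, x in zip(row, k)) for row in S]
    return sum(x * y for x, y in zip(k, Sk))

def rank_one_update(S, k):
    """Covariance after adding one channel: S - (S k)(S k)^T / (1 + k^T S k)."""
    Sk = [sum(s * x for s, x in zip(row, k)) for row in S]
    denom = 1.0 + sum(x * y for x, y in zip(k, Sk))
    n = len(S)
    return [[S[i][j] - Sk[i] * Sk[j] / denom for j in range(n)] for i in range(n)]

def select_channels(K_tilde, S_a):
    """Greedy channel selection; returns [(channel index, bits added)] and final covariance."""
    S = [row[:] for row in S_a]
    remaining = set(range(len(K_tilde)))
    chosen = []
    while remaining:
        # Squared signal-to-noise of each remaining channel w.r.t. the current covariance.
        snr = {j: quad_form(K_tilde[j], S) for j in remaining}
        best = max(snr, key=snr.get)
        if snr[best] < 1.0:   # assumed stopping rule: nothing left above the noise
            break
        chosen.append((best, 0.5 * math.log2(1.0 + snr[best])))
        S = rank_one_update(S, K_tilde[best])
        remaining.remove(best)
    return chosen, S

# Hypothetical 4-channel, 2-parameter problem with a unit prior covariance.
S_a = [[1.0, 0.0], [0.0, 1.0]]
K_tilde = [
    [4.0, 0.0],   # channel 0: strong sensitivity to parameter 1
    [3.0, 0.0],   # channel 1: similar sensitivity, largely redundant with channel 0
    [0.0, 2.0],   # channel 2: weaker, but sensitive to parameter 2
    [0.5, 0.0],   # channel 3: weak sensitivity to parameter 1
]

chosen, S_final = select_channels(K_tilde, S_a)
for index, bits in chosen:
    print(index, round(bits, 3))
```

In this example channel 1 ranks second against the prior, but once channel 0 has been added its remaining signal-to-noise drops below 1, so channel 2 is picked next and the loop stops after two channels.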

Optimizing Retrieval Algorithms
• GOAL: Select the optimal channel configuration that maximizes retrieval information content for the least possible computational cost by limiting the amount of redundancy in the observations.
• APPROACH: Use the Jacobian of the forward model, combined with appropriate error statistics, to determine the set of measurements that provides the most information concerning the geophysical parameters of interest for the least computational cost.

Information Spectra
• Relative to the a priori, the 11 μm channel provides the most information due to its sensitivity to cloud height and its lower uncertainty relative to the visible channels.
• Once the information this channel carries is added to the retrieval, the information content of the remaining IR channels is greatly reduced and two visible channels are chosen next.
[Figure: IWP = 100 g m⁻², Re = 16 μm, Ctop = 9 km]

Unrealistic Errors
• When a uniform 10% measurement uncertainty is assumed, the visible/near-IR channels are weighted unrealistically strongly relative to the IR.
[Figure: two panels, both for IWP = 100 g m⁻², Re = 16 μm, Ctop = 9 km; the second is labeled 10%]

Thin Cloud (IWP = 10 g m⁻²)
• For very thin clouds, the improved accuracy of IR channels relative to those in the visible increases their utility in the retrieval.
[Figure: two panels. First: IWP = 100 g m⁻², Re = 16 μm, Ctop = 9 km. Second: IWP = 10 g m⁻², Re = 16 μm, Ctop = 9 km]

Larger Crystals (Re = 40 μm)
• At large effective radii, both the visible and IR channels lose sensitivity to effective radius. Two IR channels are chosen primarily for retrieving cloud height and optical depth.
[Figure: two panels. First: IWP = 100 g m⁻², Re = 16 μm, Ctop = 9 km. Second: IWP = 100 g m⁻², Re = 40 μm, Ctop = 9 km]

High Cloud (Ctop = 14 km)
• The enhanced contrast between cloud top temperature and the surface increases the signal-to-noise ratio of the IR channels.
[Figure: two panels. First: IWP = 100 g m⁻², Re = 16 μm, Ctop = 9 km. Second: IWP = 100 g m⁻², Re = 16 μm, Ctop = 14 km]
