Quantifying the skill of cloud forecasts from the ground and from space

Quantifying the skill of cloud forecasts from the ground and from space. Robin Hogan, Julien Delanoe, Ewan O’Connor, Anthony Illingworth, Jonathan Wilkinson. University of Reading, UK

Other areas of interest Representing effects of cloud structure in radiation schemes • Horizontal inhomogeneity, overlap, 3D effects • Mixed-phase clouds • Why are they so poorly represented in models? • Convection • Estimating microphysical properties and fluxes of mass and momentum from observations

Overview • The “Cloudnet” processing of ground-based radar and lidar observations • Continuous evaluation of the climatology of clouds in models • Testing the skill of cloud forecasts from seven models • Desirable properties of skill scores; good and bad scores • Skill versus cloud fraction, height, scale, forecast lead time • Estimating the forecast “half life” • Cloud fraction evaluation using a spaceborne lidar simulator • Evaluation of ECMWF model with ICESat/GLAS lidar • Synergistic retrievals of ice cloud properties from the A-train • Variational methodology • Testing of the Met Office and ECMWF models

Project • Original aim: to retrieve and evaluate the crucial cloud variables in forecast and climate models • Seven models: 5 NWP and 2 regional climate models in NWP mode • Variables: cloud fraction, LWC, IWC, plus a number of others • Four sites across Europe: UK, Netherlands, France, Germany • Period: Several years, to avoid unrepresentative case studies • Ongoing/future work (dependent on sources of funding) • Apply to ARM data worldwide • Run in near-real-time for rapid feedback to NWP centers • Evaluate multiple runs of single-column versions of models

Level 1b • Minimum instrument requirements at each site • Cloud radar, lidar, microwave radiometer, rain gauge, model or sondes • Radar • Lidar

Level 1c • Instrument Synergy product • Example of target classification and data quality fields: Ice Liquid Rain Aerosol

Level 2a/2b • Cloud products on (L2a) observational and (L2b) model grid • Water content and cloud fraction L2a IWC on radar/lidar grid L2b Cloud fraction on model grid

Cloud fraction Chilbolton Observations Met Office Mesoscale Model ECMWF Global Model Meteo-France ARPEGE Model KNMI RACMO Model Swedish RCA model

NCEP over SGP in 2007 • Hot off the press! • Produced directly from ARM’s “ARSCL” product so could easily be automated • NCEP model appears to under-predict low and mid-level cloud

How skillful is a forecast? ECMWF 500-hPa geopotential anomaly correlation • Most model comparisons evaluate the cloud climatology • What about individual forecasts? • Standard measure shows ECMWF forecast “half-life” of ~6 days in 1980 and ~9 in 2000 • But virtually insensitive to clouds!

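Joint PDFs of cloud fraction • Raw (1 hr) resolution: 1 year from Murgtal, DWD COSMO model • 6-hr averaging • …or use a simple contingency table with cells a (cloud both forecast and observed), b (cloud forecast but not observed), c (cloud observed but not forecast) and d (clear in both)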

5 desirable properties of skill scores • “Equitable”: all random forecasts score zero • This is essential! • Note that forecasting the right climatology versus height but with no other skill should also score zero • “Proper”: not possible to “hedge your bets” • Some scores reward under- or over-prediction (e.g. hit rate) • Jolliffe and Stephenson: impossible to be equitable and strictly proper! • Independent of how often cloud occurs • Almost all scores asymptote to 0 or 1 for vanishingly rare events • Dependence on full joint PDF, not just 2x2 contingency table • Difference between cloud fraction of 0.9 and 1 is as important for radiation as a difference between 0 and 0.1 • “Linear”: so that we can fit an inverse exponential to the score • Some scores (e.g. Yule’s Q) “saturate” at the high-skill end


Possible skill scores • “Cloud” deemed to occur when cloud fraction f is larger than some threshold $f_\mathrm{thresh}$ • To ensure equitability and linearity, we can use the concept of the “generalized skill score” $= (x - x_\mathrm{random})/(x_\mathrm{perfect} - x_\mathrm{random})$ • Where “x” is any number derived from the joint PDF • Resulting scores vary linearly from random=0 to perfect=1 • Simplest example: Heidke skill score (HSS) uses x = a + d • We will use this as a reference to test other scores • Brier skill score uses x = mean squared cloud-fraction difference; Linear Brier skill score (LBSS) uses x = mean absolute difference • Sensitive to errors in model for all values of cloud fraction
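The thresholding and the generalized skill score on this slide can be sketched in a few lines of Python. This is an illustrative sketch rather than the Cloudnet code: the function names and the sample cloud fractions are made up, and the cell labels assume a = hits, b = false alarms, c = misses, d = correct negatives, which is the convention implied by H = a/(a+c).

```python
import numpy as np

def contingency_table(f_model, f_obs, f_thresh):
    """Return (a, b, c, d): hits, false alarms, misses, correct negatives."""
    m = np.asarray(f_model) > f_thresh   # model says "cloud"
    o = np.asarray(f_obs) > f_thresh     # observations say "cloud"
    a = int(np.sum(m & o))
    b = int(np.sum(m & ~o))
    c = int(np.sum(~m & o))
    d = int(np.sum(~m & ~o))
    return a, b, c, d

def generalized_skill_score(x, x_random, x_perfect):
    """(x - x_random) / (x_perfect - x_random): 0 for random, 1 for perfect."""
    return (x - x_random) / (x_perfect - x_random)

def heidke_skill_score(a, b, c, d):
    """HSS: generalized skill score with x = a + d."""
    n = a + b + c + d
    # Expected a and d for a random forecast with the same marginal totals
    a_random = (a + b) * (a + c) / n
    d_random = (c + d) * (b + d) / n
    return generalized_skill_score(a + d, a_random + d_random, n)

# Hypothetical hourly cloud fractions (not real data)
f_obs = [0.0, 0.2, 0.9, 1.0, 0.0, 0.6, 0.0, 0.3]
f_model = [0.1, 0.0, 0.8, 1.0, 0.0, 0.0, 0.4, 0.5]
table = contingency_table(f_model, f_obs, f_thresh=0.05)
hss = heidke_skill_score(*table)
```

Any other quantity derived from the joint PDF (for example the mean absolute cloud-fraction difference used by LBSS) could be passed to `generalized_skill_score` in the same way, provided its random and perfect values are known.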

Some simpler scores • Hit rate or Prob. of Detection: H = a/(a+c) • “Fraction of cloudy events correctly forecast” • E.g. Mace et al. (1998) for cloud occurrence • Problems • Not equitable • Easy to “hedge”: forecast cloud all the time guarantees a perfect score, so favours models that overpredict cloud • This is linked to its asymmetry • Log of Odds Ratio: LOR = ln(ad/bc) • E.g. Stephenson (2008) for tornado forecasts • Properties • Equitable • Not easy to hedge • Unbounded: a perfect score is infinity!
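A small numerical check makes the hedging and unboundedness problems concrete. The sketch below assumes the same (a, b, c, d) cell convention as H = a/(a+c); the tables are invented for illustration, and the handling of zero cells in `log_odds_ratio` is a choice made here, not something the slide specifies.

```python
import math

def hit_rate(a, b, c, d):
    """H = a / (a + c): fraction of observed cloudy events that were forecast."""
    return a / (a + c)

def log_odds_ratio(a, b, c, d):
    """LOR = ln(ad / bc), with explicit handling of empty cells."""
    num, den = a * d, b * c
    if num == 0 and den == 0:
        return math.nan      # undefined, e.g. "always cloud" forecasts
    if den == 0:
        return math.inf      # perfect forecast: unbounded score
    if num == 0:
        return -math.inf
    return math.log(num / den)

# Hedging: forecasting cloud all the time gives c = d = 0 and a perfect hit rate
always_cloud = (30, 70, 0, 0)
h_hedged = hit_rate(*always_cloud)

# A random forecast has ad = bc, so LOR = 0 (equitable)
lor_random = log_odds_ratio(20, 20, 30, 30)

# A perfect forecast has b = c = 0, so LOR is infinite
lor_perfect = log_odds_ratio(50, 0, 0, 50)
```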

Skill versus cloud-fraction threshold • Consider 7 models evaluated over 3 European sites in 2003-2004 • LOR implies skill increases for larger cloud-fraction threshold • HSS implies skill decreases significantly for larger cloud-fraction threshold • [Panels: HSS; LOR]

Extreme dependency score • Stephenson et al. (2008) explained this behavior: • Almost all scores have a meaningless limit as “base rate” p → 0 • HSS tends to zero and LOR tends to infinity • They proposed the Extreme dependency score: $\mathrm{EDS} = \frac{2\ln[(a+c)/n]}{\ln(a/n)} - 1$ • where n = a + b + c + d • It can be shown that this score tends to a meaningful limit: • Rewrite in terms of hit rate H = a/(a+c) and base rate p = (a+c)/n: $\mathrm{EDS} = \frac{2\ln p}{\ln(pH)} - 1$ • Then assume a power-law dependence of H on p as p → 0: $H \propto p^{\delta}$ • In the limit p → 0 we find $\mathrm{EDS} \to \frac{2}{1+\delta} - 1 = \frac{1-\delta}{1+\delta}$ • This is meaningful because random forecasts have hit rate converging to zero at the same rate as base rate: δ = 1 so EDS = 0 • Perfect forecasts have constant hit rate with base rate: δ = 0 so EDS = 1

Symmetric extreme dependency score • Problems with EDS: • Easy to hedge by predicting cloud all the time so c = 0 • Not equitable • These are solved by defining a symmetric version: $\mathrm{SEDS} = \frac{\ln[(a+b)/n] + \ln[(a+c)/n]}{\ln(a/n)} - 1$ • All the benefits of EDS, none of the drawbacks! Hogan, O’Connor and Illingworth (2009, submitted to QJRMS)
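The difference between EDS and SEDS under hedging can be checked numerically. The sketch below uses the published definitions of the two scores (Stephenson et al. 2008; Hogan et al. 2009) with the cell convention a = hits, b = false alarms, c = misses, d = correct negatives; the example tables are invented, and both functions require a > 0.

```python
import math

def eds(a, b, c, d):
    """Extreme dependency score: 2 ln((a+c)/n) / ln(a/n) - 1."""
    n = a + b + c + d
    return 2.0 * math.log((a + c) / n) / math.log(a / n) - 1.0

def seds(a, b, c, d):
    """Symmetric EDS: [ln((a+b)/n) + ln((a+c)/n)] / ln(a/n) - 1."""
    n = a + b + c + d
    return (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1.0

# Illustrative tables with base rate p = 0.1
perfect = (10, 0, 0, 90)        # b = c = 0
random_fc = (1, 9, 9, 81)       # a = (a+b)(a+c)/n, i.e. no skill
always_cloud = (10, 90, 0, 0)   # hedged forecast: c = d = 0

scores = {
    "perfect": (eds(*perfect), seds(*perfect)),
    "random": (eds(*random_fc), seds(*random_fc)),
    "always_cloud": (eds(*always_cloud), seds(*always_cloud)),
}
```

In this example the always-cloud forecast earns EDS = 1 but SEDS = 0, which is the hedging problem the symmetric version is meant to remove.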

Skill versus cloud-fraction threshold • SEDS has much flatter behaviour for all models (except for Met Office which underestimates high cloud occurrence significantly) • [Panels: HSS; LOR; SEDS]

Skill versus height • Most scores not reliable near the tropopause because cloud fraction tends to zero • New score reveals: • Skill tends to slowly decrease at tropopause • Mid-level clouds (4-5 km) most skilfully predicted, particularly by Met Office • Boundary-layer clouds least skilfully predicted • [Panels: SEDS; LBSS; HSS; LOR]

A surprise? • Is mid-level cloud well forecast??? • Frequency of occurrence of these clouds is commonly too low (e.g. from Cloudnet: Illingworth et al. 2007) • Specification of cloud phase cited as a problem • Higher skill could be because large-scale ascent has largest amplitude here, so cloud response to large-scale dynamics most clear at mid levels • Higher skill for Met Office models (global and mesoscale) because they have arguably the most sophisticated microphysics, with separate liquid and ice water content (Wilson and Ballard 1999)? • Low skill for boundary-layer cloud is not a surprise! • Well known problem for forecasting (Martin et al. 2000) • Occurrence and height a subtle function of subsidence rate, stability, free-troposphere humidity, surface fluxes, entrainment rate...

Skill versus lead time • Only possible for UK Met Office 12-km model and German DWD 7-km model • Steady decrease of skill with lead time • Both models appear to improve between 2004 and 2007 • Generally, UK model best over UK, German best over Germany • An exception is Murgtal in 2007 (UK model wins) • [Panels: 2004; 2007]

Forecast “half life” • Fit an inverse-exponential to the score as a function of lead time • S1 is the score after 1 day and t1/2 is the half-life • Noticeably longer half-life fitted after 36 hours • Same thing found for Met Office rainfall forecast (Roberts 2008) • First timescale due to data assimilation and convective events • Second due to more predictable large-scale weather systems • [Figure: fitted curves for Met Office and DWD models, 2004 and 2007; half-lives labelled on the panels range from 2.4 to 4.3 days]
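A half-life fit of this kind can be sketched as a straight-line fit in log space. The functional form used here, S(t) = S1 · 2^(-(t - 1)/t_half) with t in days, is an assumption chosen to match the description of S1 and t1/2; the transcript does not preserve the equation from the slide, and the data below are synthetic.

```python
import numpy as np

def fit_half_life(lead_days, scores):
    """Fit S(t) = S1 * 2**(-(t - 1) / t_half) by linear regression on log2(S).

    Assumed form (not taken from the slide). Returns (S1, t_half in days).
    """
    t = np.asarray(lead_days, dtype=float)
    s = np.asarray(scores, dtype=float)
    # log2 S = log2 S1 - (t - 1) / t_half  ->  straight line in (t - 1)
    slope, intercept = np.polyfit(t - 1.0, np.log2(s), 1)
    return 2.0 ** intercept, -1.0 / slope

# Synthetic scores generated with S1 = 0.5 and a 3-day half-life
t = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
s = 0.5 * 2.0 ** (-(t - 1.0) / 3.0)
s1_fit, t_half_fit = fit_half_life(t, s)
```

To reproduce the two-timescale behaviour described above, one option would be to call `fit_half_life` separately on lead times before and after 36 hours.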

Why is half-life less for clouds than pressure? • Different spatial scales? Convection? • Average temporally before calculating skill scores: • Absolute score and half-life increase with number of hours averaged

Cloud versus geopotential height • Cloud is noisier than geopotential height Z because it is separated by around two orders of differentiation: • Cloud ~ vertical wind ~ relative vorticity ~ ∇²(streamfunction) ~ ∇²(pressure) • Suggests cloud observations should be used routinely to evaluate models • [Panels: geopotential height anomaly; vertical velocity]

Alternative approach • How valid is it to estimate 3D cloud fraction from 2D slice? • Henderson and Pincus (2009) imply that it is reasonable, although presumably not in convective conditions • Alternative: treat cloud fraction as a probability forecast • Each time the model forecasts a particular cloud fraction, calculate the fraction of time that cloud was observed instantaneously over the site • Leads to a Reliability Diagram: Perfect No resolution No skill Jakob et al. (2004)
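The reliability-diagram calculation described above can be sketched as a simple binning exercise. The function name, the number of bins and the sample values are illustrative assumptions; `cloud_observed` is taken to be 1 when cloud was seen instantaneously over the site and 0 otherwise.

```python
import numpy as np

def reliability_curve(f_forecast, cloud_observed, n_bins=5):
    """Bin forecasts by cloud fraction; return the mean forecast, the observed
    frequency of cloud and the sample count in each bin (NaN for empty bins)."""
    f = np.asarray(f_forecast, dtype=float)
    o = np.asarray(cloud_observed, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize returns 1..n_bins+1; shift to 0-based and put f = 1 in the last bin
    idx = np.clip(np.digitize(f, edges) - 1, 0, n_bins - 1)
    mean_forecast = np.full(n_bins, np.nan)
    obs_freq = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        sel = idx == k
        counts[k] = sel.sum()
        if counts[k] > 0:
            mean_forecast[k] = f[sel].mean()
            obs_freq[k] = o[sel].mean()
    return mean_forecast, obs_freq, counts

# Made-up example: model forecasts 0.1 or 0.9, cloud seen 1 time in 4 or 3 in 4
f_model = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
seen = [0, 0, 0, 1, 1, 1, 1, 0]
mean_f, freq, n_samples = reliability_curve(f_model, seen, n_bins=2)
```

Plotting `freq` against `mean_f` gives the reliability curve; a perfectly reliable forecast lies on the 1:1 line.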

Satellite observations: ICESat • Cloud observations from ICESat 0.5-micron lidar (first data Feb 2004) • Global coverage but lidar attenuated by thick clouds: direct model comparison difficult • Optically thick liquid cloud obscures view of any clouds beneath • Solution: forward-model the measurements (including attenuation) using the ECMWF variables • [Figure: lidar apparent backscatter coefficient (m^-1 sr^-1) versus latitude]

Simulate lidar backscatter: • Create subcolumns with max-rand overlap • Forward-model lidar backscatter from ECMWF water content & particle size • Remove signals below lidar sensitivity • [Panels: ECMWF raw cloud fraction; ECMWF cloud fraction after processing; ICESat cloud fraction]
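The attenuation and sensitivity steps can be illustrated with a heavily simplified single-column forward model. This is a sketch, not the simulator used for the ICESat comparison: it skips subcolumn generation, neglects multiple scattering, molecular scattering and attenuation within each layer, and the lidar ratio, layer thickness, extinction values and sensitivity threshold are all assumed for illustration.

```python
import numpy as np

def apparent_backscatter(extinction, dz, lidar_ratio=20.0):
    """Attenuated backscatter (m^-1 sr^-1) seen from above.

    extinction: layer extinction coefficients (m^-1), ordered top-down
    dz: layer thickness (m)
    lidar_ratio: extinction-to-backscatter ratio (sr), assumed constant
    """
    alpha = np.asarray(extinction, dtype=float)
    # Optical depth between the lidar and the top of each layer
    tau_above = np.cumsum(alpha * dz) - alpha * dz
    return (alpha / lidar_ratio) * np.exp(-2.0 * tau_above)

def detectable(beta_apparent, sensitivity):
    """True where the simulated signal exceeds the assumed lidar sensitivity."""
    return beta_apparent > sensitivity

# Thin ice cloud, clear gap, optically thick liquid layer, thin cloud beneath
alpha = [0.0, 1e-3, 0.0, 5e-2, 1e-3]
beta = apparent_backscatter(alpha, dz=500.0)
mask = detectable(beta, sensitivity=1e-7)
```

In this example the lowest layer is lost from the mask because the thick layer above it extinguishes the signal, which is the effect the processed model cloud fraction is meant to reproduce.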

Global cloud fraction comparison • Results for October 2003 • Tropical convection peaks too high • Too much polar cloud • Elsewhere agreement is good • Results can be ambiguous • An apparent low cloud underestimate could be a real error, or could be due to high cloud above being too thick • [Panels: ECMWF raw cloud fraction; ECMWF processed cloud fraction; ICESat cloud fraction] Wilkinson, Hogan, Illingworth and Benedetti (MWR 2008)

Testing the model climatology Error due to uncertain extinction-to-backscatter ratio Reduction in model due to lidar attenuation

Testing the model skill from space Tropical skill appears to peak at mid-levels but cloud very infrequent here Clearly need to apply SEDS to cloud estimated from lidar & radar! Highest skill in north mid-latitude and polar upper troposphere Unreliable region Is some of reduction of skill at low levels because of lidar attenuation? Lowest skill: tropical boundary-layer clouds

Ice cloud retrievals from the A-train • Advantages of combining radar, lidar and radiometers • Radar Z ∝ D^6, lidar β′ ∝ D^2, so the combination provides particle size • Radiances ensure that the retrieved profiles can be used for radiative transfer studies • How do we combine them optimally? • Use a “variational” framework: takes full account of observational errors • Straightforward to add extra constraints and extra instruments • Allows seamless retrieval between regions of different instrument sensitivity • Retrievals will be compared to Met Office and ECMWF forecasts under the A-train

Formulation of variational scheme • For each ray of data we define: • Observation vector: attenuated lidar backscatter profile; radar reflectivity factor profile (on different grid); infrared radiance (TBD); radiance difference (TBD); visible optical depth • State vector: ice visible extinction coefficient profile; ice normalized number conc. profile; extinction/backscatter ratio for ice; liquid water path and number conc. for each liquid layer; aerosol visible extinction coefficient profile • Elements may be missing • Logarithms prevent unphysical negative values

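Solution method • An iterative method is required to minimize the cost function • 1. New ray of data: locate cloud with radar & lidar, define elements of x, make a first guess of x • 2. Forward model: predict measurements y from state vector x using forward model $H(\mathbf{x})$, and predict the Jacobian $\mathbf{H} = \partial y_i/\partial x_j$ • 3. Gauss-Newton iteration step: predict new state vector $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{A}^{-1}\{\mathbf{H}^T\mathbf{R}^{-1}[\mathbf{y} - H(\mathbf{x}_k)] - \mathbf{B}^{-1}(\mathbf{x}_k - \mathbf{b}) - \mathbf{T}\mathbf{x}_k\}$, where the Hessian is $\mathbf{A} = \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H} + \mathbf{B}^{-1} + \mathbf{T}$ • 4. Has the solution converged ($\chi^2$ convergence test)? If no, return to step 2 • 5. If yes, calculate error in retrieval and proceed to next ray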
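The Gauss-Newton update on this slide can be sketched with NumPy on a toy problem. Several things here are assumptions rather than the authors' implementation: R and B are read as the observation and a-priori error covariances, b as the a-priori state and T as a smoothness-constraint matrix; the forward model y = exp(x) is a stand-in chosen because the state variables are logarithms; and convergence is judged by step size rather than the χ² test in the flowchart.

```python
import numpy as np

def gauss_newton(y, forward, x0, b, R_inv, B_inv, T, max_iter=50, tol=1e-10):
    """Iterate x_{k+1} = x_k + A^-1 {H^T R^-1 [y - H(x_k)] - B^-1 (x_k - b) - T x_k}.

    forward(x) must return (predicted y, Jacobian dy_i/dx_j).
    Returns the retrieved state and A^-1, used here as the retrieval error covariance.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        y_pred, H = forward(x)
        A = H.T @ R_inv @ H + B_inv + T                      # Hessian
        rhs = H.T @ R_inv @ (y - y_pred) - B_inv @ (x - b) - T @ x
        dx = np.linalg.solve(A, rhs)
        x = x + dx
        if np.max(np.abs(dx)) < tol:                         # simplified convergence test
            break
    y_pred, H = forward(x)
    A = H.T @ R_inv @ H + B_inv + T
    return x, np.linalg.inv(A)

def toy_forward(x):
    """Hypothetical forward model: y = exp(x), with a diagonal Jacobian."""
    y = np.exp(x)
    return y, np.diag(y)

y_obs = np.array([1.0, 2.0, 4.0])
n = y_obs.size
x_ret, x_cov = gauss_newton(
    y_obs, toy_forward,
    x0=np.zeros(n), b=np.zeros(n),
    R_inv=np.eye(n) / 0.01,       # observation error variance 0.01
    B_inv=np.eye(n) * 1e-8,       # very weak a-priori constraint
    T=np.zeros((n, n)),           # no smoothness constraint in this toy case
)
```

With the weak prior used here the retrieved state is essentially ln(y_obs); tightening `B_inv` or adding a non-zero `T` pulls the solution towards the a-priori or towards a smoother profile.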

CloudSat-CALIPSO-MODIS example • Lidar observations • Radar observations • Radar forward model 1000 km

Lidar observations Lidar forward model Radar observations Radar forward model CloudSat-CALIPSO-MODIS example

Extinction coefficient Ice water content Effective radius Radar-lidar retrieval Forward model MODIS 10.8-µm observations

Radiances matched by increasing extinction near cloud top …add infrared radiances Forward model MODIS 10.8-µm observations

Radar-lidar complementarity MODIS 11 micron channel Retrieved extinction (m-1) CloudSat radar Deep convection penetrated only by radar Height (km) CALIPSO lidar Cirrus detected only by lidar Height (km) Mid-level liquid clouds Time since start of orbit (s)

Comparison with ECMWF log10(IWC [kg m^-3])

Comparison with model IWC • Global forecast model data extracted underneath A-Train • A-Train ice water content averaged to model grid • Met Office model lacks observed variability • ECMWF model has artificial threshold for snow at around 10^-4 kg m^-3 • [Panels: A-Train; Met Office; ECMWF, each plotted against temperature (°C)]

Summary and outlook • Defined five key properties of a good skill score • Plenty of bad scores are used (hit rate, false-alarm rate etc) • New “Symmetric extreme dependency score” is equitable and nearly independent of the occurrence of the quantity being forecast • Model comparisons reveal • Half-life of a cloud forecast is between 2.5 and 4 days, much less than ~9 days for ECMWF 500-hPa geopotential height forecast • Longer-timescale predictability after 1.5 days • Higher skill for mid-level cloud and lower for boundary-layer cloud • Proposal submitted to apply some of these metrics (including probabilistic ones) to NWP & single-column models over the ARM sites • Further work with radar-lidar-radiometer retrieval • Being used to test new ice cloud scheme in ECMWF model, as well as high-resolution simulations of tropical convection in “Cascade” project • Retrieve liquid clouds and precipitation at the same time to provide a truly seamless retrieval from the thinnest to the thickest clouds • Adapt for EarthCARE satellite (ESA/JAXA: launch 2013)

Cloud fraction in 7 models • Uncertain above 7 km as must remove undetectable clouds in model • All models except DWD underestimate mid-level cloud • Some have separate “radiatively inactive” snow (ECMWF, DWD); Met Office has combined ice and snow but still underestimates cloud fraction • Wide range of low cloud amounts in models • Not enough overcast boxes, particularly in Met Office model • Mean & PDF for 2004 for Chilbolton, Paris and Cabauw 0-7 km Illingworth et al. (BAMS 2007)

Contingency tables • Comparison with Met Office model over Chilbolton, October 2003 • [Table: model cloud and model clear-sky against observed cloud and observed clear-sky]

Monthly skill versus time • Measure of the skill of forecasting cloud fraction>0.05 • Comparing models using similar forecast lead time • Compared with the persistence forecast (yesterday’s measurements) • Lower skill in summer convective events

Why N0*/α^0.6? • In-situ aircraft data show that N0*/α^0.6 has temperature dependence that is independent of IWC • Therefore we have a good a-priori estimate to constrain the retrieval • Also assume vertical correlation to spread information in height, particularly to parts of the profile detected by only one instrument

Why N0*??? • We need to be able to forward model Z and other variables from x • Large scatter between extinction α and Z implies 2D lookup-table is required • When normalized by N0*, there is a near-unique relationship between α/N0* and Z/N0* (as well as re, IWC/N0* etc.)

Ice cloud: non-variational retrieval Aircraft-simulated profiles with noise (from Hogan et al. 2006) Donovan et al. (2000) • Donovan et al. (2000) algorithm can only be applied where both lidar and radar have signal Observations State variables Derived variables Retrieval is accurate but not perfectly stable where lidar loses signal

Variational radar/lidar retrieval • Noise in lidar backscatter feeds through to retrieved extinction Observations State variables Derived variables Lidar noise matched by retrieval Noise feeds through to other variables
