
EPS Output



Presentation Transcript


  1. EPS Output A Forecaster’s Approach EPS Training Edmonton

  2. Outline • EPS problems for the meteorologist • A simple conceptual model • Re-phrasing what we did yesterday • Ensemble examples • Uncertainty • Clustering using Principal Component Analysis

  3. Where does the MT fit? • Project Phoenix has demonstrated that by focusing on meteorology and not on models in the first 18 to 24 hours, it is very easy to show huge improvements over first-guess SCRIBE forecasts. • The impact on day 2 is uneven. • How do we determine the point where the forecaster’s analysis and diagnosis no longer add value?

  4. Find the ensemble of the day • We already have trouble marrying reality and model outputs from a handful of models after that initial time. • What do we do when confronted with output from 10, 20, or 100 ensemble members? • Kain et al. (2002) showed that forecasters may not have a lot of skill at determining the “model of the day”. • If this is true, how does the forecaster decide which of potentially dozens of ensemble members to select?

  5. Information Bottleneck • Front end • Vast amounts of output that must be disseminated, visualized, analyzed, … • Back end • Once WE know what’s going on, how do we express that to the public? • WeatherOffice • Public forecast • SCRIBE

  6. Do Users Want Determinism? • We assume that users want uncertainties spelled out in the forecast. • What if all they want is to know whether it’s going to rain tomorrow? Can I go to the beach?

  7. A New Tool • When you get a new tool, the first place you go is the owner’s manual. • There isn’t one for EPS. • We need to write one. • That means that the meteorologists MUST get involved. • This is not just a Services issue • We are users of these outputs • Just as public clients are consulted, so should the meteorologists be

  8. A Thought Experiment Take a bag and put in ten pieces of paper, each numbered 1 through 10. Ask ten people to draw a piece of paper from the bag, but before they do so, ask them what number they think they’ll draw.

  9. A Thought Experiment If 5 out of the 10 say that they think the number will be 3, does that mean that there’s a 50% chance that the number drawn will be 3?
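The answer can be checked with a quick simulation. This is a minimal sketch (not part of the original deck): the participants’ guesses have no bearing on the draw itself, so the number 3 still comes up about 10% of the time no matter how many people predicted it.

```python
import random

random.seed(42)

# Draw repeatedly from a bag numbered 1..10.
# Guesses made beforehand do not change the physics of the draw:
# each number is pulled roughly 10% of the time, not 50%.
trials = 100_000
hits = sum(1 for _ in range(trials) if random.randint(1, 10) == 3)
frequency = hits / trials  # close to 0.10
```

Five people guessing “3” tells us something about the guessers, not about the bag.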

  10. Model Space vs. Real Space [Figure: Real Space, R, and Model Space, M, shown as overlapping regions] The forecaster’s role: evaluate the overlap between the two spaces, then take the necessary steps to maximize that area.

  11. Model Space vs. Real Space • Reliability is desired • It cannot be assumed • Links between the two spaces have to be forged • Statistical post-processing • Based on past performance. • Past performance does not necessarily extend to the current situation. • Analysis and diagnosis

  12. An Example

  13. A Joke… Did you hear the joke about the lost Swiss mountaineers? Completely confused, they reach the top of a peak and one of them takes out his map and compass and triangulates on three nearby peaks. One of his partners anxiously asks him, "Do you know where we are?" "Yes," says the triangulator. "See that mountain over there? We're right on top of it." If the model and reality disagree, it might be a good idea to go with reality.

  14. The whole basis for creating EPS in the first place is the notion that when you perturb the model’s initial conditions, play with its physics and parameterizations, and alter boundary conditions, if there are any, you get different solutions from the model. In deterministic modeling there are no other solutions. You get one to work with. The distribution of the solutions is a delta function.

  15. The Solution PDF [Figure: the solution PDF, with several candidate solutions and a better solution marked] In reality, there are an infinite number of solutions that fall into some unknown distribution. We don’t know its modality, its height and width, or whether it’s skewed. This distribution changes from model run to model run and at each step down the timeline. We don’t know where our one deterministic solution fits within this distribution. We assume that it’s in a favourable part of the distribution, but that need not be the case. There is no reason that reality must appear within this distribution. We hope that it will because our models are pretty good, but it doesn’t have to!!

  16. Sampling the underlying PDF • That’s what we’re attempting to do with EPS: sample the underlying distribution. If we can capture the nuances of the underlying distribution by generating multiple solutions, we can make some statements about probabilities and uncertainties. • Only about the solutions, though. We can say nothing about reality!!

  17. Some Statistics • Consider a random sample taken from an unknown distribution. It turns out that the maximum likelihood estimator for the mean is the sample mean. • The sample of the underlying PDF represented by the ensemble is not random, yet research has shown that, over time, the ensemble mean is the better solution. • The maximum likelihood estimator for the variance is proportional to, and very nearly equal to, the sample variance, though it tends to under-forecast the true variance. • The ensemble spread tends to be under-dispersive, behaviour we expect from the sample variance.
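The under-forecasting of the variance can be demonstrated numerically. A minimal sketch (the sample size and variance below are illustrative): for a sample of size n, the maximum likelihood estimator (dividing by n) underestimates the true variance by a factor of (n-1)/n.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0
n = 10  # a small sample, like a small ensemble

# Average the variance estimates over many independent samples.
samples = rng.normal(0.0, np.sqrt(true_var), size=(50_000, n))
ml_var = samples.var(axis=1, ddof=0).mean()        # ML estimator: divides by n
unbiased_var = samples.var(axis=1, ddof=1).mean()  # divides by n - 1

# ml_var averages near (n - 1) / n * true_var = 3.6, under-forecasting
# the true variance of 4.0; the unbiased estimator averages near 4.0.
```

This is the same under-dispersive behaviour we see in the ensemble spread.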

  18. Ensemble Pathways (Modes) Think of ensemble solutions as pathways down the timeline. When all the solutions are tightly packed (i.e. they have a low variance) we can say that the ensemble is favouring a single pathway; the individual members are moving down the same path, but some move down the centre of the path, some down the right side, some down the left, and some meander along it. If all the ensemble members follow the same path, we can say that there is a 100% probability that the model solution is following that path.

  19. The Fork in the Road What happens when the paths branch? What if 9 members of a 10-member ensemble go down the right-hand path and only 1 goes down the left? There’s a 90% chance of the model solution going down the right path, and a 10% chance of it going down the left. The trap waiting for the forecaster is that he may well take the most simplistic option, blindly following the right path because more of the members are taking it, when in fact the outlier on the left path might be the most interesting simply because of its extreme nature.

  20. The River Delta Now imagine the case where each member follows a different path, like a river delta. Each member, no matter how extreme, has an equal chance of being the correct one. This is the rub for the forecaster. On any given day, each ensemble member has the same probability of occurring as the others. They are all based on the same rules of physics. It is only by looking at their output in terms of pathways that we can realistically talk about probabilities.

  21.-24. [Image-only slides]

  25. Hurricane Katrina (from Biswas et al., 2006) • Costliest and one of the five deadliest hurricanes • First landfall near the border of Miami-Dade and Broward counties • Final landfall near the Louisiana / Mississippi border • Around 1,400 fatalities

  26.-28. [Image-only slides]

  29. Usefulness of the Ensemble Spread • If you watch charts of the ensemble spread, a pattern emerges: a lot of the spread occurs in areas where we know that models will have difficulty • Strong gradients • Rapidly moving systems • Essentially any area with strong spatial or temporal gradients.

  30. Uncertainty • Without the assumption of reliability… • Uncertainty is really the degree of agreement, or the lack thereof, among the various ensemble members. • From the pathway POV, the more pathways that exist through model space, the more unsure we are of what the model is really telling us. • Uncertainty is then measured by the pathway spread and the probability that each pathway will be well travelled.

  31. 10 Member Ensemble

  32. Where do we add value?

  33. Managing the data stream • SPC meteorologists have a tremendous workload. • In PNR, we forecast for 52% of the country • This area gets more severe weather than almost all the other regions combined. • We start with the worst SCRIBE forecasts in the country • We do it with 2 people sliding, one in Winnipeg and the other in Edmonton. • How can we successfully integrate EPS output into the SPC, given its high maintenance, when workloads are already so high?

  34. Reducing Dimensionality • There are many statistical methods for accomplishing this • Cluster Analysis • Tubing • Bayesian Techniques • Factor Analysis • Principal Component Analysis • While they use different approaches, they all attempt to identify statistically significant pathways, or modes

  35. Principal Component Analysis • Definition: a procedure for transforming a set of correlated variables into a new set of uncorrelated variables. This transformation is a rotation of the original axes to new orientations that are orthogonal to each other. [Figure: scatterplot in which the blue lines are the two principal components; note that they are orthogonal to each other]

  36. How do we calculate them? • To find the principal components in any dataset, you need to • find the eigenvalues and eigenvectors of its covariance or correlation matrix • The eigenvectors and their individual factor loadings define how to transform the data from x, y to the new coordinate system.

  37. Eigenvalues and Eigenvectors • Consider the square matrix A. We say that λ is an eigenvalue of A if there exists a non-zero vector x such that Ax = λx. In this case, x is called an eigenvector (corresponding to λ), and the pair (λ, x) is called an eigenpair for A.
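A concrete (illustrative) instance of the definition: for the symmetric matrix below, x = (1, 1) satisfies Ax = 3x, so (3, x) is an eigenpair.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, 1.0])

# A @ x = [3, 3] = 3 * x, so lam = 3 is an eigenvalue with eigenvector x.
lam = 3.0

# numpy recovers both eigenvalues of this real, symmetric matrix (1 and 3),
# returned in ascending order by eigvalsh.
eigvals = np.linalg.eigvalsh(A)
```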

  38. What Kind of Matrix? • The matrix we use for calculating the eigenvalues and eigenvectors can be a number of different things • A matrix of correlation coefficients • A matrix of covariances • I construct a covariance matrix. • The matrix gives a measure of how interrelated the members are. • The matrix is real and symmetric • Element (1,2) is equal to element (2,1), and so forth • The diagonal elements are the variances of each member • The size of the matrix is the number of ensemble members
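The construction above can be sketched in NumPy. This uses a hypothetical 5-member ensemble with random stand-in values: treating each member as a variable yields a real, symmetric n × n matrix whose diagonal holds each member’s variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ensemble: 5 members, each a series of 48 forecast values.
members = rng.normal(size=(5, 48))

# np.cov treats each row as a variable by default, so this is the
# 5 x 5 member-by-member covariance matrix.
cov = np.cov(members)

# Symmetric: element (0, 1) equals element (1, 0), and so forth;
# the diagonal contains the variance of each member.
```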

  39. Variance and Covariance The variance is really a special case of the covariance: it is the covariance of a variable with itself.

  40. Once the Eigenvalues and Eigenvectors are Calculated • The eigenvectors and their individual factor loadings define how to transform the data from x, y to the new coordinate system. • We rank the eigenvectors in order of decreasing eigenvalue • The eigenvector with the highest eigenvalue gives the first principal component, the next highest gives us the second PC, etc. • The eigenvalues are also the variances of the observations along each of the new coordinate axes.
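The whole procedure can be sketched in a few lines of NumPy (the 2-D data here are synthetic stand-ins): compute the eigenpairs of the covariance matrix, rank them by decreasing eigenvalue, and rotate the data into the new coordinate system; the variance along each new axis then equals the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic correlated 2-D data: columns are the original x, y variables.
x = rng.normal(size=500)
y = 0.7 * x + 0.4 * rng.normal(size=500)
data = np.column_stack([x, y])
data -= data.mean(axis=0)  # centre before rotating

# Eigenpairs of the covariance matrix (eigh: it is real and symmetric),
# ranked in order of decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rotating into the eigenvector basis gives the principal-component
# scores: the first column is the first PC, the second the second PC.
scores = data @ eigvecs
```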

  41. What we end up with … • We've extracted a set of principal components from our ensemble output • These are orthogonal and are ordered according to the proportion of the variance of the original data that each explains. • The goal is to reduce the dimensionality of the problem by retaining a (small) subset of factors. • The remaining factors are considered either irrelevant or nonexistent (i.e. they are assumed to reflect measurement error or noise).

  42. PC Retention • Choosing the number of PCs to retain is a non-trivial exercise, and no single method is entirely successful. • Retaining too few PCs results in under-factoring and a loss of signal. • Retain too many and noise creeps back in (under-filtering), and you also increase computation times. • Keeping in mind that the simplest approach is often the best, I use the Kaiser/Guttman criterion. • The normalized eigenvalues lie between 0 and n (the number of members in the ensemble). Since we cannot reduce the dimensionality of the problem to anything less than 1, we use this as the criterion: we retain only those PCs that have eigenvalues > 1. • Each PC can be thought of as a pathway through model space. • The amount of variance explained by each component gives us a measure of how well travelled the path is. • It also provides a measure of when we need to move from a deterministic framework to a probabilistic one.
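A sketch of the Kaiser/Guttman rule on a hypothetical 10-member ensemble (the two “pathways” below are synthetic): the eigenvalues of the correlation matrix sum to n, so each lies between 0 and n, and only components with eigenvalue > 1 are retained.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical 10-member ensemble valid at 60 grid points:
# members 0-7 follow one pathway, members 8-9 a second, plus small noise.
base, alt = rng.normal(size=60), rng.normal(size=60)
members = np.array(
    [base + 0.1 * rng.normal(size=60) for _ in range(8)]
    + [alt + 0.1 * rng.normal(size=60) for _ in range(2)])

# Eigenvalues of the 10 x 10 correlation matrix sum to n = 10;
# retain only those greater than 1 (Kaiser/Guttman criterion).
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(members)))[::-1]
retained = int((eigvals > 1.0).sum())  # two pathways -> two PCs retained
```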

  43. PCA Concerns • PCA explores the linear relationships in the data. • Non-linear factors are not considered. • This shouldn't be a problem since we're running the algorithm on specific fields (i.e. we're looking at MSL pressures, 500 mb heights, QPFs). • There might be a concern if we were comparing 500 mb heights and QPFs (and you can do that with PCA techniques) • Sometimes higher-order components are difficult to interpret physically (how do you interpret a negative QPF, for example?). • Since noise is shunted into the higher PCs, each successive component will be more and more noisy.

  44. Varimax Rotation • One lingering problem is that it becomes increasingly difficult to put successive PCs into physical terms. How do you interpret a QPF value that might end up being negative after a coordinate rotation? • Our principal components do not exist in real space but in component space, and we need to describe what we see there in physical terms. • The solution is to perform yet one more coordinate rotation, this one intended to maximize the variance of the loadings on each PC: a so-called Varimax rotation • Developed by Kaiser in 1958 • The goal is to obtain a clear pattern of factor loadings, characterized by high loadings on some factors and low loadings on others.
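Varimax rotation can be sketched with the standard SVD-based iteration. This is a minimal implementation, not the deck’s own code, and the loading matrix below is purely illustrative.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Kaiser's varimax rotation of a p x k factor-loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD of the gradient of the varimax criterion (Kaiser 1958)
        # yields the next orthogonal rotation.
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0))))
        R = u @ vt
        if s.sum() - var < tol:
            break
        var = s.sum()
    return loadings @ R

# Illustrative loadings: four variables on two factors.
loadings = np.array([[0.8, 0.3],
                     [0.7, 0.4],
                     [0.2, 0.9],
                     [0.3, 0.8]])
rotated = varimax(loadings)
```

Because the rotation matrix R stays orthogonal, each variable’s communality (its row sum of squared loadings) is unchanged; only the pattern of high and low loadings is redistributed.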

  45. Unrotated and Rotated Factor Loadings • For the unrotated case, the factor loadings are all approximately the same for the first PC. • For the second, you have a mixture of positive and negative values. • After rotation, some loadings are much closer to zero in one PC and are maximized in the other, and vice versa. All are now positive. • Since the individual factor loadings are now different, so are the eigenvalues. They are much closer together. • The rotated PCs may not be orthogonal anymore, so we can no longer say that they are uncorrelated, but at least we can interpret them.

  46.-48. [Image-only slides]

  49. Cool Facts About PCA Ensembles • The principal components map out the relevant ensemble pathways through model space. • If there is only one PC (i.e. one pathway), that PC is little different from the ensemble mean. • This is good behaviour, since we know that the ensemble mean does produce forecasts that are less wrong. We don’t want solutions that show that the ensemble mean has no merit. • Differences between the two are likely due to noise: the mean has it, the PC has it stripped out. • Situations where there is more than one PC have multiple pathways, and the ensemble mean should not even be considered. • Careful here … too few members may lead to a single PC when more members would produce more PCs. • The variance explained by each PC gives a measure of how “well-travelled” the pathway is. • The PCA should tell the forecaster immediately when to and when not to use tools like the mean.
