July, 2001


Outline
• Purpose of GeoSEM
• Background
• Methods
• Examples
• Project Status
• Future Developments

Background
• Currently, the EPC is typically estimated using one of two formulas¹:
  • Assume normal distribution
  • Assume log-normal distribution (Land)
¹ Details are provided on pages 138 and 169 in Gilbert 1987, Statistical Methods for Environmental Pollution Monitoring.
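The two non-spatial estimators named on this slide can be sketched as follows. This is a minimal illustration, assuming the commonly cited forms: mean + t·s/√n for normal data, and exp(ȳ + 0.5·s_y² + s_y·H/√(n−1)) for log-normal data (Land's method). The function names, the sample values, and the choice to pass the tabulated t and H values in as arguments are all assumptions made for this example; they are not taken from GeoSEM.

```python
import math
import statistics


def ucl_normal(values, t_quantile):
    """UCL on the mean assuming normally distributed data.

    t_quantile is the one-sided Student's t critical value for n-1 degrees
    of freedom; it is looked up by the caller, not computed here.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n-1 divisor)
    return mean + t_quantile * sd / math.sqrt(n)


def ucl_lognormal_land(values, h_quantile):
    """UCL on the mean assuming log-normal data (Land's H-statistic form).

    h_quantile must be looked up in Land's tables for the given n and the
    standard deviation of the log-transformed data.
    """
    logs = [math.log(v) for v in values]
    n = len(logs)
    y_bar = statistics.mean(logs)
    s_y = statistics.stdev(logs)
    return math.exp(y_bar + 0.5 * s_y ** 2 + s_y * h_quantile / math.sqrt(n - 1))


# Hypothetical concentrations (ppm), for illustration only.
sample = [12.0, 15.0, 9.0, 20.0, 14.0]
# 2.132 is the tabulated one-sided 95% t value for 4 degrees of freedom.
ucl95 = ucl_normal(sample, t_quantile=2.132)
```

Neither function uses the sample coordinates, which is the limitation the spatial methods in this presentation are meant to address.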

Purpose of GeoSEM
• Provide a user-friendly tool to incorporate spatial statistics in risk assessment.
• Three contributions to risk assessment:
  • Consider sample locations in the estimated EPC (reduce bias)
    • Point estimate of the mean; UCL on the mean
  • Maximize information from limited sample sizes (increase precision)
    • When sampling scale > EU scale
  • Quantify and display variability and uncertainty in the spatial distribution of risk (specify PDFs, CIs)
    • Point and ‘block’ (i.e., area) estimates

Estimating the EPC – 3 methods
• Spatial Weighting of Sample Data
  • Small sample sizes OK
  • Thiessen polygons
  • Weights are proportional to area of polygon
• Kriging
  • Kriging over an irregular block (exposure unit)*
  • Minimum sample size required
• Simulation (soon to be added to GeoSEM)*
  • Less sensitive to assumptions of normality and constant variance
  • Estimates of uncertainty are conditional to the samples used
* Unique to GeoSEM

Spatially weighting data

[Figure: sample locations (S) within an exposure unit, showing the EU boundaries and the perpendicular bisectors that form the Thiessen polygons]

Thiessen polygons assign a representative area to each sample.

The Thiessen polygon method is one of several methods available to calculate spatial weights for sample data. We elected to use it in GeoSEM because it is a commonly used, straightforward way of accounting for spatial clustering of samples. Note that spatial clustering may arise from random as well as non-random sampling methods. Small polygons indicate samples that are located close to other samples (i.e., spatial clustering); these samples therefore receive smaller weights than samples that are spaced farther apart.

Polygons are formed by connecting perpendicular bisectors drawn between each pair of adjacent sampling locations.

$$w_i = \frac{A_i}{A_{EU}} \times n$$

where
w_i = spatial weight for sample i
A_i = area of polygon i
A_EU = total area of the EU
EU = exposure unit
n = number of samples
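The weighting rule on this slide (weight = polygon area / total EU area × n) can be sketched as below. This is a minimal sketch that assumes the polygon areas have already been computed and that the polygons tile the EU, so the total EU area equals the sum of the polygon areas. The function names and the numbers are hypothetical.

```python
def thiessen_weights(polygon_areas):
    """Spatial weight for each sample: (polygon area / total EU area) * n."""
    total_area = sum(polygon_areas)  # assumes the polygons tile the whole EU
    n = len(polygon_areas)
    return [area / total_area * n for area in polygon_areas]


def spatially_weighted_mean(concentrations, polygon_areas):
    """Mean concentration with each sample weighted by its Thiessen weight."""
    weights = thiessen_weights(polygon_areas)
    n = len(concentrations)
    # The weights sum to n, so dividing by n gives an area-weighted average.
    return sum(w * c for w, c in zip(weights, concentrations)) / n


# Hypothetical EU: two clustered samples with small polygons and high
# concentrations, and two isolated samples with large polygons.
areas = [1.0, 1.0, 2.0, 4.0]
conc = [100.0, 120.0, 40.0, 10.0]

weights = thiessen_weights(areas)
simple_mean = sum(conc) / len(conc)
weighted_mean = spatially_weighted_mean(conc, areas)
```

In this made-up case the clustered high values pull the simple mean up to 67.5, while the spatially-weighted mean is 42.5.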

Kriging (ordinary kriging, ‘OK’)
• Many types of kriging can be performed with GeoSEM.
• At this time, OK is recommended for estimating the EPC. (Research is planned to explore the accuracy of EPCs estimated with selected kriging and geostatistical simulation algorithms.)
• On the following slides, OK¹ is used to describe the steps involved in calculating the kriging estimate of a concentration (or any other attribute) at an unsampled location (i.e., point kriging). The estimation of the mean concentration within a square area (i.e., block kriging) is then described. Finally, the algorithm GeoSEM uses² to extend the block kriging algorithm to estimate the EPC for irregular areas is explained.
¹ Other forms of kriging consist of modifications of the OK algorithm and thus share many of the same features with OK.
² At this time, GeoSEM calls the gstat freeware executable to perform all geostatistical calculations.

First Step: Estimating the Variogram
• The variogram is used to model the spatial dependency (i.e., autocorrelation) present in the data. Spatial autocorrelation is expressed as the tendency for samples located close together to have similar concentration levels.
• The x-axis shows the distance between sample locations.
• The y-axis shows the semivariance (half the average squared difference in concentration) between samples located ‘x-units’ apart.

The spherical semivariogram model shown on this slide has the following parameters:
sill (sample variance) = 1600 units
nugget (variance at 0 distance) = 0
range (distance at which variance = sill) = 1980 units
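The spherical model quoted on this slide (sill 1600, nugget 0, range 1980) can be written out as a short function. This is a sketch of the standard spherical form, with "sill" taken to mean the total sill including any nugget. That convention, and the function names, are assumptions made for this example.

```python
def spherical_semivariogram(h, sill=1600.0, nugget=0.0, rng=1980.0):
    """Semivariance at separation distance h for a spherical model.

    'sill' is treated as the total sill (nugget included), which is an
    assumption of this sketch.
    """
    if h <= 0.0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def covariance(h, sill=1600.0, nugget=0.0, rng=1980.0):
    """Covariance implied by the semivariogram: C(h) = sill - gamma(h)."""
    return sill - spherical_semivariogram(h, sill, nugget, rng)


gamma_half_range = spherical_semivariogram(990.0)  # half the range
cov_half_range = covariance(990.0)
```

The covariance function is what the kriging equations on the following slides read from the fitted model.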

OK – Point Kriging

[Figure: point to be estimated (0) at the centre of the search radius, with samples S1-S5 at distances d1-d5]

Kriging estimate, $\hat{z}(x_0)$: weighted average of the measured concentrations, $z(x_i)$:

$$\hat{z}(x_0) = \sum_{i=1}^{n} w_i \, z(x_i)$$

n = number of samples located within the search radius

The weights are determined by minimizing the error variance, under the constraint that the weights sum to 1,

$$\sum_{i=1}^{n} w_i = 1$$

which ensures the point estimate is unbiased, using the following equation:

$$\sum_{j=1}^{n} w_j C_{ij} + \mu = C_{i0}, \quad i = 1, \dots, n$$

The terms $C_{ij}$ and $C_{i0}$ represent the covariance between sample pairs i and j and between sample i and the point to be estimated (0), respectively; the covariances are read from the semivariogram, based on the distances d1-d5.

Kriging variance ($\sigma_K^2(x_0)$):

$$\sigma_K^2(x_0) = C(0) - \sum_{i=1}^{n} w_i C_{i0} - \mu$$

Where
$\mu$ = the Lagrangian parameter, which is required to satisfy the unbiasedness constraint
$x_0$ = point or location where estimate is desired
$x_i$ = sample locations
$d_i$ = distance between sample i and the point to be estimated
$C(0)$ = covariance at zero distance
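The point-kriging steps described on this slide (covariances between samples, covariances to the target point, weights constrained to sum to 1 through a Lagrangian parameter) can be sketched in plain Python. This is an illustrative sketch only: GeoSEM delegates these calculations to gstat. The spherical covariance parameters, the small linear solver, the function names and the sample data are all assumptions made for this example.

```python
import math


def spherical_cov(h, sill=1600.0, rng=1980.0):
    """Covariance C(h) = sill - gamma(h) for a spherical model, zero nugget."""
    if h >= rng:
        return 0.0
    r = h / rng
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ordinary_kriging(points, values, target, cov=spherical_cov):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(points)
    # Sample-to-sample covariances, bordered by the sum-to-one constraint.
    a = [[cov(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Sample-to-target covariances, plus the constraint value 1.
    b = [cov(math.dist(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = cov(0.0) - sum(w * c for w, c in zip(weights, b[:n])) - mu
    return estimate, variance, weights


# Hypothetical samples: two locations 1000 units apart, target at the midpoint.
pts = [(0.0, 0.0), (1000.0, 0.0)]
vals = [10.0, 30.0]
est_mid, var_mid, w_mid = ordinary_kriging(pts, vals, (500.0, 0.0))
est_at_sample, var_at_sample, _ = ordinary_kriging(pts, vals, (0.0, 0.0))
```

By symmetry the midpoint receives equal weights. Kriging at a sample location returns the measured value with zero kriging variance, and the variance depends only on the sample geometry and the covariance model, not on the measured values.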

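OK – Block Kriging (square block)

[Figure: square block discretized by a grid of points, with samples S1-S5. Red lines: sample point to block covariance. Green lines: within block covariance.]

The equation used to calculate the block kriging estimate is very similar to the point kriging equation, with one exception: the $C_{i0}$ term (see previous slide) is replaced by the $C_{iA}$ term, which represents the average covariance between sample i and the points used to discretize the block:

$$C_{iA} = \frac{1}{m} \sum_{k=1}^{m} C_{ik}$$

Block kriging variance ($\sigma_K^2(A)$):

$$\sigma_K^2(A) = C_{AA} - \sum_{i=1}^{n} w_i C_{iA} - \mu$$

where:
$u_k$ = block discretizing point (k = 1, ..., m), with $C_{ik}$ the covariance between sample i and $u_k$
$C_{AA}$ = average covariance between discretizing points
$x_i$ = sample locations
$A$ = area of block
$w_i$, $\mu$ = kriging weights and Lagrangian parameter, as in point kriging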
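The two averaged covariances that distinguish block kriging from point kriging (sample-to-block, the red lines, and within-block, the green lines) can be sketched as follows. This is a minimal sketch assuming a spherical covariance model with the parameters quoted earlier and a regular grid of discretizing points at cell centres. The function names, the block size and the sample location are hypothetical.

```python
import math


def spherical_cov(h, sill=1600.0, rng=1980.0):
    """Covariance C(h) for a spherical model with zero nugget."""
    if h >= rng:
        return 0.0
    r = h / rng
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)


def block_grid(x_min, y_min, side, per_side):
    """Discretize a square block into per_side x per_side cell-centre points."""
    step = side / per_side
    return [(x_min + (i + 0.5) * step, y_min + (j + 0.5) * step)
            for i in range(per_side) for j in range(per_side)]


def sample_to_block_cov(sample, block_points, cov=spherical_cov):
    """Average covariance between one sample and the discretizing points."""
    return sum(cov(math.dist(sample, u)) for u in block_points) / len(block_points)


def within_block_cov(block_points, cov=spherical_cov):
    """Average covariance between all pairs of discretizing points.

    Pairs of a point with itself are included here; that is a choice made
    for this sketch.
    """
    m = len(block_points)
    return sum(cov(math.dist(u, v))
               for u in block_points for v in block_points) / m ** 2


# Hypothetical 400 x 400 block discretized by a 4 x 4 grid.
block = block_grid(0.0, 0.0, side=400.0, per_side=4)
c_iA = sample_to_block_cov((600.0, 200.0), block)
c_AA = within_block_cov(block)
```

Substituting `c_iA` for the sample-to-point covariance on the right-hand side of the kriging system, and `c_AA` for the zero-distance covariance in the variance, turns a point-kriging solver into a block-kriging one.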

Block Kriging: irregular area

[Figure: irregular Exposure Unit (EU) discretized by a grid of points and overlaid with a series of square blocks]

GeoSEM¹ is able to calculate a block kriging estimate for an area (i.e., EU) of any shape or size. The area is discretized by a grid of points, as shown in the figure. The grid spacing is determined by the user. GeoSEM returns a mean (i.e., EPC) and a kriging variance that can be used to assess the uncertainty in the mean (e.g., to calculate the 95th UCL on the mean). The equations for the block kriging mean and variance are the same as those shown on the previous slide.

The EPC for an irregular shape such as the one shown here could be estimated by computing block kriging estimates for each of a series of square blocks, as indicated in the figure. The EPC could then be estimated by averaging the block means for each of the square blocks. However, averaging the block kriging variances for each of the square blocks to estimate the kriging variance for the EU is not valid.

The ability to compute a kriging variance for an irregularly shaped EU, and therefore to assess uncertainty in the EPC, is unique to GeoSEM. This ability will soon be extended by offering the user the option to perform geostatistical simulation to assess uncertainty in the EPC, as well as to describe the uncertainty in the spatial distribution of contaminants across a site or EU.

¹ All geostatistical analyses are performed by gstat.
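Discretizing an irregular EU by a grid of points at a user-chosen spacing can be sketched as below. This is not how GeoSEM or gstat implements the step; it is a minimal illustration that uses a ray-casting point-in-polygon test and places grid points at cell centres. The function names and the L-shaped polygon are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def discretize_polygon(polygon, spacing):
    """Return grid points (cell centres) that fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    y = min(ys) + spacing / 2.0
    while y < max(ys):
        x = min(xs) + spacing / 2.0
        while x < max(xs):
            if point_in_polygon(x, y, polygon):
                points.append((x, y))
            x += spacing
        y += spacing
    return points


# Hypothetical L-shaped exposure unit with an area of 12 square units.
eu = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0), (2.0, 4.0), (0.0, 4.0)]
grid_points = discretize_polygon(eu, spacing=1.0)
```

The resulting points play the role of the block discretizing points in the block kriging equations, so a single kriging system yields one mean and one kriging variance for the whole EU.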

Kriging estimate of a UCL
• An upper confidence limit (UCL) on a point kriging estimate and on a block kriging estimate (EPC) can be estimated using the same equation.
• The equation assumes the distribution for uncertainty can be modeled by a normal distribution. For example:

$$\mathrm{UCL}_{95} = \hat{z} + 1.645 \sqrt{\sigma_K^2}$$

Where,
$\hat{z}$ = kriged point or block estimate
$\sigma_K^2$ = kriging variance
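Under the normality assumption stated on this slide, a one-sided UCL is the kriged estimate plus a standard normal quantile times the square root of the kriging variance. The sketch below uses Python's standard library for the quantile. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def kriging_ucl(estimate, kriging_variance, confidence=0.95):
    """One-sided upper confidence limit on a kriged point or block estimate."""
    z = NormalDist().inv_cdf(confidence)  # about 1.645 for 95%
    return estimate + z * math.sqrt(kriging_variance)


# Hypothetical block estimate of 100 ppm with a kriging variance of 400 ppm^2.
ucl95 = kriging_ucl(100.0, 400.0)
```

The same function applies to point and block estimates; only the estimate and variance passed in differ.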

Simulation (SGS)
• At this time GeoSEM uses the gstat sequential Gaussian simulation (SGS) algorithm; additional simulation methods will be supported in the future.
• The SGS algorithm assumes the sample data are derived from a population that can be modeled by a multivariate normal distribution. However, preliminary research has indicated the SGS method is robust to departures from normality.
• Similar to kriging, GeoSEM can perform point or block simulation. The block simulation method is used for estimating EPCs; point simulation is used to describe the spatial distribution of a pollutant across the EU or site. Both types of simulation are described on the following slides.

Point Simulation – spatial distributions

[Figure: simulation grid with the search radius drawn around the point being estimated. Legend: sample locations; points where a concentration has been simulated; points where a concentration has not yet been simulated; arrows mark the data used (some arrows are not shown).]

The SGS algorithm is an extension of the kriging algorithm. The simulation algorithm follows a random path through the grid, eventually producing an estimate at each grid point.

In the first step of the estimation procedure, the kriging estimate for a given point is calculated according to the method described on the previous slides, with one important distinction: the kriging estimate is a weighted average of all sample locations and previously simulated estimates that are located within the search radius chosen by the user.

In the second step, the kriging estimate and variance for the grid point are used to define a normal probability distribution function (pdf) representing uncertainty in the concentration at the grid point. A value is then randomly selected from the pdf and assigned to the grid point.

The two-step process is then repeated at the next randomly selected grid point, until all points have been estimated. Completion of the simulation process at all grid nodes constitutes one simulation, or ‘realization’. The number of simulations (‘R’) is determined by the user.

Uncertainty in the spatial distribution of a pollutant is modeled from the R simulated concentrations at each of the grid points. For example, if 100 simulations were performed, a map of the 95th percentile of the distribution of uncertainty could be prepared by ranking the simulated concentrations at each node in ascending order and selecting the 95th value.
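The ranking step in the example on this slide (100 realizations, take the 95th ranked value at each node) can be sketched as follows. The sketch covers only the post-processing of realizations, not the SGS algorithm itself, and the realizations are made-up numbers. The function name is hypothetical.

```python
import math
import random


def percentile_map(realizations, percentile):
    """Per-node percentile across realizations, by ranking simulated values.

    realizations is a list of R lists, each holding one simulated value per
    grid node; percentile is given in percent (e.g. 95).
    """
    n_real = len(realizations)
    rank = max(1, math.ceil(percentile * n_real / 100))
    n_nodes = len(realizations[0])
    result = []
    for node in range(n_nodes):
        ranked = sorted(real[node] for real in realizations)
        result.append(ranked[rank - 1])  # rank-th value in ascending order
    return result


# Hypothetical output of 100 realizations on a 2-node grid: node 0 takes the
# values 1..100 and node 1 takes 1001..1100, in shuffled realization order.
realizations = [[r + 1.0, 1000.0 + r + 1.0] for r in range(100)]
random.Random(1).shuffle(realizations)
p95 = percentile_map(realizations, 95)
```

With these inputs the 95th ranked value is 95.0 at the first node and 1095.0 at the second.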

Block Simulation – estimating EPC

[Figure: Exposure Unit (EU) discretized by a grid of block discretizing points; ‘R’ realizations]

Block simulation is accomplished using an algorithm that is a mixture of the block kriging and point simulation algorithms that were described previously.

In the first step of the simulation, kriging is used to estimate a block kriging mean and kriging variance for the EU. The block kriging mean and variance define a normal probability distribution that is used to represent uncertainty in the kriged mean.

In the second step of the simulation, the estimate of the mean for the EU is produced by randomly selecting a value from the distribution of uncertainty.

The two-step process is repeated for each EU. Once an EPC for each EU has been estimated, the simulation is complete. The entire process is repeated for the desired number of simulations (R), which produces R estimates of the mean for each EU.

Uncertainty in the EPC is modeled from the R simulated means for each EU. For example, if the user wished to use the 90th UCL on the mean as the EPC, and assuming 1000 simulations were performed, the EPC could be estimated by ranking the simulated means for each EU in ascending order and selecting the 900th value for each EU.
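The draw-and-rank procedure described on this slide can be sketched for a single EU as below. This is a simplified sketch: it assumes the block kriging mean and variance for the EU are already known and fixed, and it draws each realization independently from the resulting normal distribution. The function names, the seed and the input values are hypothetical.

```python
import math
import random


def simulate_block_means(block_mean, block_variance, n_realizations, seed=0):
    """Draw R simulated EU means from Normal(block mean, block variance)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sd = math.sqrt(block_variance)
    return [rng.gauss(block_mean, sd) for _ in range(n_realizations)]


def epc_from_realizations(simulated_means, confidence_pct):
    """Rank the simulated means in ascending order and pick the UCL value."""
    ranked = sorted(simulated_means)
    rank = max(1, math.ceil(confidence_pct * len(ranked) / 100))
    return ranked[rank - 1]


# Hypothetical EU: block kriging mean 50 ppm, kriging variance 100 ppm^2.
sims = simulate_block_means(50.0, 100.0, n_realizations=1000)
epc90 = epc_from_realizations(sims, 90)  # 900th of 1000 ranked means
```

With 1000 realizations and a 90% level, the function returns the 900th ranked mean, matching the example on the slide.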

Purpose 1: unbiased estimates(?)
• The next slide contains a map of a Superfund site that shows the spatial distribution of surface soil samples collected from the site. The site boundary shown was assumed for the purposes of illustration.
• The two slides after the site map contain tables comparing the mean and 95th UCL on the mean calculated using the current methodology (i.e., non-spatial methods) and two of the spatial approaches currently available in GeoSEM: spatial weighting and OK.
• The tables illustrate the importance of considering the spatial distribution of samples at the site when estimating the EPC. The tables do not offer proof that the spatial methods totally remove the bias present in the non-spatial estimates; further research is required in this area.

Site 1 sample distribution

Comparison for Arithmetic Mean
Blue = exceeds 1×10⁻⁶ cancer risk
Red = exceeds 1×10⁻⁵ cancer risk
SpMean = spatially-weighted mean
Exposure scenario: wildlife worker (arsenic RBC from Bunker Hill Expedited Screening Level Assessment, URS Greiner/CH2MHill/SRC, 1998)

Comparison for 95th UCL
Blue = exceeds 1×10⁻⁶ cancer risk
Red = exceeds 1×10⁻⁵ cancer risk
Exposure scenario: wildlife worker (arsenic RBC from Bunker Hill Expedited Screening Level Assessment, URS Greiner/CH2MHill/SRC, 1998)

Purpose 2: maximize information
• The next four slides address the use of kriging to improve the accuracy of estimates of the EPC.
• There are 236 samples available for the 50-acre site. This would generally be considered more than adequate to estimate the EPC for the site; i.e., if the EU = site.
• However, if the current or future use of the site were residential, the EU would be defined as a residential yard. Despite the relatively high number of samples for the site, we go from a ‘data rich’ (n=236) to a ‘data poor’ situation (0 <= n <= 8) if an estimate of the EPC for each yard is required.
• The next slide describes some of the advantages of kriging and simulation in this situation. We do not offer any proof at this time that the spatial methods are more accurate than the non-spatial methods. At the very least, however, kriging and simulation allow one to estimate the concentration for an EU without samples located within it. This could be very useful for designing future sampling/remediation efforts, as discussed on the following slides.

Max. info from small n - methods
• Kriging
  • Considers the spatial continuity (spatial autocorrelation) in the data
  • Samples located outside the EU are considered
  • Application to any geographic shape (EU or site)
• Simulation
  • Robust to departures from normality and constant variance
  • Estimates of uncertainty are conditional to the samples used
  • Simulation over any geographic shape (EU or site)

Site 2 sample locations

Max info from small n
• The following slide focuses on the eastern portion of the site (which has been rotated 90° clockwise).
• The polygons were arbitrarily drawn and are intended to represent residential yards (i.e., EUs) that range in size from approximately 0.5 to 2 acres.

Estimate EPC

[Figure: map of EUs with three callouts; each callout lists the simple mean, spatially-weighted mean, and kriged mean for one EU]

The spatially-weighted estimate and the kriging estimate indicate the EPC is higher than the simple mean. The samples are shifted to the eastern portion of the EU, towards the lower contamination levels (as indicated by the samples in the surrounding EUs). The spatially-weighted method gives a higher weight to the sample located in the middle of the EU to account for the sample configuration. Kriging uses the data in the surrounding EUs to estimate the EPC for this EU.

In this case, kriging is able to produce an estimate where the other two methods cannot, due to a lack of samples located in this EU. The low estimate for the EPC (16.2 ppm) indicates additional sampling may not be necessary for this EU.

Again, kriging is able to produce an estimate where the other two methods cannot, due to a lack of samples located in the EU. The high estimate for the EPC (2165 ppm) indicates additional sampling may not be necessary to determine whether this EU requires remediation. Geostatistics can, however, be used to determine which portions of the EU should be remediated (see the ‘CDFs for Spatial Uncertainty’ slide).

Uncertainty - Pr(EPC > RBC) ?

[Figure: three maps of Pr(EPC > RBC) by EU, labeled Log-normal, Kriging, and Simulation]

This slide shows 3 maps of the probability of the EPC exceeding a risk-based concentration (RBC) of 1,100 ppm. It addresses two issues: 1) maximizing information available from small samples; 2) the constant variance assumption of kriging.

Issue 1: Maximum info from small n: The EUs shown in white (‘-999’) in the log-normal map indicate the probability of exceeding the RBC could not be estimated due to insufficient sample size or, in some cases, a lack of variation in the data (i.e., all the samples are non-detects). Note: the inability to estimate the probability of exceeding the RBC is equivalent to an inability to estimate the 95th UCL (or any other UCL) on the mean.

Issue 2: Constant variance assumption: Use of the kriging variance to calculate UCLs on the mean (or the probability of exceeding an RBC) requires an assumption that the variance of the data is constant across the site; i.e., the data exhibit the same amount of variability in all regions of the site. This assumption is often a poor one for contaminated sites; data from areas of high contamination will exhibit high variability, while data from areas of low contamination will tend to be less variable. In fact, the kriging variance is determined by the geometrical arrangement of the data (i.e., sample locations) used to estimate the mean; the actual concentrations are not considered directly by the kriging variance.

An effect of violating the constant variance assumption is illustrated in the kriging map. The EUs delineated with the blue line are in an area of the site with low contamination; in fact, many of the samples are non-detects. The kriging estimate, however, indicates a high probability that the EPC > RBC for all of these EUs. Another way of looking at this: if the 95th UCL were used as the EPC for these EUs, the EPC would be greater than the RBC, despite the consistently low concentrations measured in this area of the site.

In contrast to the kriging map, the simulation map looks much more consistent with the measured concentrations. Simulation indicates it is very unlikely that the EPC exceeds the RBC of 1,100 ppm for the EUs delineated by the blue line. Despite the low number of samples (in some cases 0 samples) located within these EUs, geostatistical simulation, by considering the spatial pattern of contamination, indicates it is not worthwhile to collect additional data in this area.
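The two ways of obtaining Pr(EPC > RBC) contrasted on this slide can be sketched as follows: from a kriged mean and kriging variance under a normality assumption, and as the fraction of simulated means that exceed the RBC. The function names and all input values are hypothetical, chosen only to show how a large kriging variance can yield a noticeable exceedance probability even when the kriged mean is low.

```python
import math
from statistics import NormalDist


def prob_exceed_kriging(block_mean, kriging_variance, rbc):
    """Pr(EPC > RBC), with uncertainty modeled as Normal(mean, kriging variance)."""
    dist = NormalDist(block_mean, math.sqrt(kriging_variance))
    return 1.0 - dist.cdf(rbc)


def prob_exceed_simulation(simulated_means, rbc):
    """Pr(EPC > RBC) as the fraction of simulated EU means above the RBC."""
    exceed = sum(1 for m in simulated_means if m > rbc)
    return exceed / len(simulated_means)


RBC = 1100.0  # ppm, the risk-based concentration used on this slide

# Hypothetical low-contamination EU: kriged mean 200 ppm, but a kriging
# standard deviation of 900 ppm driven by sparse sample geometry.
p_kriging = prob_exceed_kriging(200.0, 900.0 ** 2, RBC)

# Hypothetical simulated means for the same EU, none of them near the RBC.
p_simulation = prob_exceed_simulation([150.0, 180.0, 210.0, 260.0], RBC)
```

In this made-up case the kriging-based probability is about 0.16 while the simulation-based probability is 0, mirroring the contrast between the kriging and simulation maps.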

Purpose 3: spatial distributions
• Kriging
  • Spatial Variability: MVUE of concentrations (regardless of underlying distribution); maps are “smoothed”
  • Spatial Uncertainty: assumes normality and constant variance
• Simulation
  • Spatial Variability: approximately reproduces the spatial structure (autocorrelation) and variability of the sample data
  • Spatial Uncertainty:
    • Robust to departures from normality and constant variance
    • Estimates of uncertainty are conditional to the data

Spatial Distributions
• The following slides show some examples of the use of geostatistics to produce maps and probability distributions that describe the spatial variability and uncertainty in surface soil.
• This information can be used to prepare future sampling plans and in remediation planning and design.

Spatial Variability
• Ordinary Kriging
  • accurate (MVUE)
  • smoothing
• Sequential Gaussian Simulation
  • less smoothing: better reproduction of spatial structure

CDF for Spatial Variability
Ordinary Kriging

Spatial Uncertainty – 95th UCLs
• Ordinary Kriging
  • constant variance assumption
  • variance = f(sample geometry)
• Sequential Gaussian Simulation
  • variance is conditional to neighborhood

CDFs for Spatial Uncertainty

[Figure: EU 88; CDFs for 2 remedial units located within EU 88]

This information can be used in remedial action planning and design: which remedial units (RUs) should be cleaned up to achieve the remedial action objectives? The RUs with the greatest probability of exceeding the PRG.

Site 1 sample locations

Ra-226 (pCi/g) Spatial Variability

Pb-210 (pCi/g) Spatial Variability

Another way to view uncertainty
• Convert uncertainty in media concentration to uncertainty in risk
  • Pr(estimated conc > PRG)
  • Pr(RME risk > risk level of concern)
• Point estimate (95th UCL) or probabilistic (PDFu) results
• Map with color shading to highlight areas of concern (concentrations or risks)

Pb-210 Cancer Risk

Ra-226 Cancer Risk

SGS – Constant Variance
Panels: untransformed data; normal score-transformed data

Uncertainty in Risk - As
Panels: OK; SGS

Uncertainty in Risk – Ra-226

Project Status - Software capabilities
• Estimate the 95% UCL or P(mean > PRG)
  • Normal PDFv (t-statistic), Lognormal PDFv (h-statistic)
  • Spatial methods: spatially-weighted data and kriging
• Exposure Unit
  • Define the size and shape with the mouse
• Legend Tool
  • Display results by percentiles or assigned intervals
  • Add labels, graphics and text to maps

Project Status - Software capabilities
• Import Data
  • Access, Excel, DBF, Arcview, ArcInfo, ArcSDE, DOD
• Import Maps and Images
  • AutoCAD, Geo-TIFF (e.g., USGS/DOT quads), GIF, JPEG
• Export Results
  • DBF, Arcview, other ESRI
  • Copy/paste maps and statistics to clipboard

Future Enhancements
• Spatial Statistics
  • Geostatistical simulation module is being tested; soon to be completed
  • Thiessen polygons
• Data Analysis Features
  • PDFu and CDFu
  • Identify potential outliers (‘masking’)
  • Probability Plots
• Output Features
  • Improve legend tools for labeling and rendering
  • Create contour maps for sample data, and kriged & simulated data
  • Create interface with ISE model, Variowin freeware
