 
                
2. MATERIALS AND METHODS

The data used for the analysis were collected in March 1997 at the CGSM (Fig. 1). Surface water samples were analyzed for the following variables: temperature (°C), salinity, total suspended solids (mg l⁻¹), depth (m), silicates (mol l⁻¹), chlorophyll “a” (g l⁻¹), dissolved oxygen (mg l⁻¹), nitrites (mol l⁻¹) and chlorophyll “c” (g l⁻¹). Between 103 and 114 observations were obtained for each variable. Samples were taken throughout the system on a systematic grid of 4 km² squares. For each variable, the spatial autocorrelation structure was estimated by ML, assuming Gaussian processes (dissolved oxygen and nitrites were log-transformed; chlorophyll “c” was Box-Cox transformed with λ = 0.35) and correlation models of the Matérn family (Diggle & Ribeiro, 2004). Sampling networks were simulated with distances of 2 (the observed network), 3, 4, 5 and 6 km between points (Fig. 1), and kriging prediction at 1000 unsampled points was carried out with each network. The mean prediction variance of each variable was then estimated and related to the cost associated with each sampling density. The final decision on the proposed sampling network was based on practical criteria founded on the prediction variance-cost relationship. The analysis was carried out using the geoR package (Ribeiro & Diggle, 2001).

Figure 2. Increase in precision (% reduction in standard prediction error) of each sampling network with respect to the 6000 m network (fewest sampling points).

Table 1. ML estimation of the spatial correlation Matérn model.
Variable                      κ       τ²      σ²      φ
Depth                         0.50    0.00    0.13    1720
Temperature                   0.70    0.14    9.03    14520
Salinity                      0.60    0.00    21.8    10000
Log (Oxygen)                  0.55    0.00    0.20    11390
Suspended Solids              0.21    0.01    2158    8296
Log (Nitrites)                0.50    0.18    0.71    12434
Silicates                     0.50    1810    2089    7240
Chlorophyll “a”               0.44    0.00    1271    4842
Chlorophyll “c” (λ = 0.35)    0.41    0.00    7.76    2000

Design of a Sampling Network for an Estuary in the Colombian Caribbean, Using Geostatistical Methods

Ramón Giraldo H. 1,2
1 Ph.D. Student. Statistical and Operational Research. Polytechnic University of Catalonia, Barcelona, Spain. E-mail: ramon.giraldo@upc.edu
2 Associate Professor. Statistics Department. National University of Colombia, Bogotá, Colombia. E-mail: rgiraldoh@unal.edu.co

ABSTRACT. A network for monitoring physicochemical and biological variables in the Ciénaga Grande de Santa Marta (CGSM) estuary, located on the Caribbean coast of Colombia, was designed. Initially, a set of 114 sampling points was chosen to measure the variables considered (Fig. 1.a). Based on these data, a spatial autocorrelation structure was estimated for each variable, using the Matérn model and maximum likelihood. Some variables were assumed to be Gaussian; others had to be transformed in order to obtain Gaussian processes. Then, for networks of different sizes, the kriging prediction variances were calculated on the basis of the fitted autocorrelation models. Comparing the prediction variances of the different networks with their associated costs made it possible to establish a set of sampling sites that, at a reasonable cost, substantially reduces the prediction error for the variables of interest.

Key Words: Estuary, geostatistics, Gaussian processes, ML estimation, sampling networks.
1. INTRODUCTION

In environmental statistics, model-based and design-based approaches are used to solve the problem of estimating the sample size and the location of the sampling sites (Caselton & Zidek, 1984; Aldworth & Cressie, 1999; Groenigen, 2000; Caeiro et al., 2003). For spatial-mean prediction over a local region, the ordinary kriging predictor (model-based approach) is better than classical design-based estimators when an appropriate model is chosen (Aldworth & Cressie, 1999). In this situation, good designs tend to spread points uniformly over the region (McBratney et al., 1981; Olea, 1984; Cox et al., 1997).

Accordingly, the problem of designing sampling networks for local estimation reduces to establishing, for regular-grid sampling networks of different sizes, the relationship between the maximum prediction variances and their associated costs. As the kriging variances are influenced by spatial correlation, it is very important to have good estimates of the semivariogram parameters (Groenigen, 2000). When model-based geostatistics (Diggle & Ribeiro, 2000) is assumed, maximum likelihood (ML) would be preferred over ordinary least squares for estimating the parameters of the spatial correlation model (Stein, 1999).

Figure 3. Sampling cost for each variable under the five sampling networks.

For dissolved oxygen, silicates and chlorophyll “a”, the decision is complex, given that both the costs (Fig. 3) and the efficiencies (Fig. 2) increase considerably as the number of samples grows. A global analysis of the increases in cost and in efficiency shows that the 2 km network is the least recommendable: compared with the 3 km network, costs increase sharply (by more than 200%) while the relative efficiency increases by only 4.9%.
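The cost-versus-efficiency comparison described above can be sketched in a few lines. This is only an illustration: the function names and the numbers in `networks` are hypothetical placeholders, not values from Figures 2 and 3. Efficiency is taken here as the percent reduction in standard prediction error relative to a reference network, as in the Figure 2 caption.

```python
import math

def precision_gain(var_ref, var_new):
    """Percent reduction in standard prediction error relative to a reference network."""
    se_ref, se_new = math.sqrt(var_ref), math.sqrt(var_new)
    return 100.0 * (se_ref - se_new) / se_ref

def cost_increase(cost_ref, cost_new):
    """Percent increase in sampling cost relative to a reference network."""
    return 100.0 * (cost_new - cost_ref) / cost_ref

# HYPOTHETICAL placeholder values (not taken from the poster):
# grid spacing in metres -> (mean prediction variance, sampling cost in US $)
networks = {
    6000: (1.00, 50.0),
    5000: (0.90, 70.0),
    4000: (0.80, 100.0),
    3000: (0.72, 160.0),
    2000: (0.68, 500.0),
}

ref_var, ref_cost = networks[6000]  # least dense network as the reference
summary = {
    spacing: (precision_gain(ref_var, var), cost_increase(ref_cost, cost))
    for spacing, (var, cost) in networks.items()
}

for spacing in sorted(summary, reverse=True):
    gain, extra = summary[spacing]
    print(f"{spacing} m: precision gain {gain:5.1f}%, cost increase {extra:6.1f}%")
```

Tabulating both percentages side by side shows at a glance where a denser network buys little precision for a large increase in cost.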
While in relative terms the changes in efficiency and in cost when going from one network to another with a greater number of points are similar (with the exception of the 2 km network), the networks with distances of 4 km and 5 km between sampling points should be considered the most advisable, given that they produce a greater efficiency than that obtained with the 6 km network, at slightly higher costs.

The suggestions given in the foregoing paragraph about the optimum sampling arrangement for monitoring the variables considered in the ecosystem are not absolute. In the final analysis, many purely empirical criteria were used in comparing the cost and statistical efficiency functions. Nevertheless, the agencies that make the final decision should have a tool that allows them to plan the most adequate monitoring strategy for the future.

Figure 1. The sampling networks under which the prediction variances were estimated for each variable. Distances between sampling points: a) 2 km; b) 3 km; c) 4 km; d) 5 km and e) 6 km.

Salinity is the variable for which the greatest gain in precision was obtained (35%) when changing from the least dense network to the densest (Fig. 2). Other variables, such as temperature, dissolved oxygen, silicates and chlorophyll “a”, had increases in precision that varied between 15.9% and 23.8% (Fig. 2). Finally, for depth, nitrites, total suspended solids and chlorophyll “c”, the increase in precision was only between 5.7% and 10.1% (Fig. 2). As expected, when the intermediate networks (grid distances of 3, 4 and 5 km) are compared with the 6 km network, the relative increase in precision is much smaller.

REFERENCES

• Aldworth, J. & N. Cressie. 1999. Sampling designs and prediction methods for Gaussian spatial processes. In "Multivariate analysis, design of experiments and survey sampling" (S. Ghosh, ed.), pp. 1-54. Marcel Dekker Inc., New York, USA.
• Caeiro, S., Painho, M., Goovaerts, P., Costa, H. & S. Sousa. 2003. Spatial sampling design for sediment quality assessment in estuaries. Environ. Mod. & Soft., 18: 853-859.
• Caselton, W. & J. Zidek. 1984. Optimal monitoring network designs. Statistics & Probability Letters, 2: 223-227.
• Cox, D., Cox, L. & K. Ensor. 1997. Spatial sampling and the environment: some issues and directions. Environ. Ecol. Stat., 4: 219-233.
• Diggle, P. & P. Ribeiro. 2000. Model based geostatistics. 14 Sinape, Brasil.
• Groenigen, J. 2000. The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma, 97: 223-236.
• McBratney, A., Webster, R. & T. Burgess. 1981. The design of optimal sampling schemes for local estimation and mapping of regionalized variables I. Computers and Geosciences, 7(4): 331-334.
• Olea, R. 1984. Sampling design optimization for spatial functions. Math. Geol., 16: 369-392.
• Ribeiro, P. & P. Diggle. 2001. geoR. Package for geostatistical data analysis. R-NEWS, Vol 1, No 2, 15-18.
• Stein, M. 1999. Interpolation of spatial data. Some theory for kriging. Springer.

3. RESULTS

The fitted Matérn models (Table 1) show strong spatial dependence for some variables (temperature, nitrites, salinity) in the area. The ranges are relatively high, considering that the distance between the extreme north and south of the system (the longest distance) is no more than 20 km. The nugget was not greater than 50% of the sill, which is advisable if the spatial correlation model is to describe reality adequately (Caeiro et al., 2003).

The sampling costs associated with each variable under each sampling density were different (Fig. 3). Temperature, depth and salinity had much lower costs than the other variables. For some of the variables (dissolved oxygen, silicates and chlorophyll), going from the 3000 m network to the 2000 m network increased the sampling cost by about US $240.
Hence, for temperature and salinity, it would be more convenient to sample intensively (the densest network), as this would increase the efficiency by a considerable percentage (Fig. 2) while increasing costs by only about US $90 (Fig. 3). For depth, even though the sampling costs do not increase significantly (Fig. 3), it is more advisable to sample on the least dense network, given that the efficiency increases by at most 7% in comparison with the other networks (Fig. 2). For nitrites, total suspended solids and chlorophyll “c”, efficiency increases only slightly with increasing network density, whereas the costs increase considerably, especially for the 2000 m network (Fig. 3). Hence the less dense networks (5000 m and 6000 m between sampling points) are the most adequate for monitoring these variables.
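The link between grid spacing and kriging prediction variance that underlies these recommendations can be illustrated with a small sketch. It is not the geoR analysis used for the poster: it computes the ordinary kriging variance at the centre of a grid cell from a 4 x 4 block of neighbouring points, using an exponential covariance (the Matérn model with κ = 0.5). The Depth parameters are read from Table 1 on the assumption that its numeric columns are κ, nugget τ², partial sill σ² and range φ, with φ in metres. All function names are hypothetical.

```python
import math

def exp_cov(h, sigma2, phi, tau2=0.0):
    """Exponential covariance (Matern, kappa = 0.5); the nugget is added at h = 0."""
    if h == 0.0:
        return sigma2 + tau2
    return sigma2 * math.exp(-h / phi)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ok_variance_cell_centre(spacing, sigma2, phi, tau2=0.0, k=4):
    """Ordinary kriging variance at the centre of the central cell of a k x k grid (k even)."""
    offsets = [(i - (k - 1) / 2.0) * spacing for i in range(k)]
    pts = [(x, y) for x in offsets for y in offsets]
    n = len(pts)
    # Kriging system: covariances among samples, bordered by the unbiasedness constraint.
    a = [[exp_cov(math.dist(p, q), sigma2, phi, tau2) for q in pts] + [1.0] for p in pts]
    a.append([1.0] * n + [0.0])
    c0 = [exp_cov(math.dist(p, (0.0, 0.0)), sigma2, phi, tau2) for p in pts]
    sol = solve(a, c0 + [1.0])
    weights, lagrange = sol[:n], sol[n]
    return (sigma2 + tau2) - sum(w * c for w, c in zip(weights, c0)) - lagrange

# Depth row of Table 1, under the column reading assumed above.
depth = {"sigma2": 0.13, "phi": 1720.0, "tau2": 0.0}
variances = {d: ok_variance_cell_centre(d, **depth) for d in (2000, 3000, 4000, 5000, 6000)}
for d, v in variances.items():
    print(f"spacing {d} m: kriging variance at cell centre {v:.4f}")
```

The cell centre is the location farthest from any sampling point, so this gives a rough upper reference for the prediction variance at each spacing; the poster itself reports mean prediction variances over 1000 unsampled points.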