WE PROBABLY COULD HAVE MORE FUN TALKING ABOUT THESE TRAFFIC STOPPERS, WHO CLEARLY HAVE THE RIGHT OF WAY! BUT…
DESIGNING MONITORING SURVEYS OVER TIME (PANEL SURVEYS) POWER, VARIANCE and RELATED TOPICS N. Scott Urquhart Senior Research Scientist Department of Statistics Colorado State University Fort Collins, CO 80527-1877
OUTLINE • Anatomy Of Sampling Studies Of Ecological Responses Through Time • Collaborator = Tony Olsen, EPA, WED • http://www.oregonstate.edu/instruct/st571/urquhart/anatomy/index.htm • Urquhart, N.S. (1981). Anatomy of a study. HortScience 16:621-627. • Elaboration on • Survey Designs – GRTS – Work of Don Stevens • Temporal Designs • Power to detect trend – joint with Tom Kincaid • Uses components of variance • Current work = estimating variance • Work of Sarah Williams, finishing MS this month
A CONTEXT • “EMAP-TYPE SITUATIONS” • EMAP = US EPA’s Environmental Monitoring and Assessment Program • Estimate status, changes, and trends in selected indicators of our nation’s ecological resources on a regional scale with known confidence. • Estimate status, changes, and trends in the extent and geographic coverage of our nation’s ecological resources on a regional scale with known confidence. • Describe associations between indicators of anthropogenic stress and indicators of condition.
WHO MUST COMMUNICATE • Ecologists & Other Biologists • Statisticians • Geographers • Geographic Information Specialists • Information Managers • Quality Assurance Personnel • Managers, at Various Levels
“SAMPLING” • A WORD OF MANY MEANINGS • A statistician often associates it with survey sampling • An ecologist may associate it with the selection of local sites or material • A laboratory scientist may associate it with the selection of material to be analyzed from the material supplied • Common general meaning, varied specific meanings
THE SPECIAL NEED • Communication Demands a Distinction Between • The local process of evaluating a response, and • The statistical selection of a sampling unit, for example, • A lake • A point on a stream • A point in vegetation • The terms • Response design • Sampling design or survey design • Can be used to make this distinction
BASIC ROLES • Survey Design Tells Us Where To Go to Collect Sample Information or Material • Response Design Tells Us What To Do Once We Get There • But These Two Components Exist in a Broader Context
AN IMPORTANT DISTINCTION • Monitoring Strategy • Conceptual • Impacted by objectives • Addressable without regard to the inference strategy • Inference Strategy • Places to evaluate the response • Relation between points evaluated and the population • I.e., the basis for inference
SAMPLING STUDIES OF ECOLOGICAL RESPONSES THROUGH TIME HAVE • Monitoring Strategy (these components exist regardless of the inference strategy) • Universe model • Statistical population • Domain design • Response design • Inference Strategy (these components exist for any monitoring strategy) • Survey design • Temporal design • Quality assurance design
The UNIVERSE MODEL • Reality (Universe): Ecological Entity Within a Defined Geographic Area to be Monitored • Model of the Universe: • Development of monitoring approach requires construction of a model for the universe • Elements Of The Universe Model: Set of Entities Composing the Entire Universe of Concern
The UNIVERSE MODEL • Population Description And Its Sampling Require Definition Of the “Units” in the Population • Discrete units: • Lakes may be viewed this way • Individual trees can be viewed this way, too • Continuous structure in space of some dimension: • 2-SPACE: Forests or Agroecosystems • 1-SPACE: Streams • 3-SPACE: Groundwater
A CONTINUOUS MODEL FOR STREAMS: Strahler Orders [Figure: stream network with segments labeled First Order, Second Order, and Third Order]
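Strahler ordering of a stream network can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the `network` dictionary and its segment names are hypothetical. Headwater segments are first order, and the order increases by one only where two or more tributaries of the same top order meet.

```python
def strahler_order(children, node):
    """Return the Strahler order of `node` in a stream network tree.

    `children` maps each segment to the list of segments flowing into it.
    """
    upstream = [strahler_order(children, c) for c in children.get(node, [])]
    if not upstream:  # headwater segment with no tributaries
        return 1
    top = max(upstream)
    # The order increases only when two or more tributaries share the top order.
    return top + 1 if upstream.count(top) >= 2 else top


# Hypothetical network: the outlet is fed by two second-order streams,
# one of which is also joined by an extra first-order tributary.
network = {
    "outlet": ["A", "B"],
    "A": ["a1", "a2"],
    "B": ["B_up", "b3"],
    "B_up": ["b1", "b2"],
}

for segment in ["a1", "A", "B", "outlet"]:
    print(segment, strahler_order(network, segment))
```

Note that segment `B` stays second order after the first-order tributary `b3` joins it; only the junction of `A` and `B` produces a third-order stream.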
The STATISTICAL POPULATION • The Collection of Units (as modeled) Over Some Region of Definition • Spatial • Temporal • Spatial and Temporal • Population Definition Could Include Features Which Depend on Response Values • EX: acid sensitive streams at upper elevations
The DOMAIN Design • Specifies Subpopulations or “Domains” of Special Interest • May Specify Meaningful Comparisons Between Domains • Similar to “planned comparisons” in experimental design situations • Domain design may depend on response values • EX: Warm Versus Cold Water Lakes
The RESPONSE DESIGN • The Response Design Specifies • The process of obtaining a response • At an individual element (site) • Of the resource • During a single monitoring period • Response: What Will Be Determined on an Element • Needs to be responsive to the objectives of the monitoring activity
The INFERENCE STRATEGY • Is The Basis For Scientific Inference • Provides The Connection Between Objectives and the Monitoring Strategy • Monitoring Strategy Usually Must Rely On Obtaining Information on a Subset Of All Possible Elements in the Universe • Specifies Which Elements of the Universe Will Have Responses Determined on Them • Can Be Based on Either • Judgment selection of units • Inferential validity rests on knowledge of relation between the universe and the units evaluated • Why do a study if you know this much about the population? • Probability selection of units • The focus here
The SURVEY Design • Probability Based Survey Designs are Considered Here • May Be Somewhat Limited To Sedentary Resources • Positive Features -- As An Observational Study • Permit clear statistical inference to well-defined populations • Measurements often can be made in natural settings, giving greater realism to results
The SURVEY DESIGN - CONTINUED • Disadvantages • Limited control over predictor variables • Restricts causative inference • Usually will produce inaccessible sampling points • Good - for inference • Bad - for logistics
The TEMPORAL Design • The TEMPORAL DESIGN specifies the pattern of revisits to sites selected by the Survey Design • Sampled population units are partitioned into one (degenerate case) or more PANELS. • Each population unit in the same panel has the same temporal pattern of revisits. • Panel definition could be probabilistic or systematic • Several temporal designs follow after a brief discussion of the rest of the Anatomy, and a bit on site selection.
QUALITY ASSURANCE DESIGN • Defines Those Activities Intended to Provide Data of Known Quality: • Blind duplicates • Accepted chemical standards, etc • Can Provide Valid Estimates of the Variance Of Pure Measurement Error
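How blind duplicates yield a variance estimate can be shown with a short sketch. It assumes (this is not stated on the slide) that the two values in each pair differ only by independent measurement error with a common variance. The difference of a pair then has twice that variance, so half the mean squared difference estimates it. The function name and the data are hypothetical.

```python
def measurement_variance(pairs):
    """Estimate pure measurement-error variance from duplicate pairs.

    Assumes each pair measures the same material, so that
    Var(a - b) = 2 * sigma^2 and sigma^2 is estimated by sum(d^2) / (2 m).
    """
    if not pairs:
        raise ValueError("need at least one duplicate pair")
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


# Hypothetical blind-duplicate results (e.g., pH from split samples).
duplicates = [(7.1, 7.3), (6.8, 6.8), (7.9, 7.5)]
print(measurement_variance(duplicates))
```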
ON SITE SELECTION • Systematically Selected Sites • Good for means & totals, but do not support design-based estimate of variance • Probably OK for large areas like national forests • Systematic designs can systematically miss things that have a natural layout. • EX: Triangular grid (deliberately skewed) in early EMAP got fouled up with • Coastline in the Northeast • The canal network in Florida • Lakes east of the Cascade Mountain Range in Oregon • How to select spatially balanced, but random sites?
GENERALIZED RANDOM TESSELLATION STRATIFIED (GRTS) DESIGN • Due to Don Stevens – see references • Allows • A continuous population model • Variable density sampling by defined areas • Accommodates an “imperfect frame” = reality • Sequential addition of points while maintaining spatial balance • Differing measurements • Lots of points for inexpensive measures • A subset for more expensive measures • A further subset for very expensive measures • Implemented in Southern California Bight
GENERALIZED RANDOM TESSELLATION STRATIFIED (GRTS) DESIGN • Two GIS-based implementations • EMAP R code operates on ARC “Shape” files, and returns points there • Begin at http://www.epa.gov/nheerl/arm/ • http://www.epa.gov/nheerl/arm/designpages/monitdesign/monitoring_design_info.htm • http://www.epa.gov/nheerl/arm/documents/design_doc/psurvey.design_2.2.1.zip • STARMAP – Dave Theobald • RRQRR operates completely in ArcGIS • http://www.nrel.colostate.edu/projects/starmap/rrqrr_index.htm • Both Allow Variable (spatial) Sampling Rates • Generally much better than stratification • (We can talk about this more if you want)
THE FOLLOWING MATERIAL WAS ADAPTED FROM Urquhart, N.S. and T.M. Kincaid (1999). Designs for detecting trend from repeated surveys of ecological resources. Journal of Agricultural, Biological and Environmental Statistics 4: 404-414. Initially presented at the invited conference Environmental Monitoring Surveys Over Time, held at the University of Washington, Seattle, in 1998
MOTIVATING SITUATION • In 1986 Oregon Department of Fish and Wildlife Sought a “One Time” Probability Sampling Design To Survey Coastal Salmon. They Used It In 1990. • It showed earlier estimates of salmon returns to spawn to have been grossly overstated. • Consequence: continue to repeat an available design. • How Good Is The Repeated Use Of Such a Design For Estimating Trend?
CONCLUSIONS • General: Power for Trend Detection • Planned revisits are far superior to obtaining revisits from random “hits” • Year Variance: Power Deteriorates Fast as Year Variance Increases • Site Variance: • No problem with revisit designs. • Without revisits it increases residual variance. • Sampling Rate: Power Increases with Sampling Rate (No surprise!)
EVALUATION CONTEXT • General Perspective • Finite population sampling • But model assisted • A generalization of the “error analysis” perspective of samplers • But recognizing realities of natural resource sampling • Specific Perspective • Finite population, e.g., of stream segments. • Response exists continuously in time, or at least for recurring blocks of time. • Take independent samples at different points in time (during an “index window”)
EVALUATION CONTEXT (CONTINUED) • Model: • Sites (or stream segments) = a random effect • Years = a random effect, but may contain trend • Residual = a random effect • Specific evaluation time • Variation introduced by collection protocol • Crew effect, if present • (often present for large surveys) • “Measurement error” - broadly interpreted
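The model on this slide can be illustrated by simulating data from it. The sketch below assumes an additive form, response = site effect + year effect (with a linear trend) + residual, with independent normal random effects; the function name, the normality, and the linear form of the trend are assumptions made for illustration, not details given on the slide.

```python
import random


def simulate(n_sites, n_years, slope, sd_site, sd_year, sd_resid, seed=0):
    """Simulate y[i][j] = site_i + (slope * j + year_j) + resid_ij.

    Assumed form: independent normal effects with standard deviations
    sd_site, sd_year, sd_resid, and a linear trend `slope` per year.
    """
    rng = random.Random(seed)
    site = [rng.gauss(0, sd_site) for _ in range(n_sites)]
    # The year effect is random but also carries the trend.
    year = [slope * j + rng.gauss(0, sd_year) for j in range(n_years)]
    return [
        [site[i] + year[j] + rng.gauss(0, sd_resid) for j in range(n_years)]
        for i in range(n_sites)
    ]


# Hypothetical values: 5 sites, 4 years, trend of 0.1 units per year.
y = simulate(n_sites=5, n_years=4, slope=0.1,
             sd_site=1.0, sd_year=0.5, sd_resid=0.2)
for row in y:
    print([round(v, 2) for v in row])
```

Every site in a given year shares the same year effect, which is why a large year variance cannot be averaged away by visiting more sites.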
PANEL PLANS = “TEMPORAL DESIGNS” • Sampled Population Units are Partitioned into One (Degenerate Case) or More Panels • Each population unit in the same panel has the same temporal pattern of revisits. • Panel definition could be probabilistic or systematic • Specific Plans • Always revisit • Never revisit (repeated surveys) • Random revisits and other plans
TEMPORAL DESIGN #1: ALWAYS REVISIT = ONE PANEL (This is Wayne Fuller’s “PURE PANEL”)
TEMPORAL DESIGN #2: NEVER REVISIT = NEW PANEL EACH YEAR (INDEPENDENT SURVEYS IN A LARGE POPULATION)
TEMPORAL DESIGN #3: ROTATING PANEL • A Rotating Panel Design Is The Temporal Design Used By The National Agricultural Statistics Service (US - “NASS”) • This Temporal Design Is “Connected” In The Experimental Design Sense • It is fairly well suited for estimating “status,” • But not particularly powerful for detecting trend over intermediate time spans
TEMPORAL DESIGN #4: SERIALLY ALTERNATING (ORIGINAL EMAP) • This Temporal Design Is “Unconnected” in the Experimental Design Sense.
TEMPORAL DESIGN #5: AUGMENTED SERIALLY ALTERNATING (CURRENTLY USED BY EMAP FOR SURFACE WATERS) • This Temporal Design Is “Connected” in the Experimental Design Sense.
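The five temporal designs can be written as panel-by-year visit tables, which also makes the "connected" versus "unconnected" distinction checkable. The sketch below is illustrative only: the diagrams from the slides are not reproduced here, so the four-year cycle, the three-year rotation, and all function names are assumptions. Two panels are treated as linked when they are visited in a common year, and a design as connected when every panel can be reached from every other through such links.

```python
def always_revisit(years):
    """Design #1: one panel, visited every year."""
    return [[1] * years]


def never_revisit(years):
    """Design #2: a new panel each year."""
    return [[int(j == p) for j in range(years)] for p in range(years)]


def rotating(years, stay=3):
    """Design #3 (assumed form): each panel is visited `stay` years in a row."""
    return [[int(p - stay < j <= p) for j in range(years)]
            for p in range(years + stay - 1)]


def serially_alternating(years, cycle=4):
    """Design #4 (assumed cycle): panel p is visited every `cycle` years."""
    return [[int(j % cycle == p) for j in range(years)] for p in range(cycle)]


def augmented_serially_alternating(years, cycle=4):
    """Design #5: serially alternating plus one panel visited every year."""
    return always_revisit(years) + serially_alternating(years, cycle)


def is_connected(design):
    """True if all visited panels are linked through shared years."""
    panels = [p for p, row in enumerate(design) if any(row)]
    if not panels:
        return False
    seen, frontier = {panels[0]}, [panels[0]]
    while frontier:
        p = frontier.pop()
        for q in panels:
            shares_year = any(a and b for a, b in zip(design[p], design[q]))
            if q not in seen and shares_year:
                seen.add(q)
                frontier.append(q)
    return len(seen) == len(panels)


for name, build in [("always revisit", always_revisit),
                    ("never revisit", never_revisit),
                    ("rotating", rotating),
                    ("serially alternating", serially_alternating),
                    ("augmented serially alternating",
                     augmented_serially_alternating)]:
    print(f"{name:32s} connected: {is_connected(build(8))}")
```

Under these assumptions the annual panel is what ties the augmented design together: it shares a year with every alternating panel, whereas the plain serially alternating panels never meet in the same year.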
STATISTICAL MODEL • Consider A Finite Population Of Sites • {S_1, S_2, …, S_N} • and a Time Series Of Response Values At Each Site: • A finite population of time series • Time is continuous, but suppose • Only a sample can be observed in any year, and • Only during an index window of, say, 10% of a year
STATISTICAL MODEL -- IV • If p Indexes Panels, Then • Sites are nested in panels: p(i), and • Years of visit are indicated by panel, with n_pj > 0 or n_pj = 0 for panels visited or not visited in year j • The vector of cell means (of “visited” cells) has a covariance matrix Σ
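One way to see what such a covariance matrix looks like is to build it from the three variance components. The sketch below is an illustration under stated assumptions, not the expression from the slide: independent site, year, and residual effects; a population large enough to ignore finite-population corrections; and panels that revisit exactly the same sites each time. All names are hypothetical.

```python
import numpy as np


def cell_mean_covariance(design, var_site, var_year, var_resid):
    """Covariance of panel-by-year cell means for the visited cells.

    design[p][j] is the number of sites of panel p visited in year j
    (0 means not visited). Assumes independent random effects and no
    finite-population correction.
    """
    cells = [(p, j) for p, row in enumerate(design)
             for j, n in enumerate(row) if n > 0]
    sigma = np.zeros((len(cells), len(cells)))
    for a, (p, j) in enumerate(cells):
        n = design[p][j]
        for b, (q, k) in enumerate(cells):
            if p == q and j == k:
                # Variance of one cell mean.
                sigma[a, b] = var_year + (var_site + var_resid) / n
            elif p == q:
                # Same sites in different years share their site effects.
                sigma[a, b] = var_site / n
            elif j == k:
                # Different panels in the same year share the year effect.
                sigma[a, b] = var_year
    return cells, sigma


# Hypothetical design: panel 0 (10 sites) visited in both years,
# panel 1 (5 sites) visited only in the first year.
cells, sigma = cell_mean_covariance([[10, 10], [5, 0]],
                                    var_site=2.0, var_year=0.5, var_resid=1.0)
print(cells)
print(sigma)
```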
STATISTICAL MODEL -- V • Now Let X Denote a Regressor Matrix Containing a Column of 1’s and a Column of the Numbers of the Time Periods Corresponding to the Filled Cells. • The second element of the fitted coefficient vector is the estimate of trend; the second diagonal element of that vector’s covariance matrix is the square of its standard error.
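A trend estimate and its standard error can be computed from X and the covariance matrix of the cell means. The sketch below assumes generalized least squares; that choice is an assumption for illustration, since the slide's own formula is not preserved here. The function name and the example data are hypothetical.

```python
import numpy as np


def gls_trend(periods, means, sigma):
    """Return (trend estimate, standard error) by generalized least squares.

    periods : time-period number for each filled cell
    means   : observed cell means, in the same order
    sigma   : covariance matrix of the cell means (assumed known)
    """
    periods = np.asarray(periods, dtype=float)
    means = np.asarray(means, dtype=float)
    X = np.column_stack([np.ones(len(periods)), periods])  # 1's and periods
    sigma_inv = np.linalg.inv(np.asarray(sigma, dtype=float))
    cov_beta = np.linalg.inv(X.T @ sigma_inv @ X)
    beta = cov_beta @ X.T @ sigma_inv @ means
    # Second element = slope; second diagonal element = its variance.
    return beta[1], np.sqrt(cov_beta[1, 1])


# Hypothetical example: four yearly means with uncorrelated unit variance.
slope, se = gls_trend([0, 1, 2, 3], [1.0, 1.5, 2.0, 2.5], np.eye(4))
print(slope, se)
```

With an identity covariance matrix this reduces to ordinary least squares, which gives a quick sanity check on the code.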
TOWARD POWER • Ability of a Panel Plan to Detect Trend Can Be Expressed As Power. • We Will Evaluate Power in Terms of Ratios of Variance Components: • and of
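Power for detecting a trend can be approximated once the standard error of the trend estimate is known. The sketch below assumes a two-sided test with a normally distributed estimator and a known standard error; this normal approximation and the function name are assumptions for illustration, not necessarily the calculation behind the slides.

```python
from statistics import NormalDist


def trend_power(slope, se, alpha=0.05):
    """Approximate power of a two-sided z-test of 'no trend'.

    slope : true trend per time period
    se    : standard error of the trend estimate under the panel plan
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = slope / se
    # Probability that the standardized estimate falls in either tail.
    return z.cdf(-crit + shift) + z.cdf(-crit - shift)


# Hypothetical standard errors for the same true trend of 1 unit per period.
for se in (2.0, 1.0, 0.5):
    print(f"se = {se}: power = {trend_power(1.0, se):.3f}")
```

Because power depends on the panel plan only through the standard error, two plans can be compared by comparing the standard errors they produce for the same variance components.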
A SIMULATION STUDY TO MAKE POWER COMPARISONS • n = 60 • N = 60, 240, 600, 1200, 10,000 • ==> Sampling rates of 100%, 25%, 10%, 5%, ~ 0%
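The sampling rates listed follow directly from n / N, as this short check shows (variable names are illustrative). The last rate works out to 0.6%, which the slide rounds to roughly 0%.

```python
n = 60  # sample size used in the simulation study
population_sizes = [60, 240, 600, 1200, 10_000]

# Sampling rate = sample size divided by population size.
rates = {N: n / N for N in population_sizes}
for N, rate in rates.items():
    print(f"N = {N:>6}: sampling rate = {rate:.1%}")
```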
[Figures: panel titles “ALWAYS REVISIT, or EMAP-LIKE” and “NEVER REVISIT”; values shown: 0.000, 0.075, 0.15, 0.30]