Some Insights into Data Weighting in Integrated Stock Assessments

André E. Punt, 21 October 2015

Background
• “Integrated” models potentially involve numerous data sources:
  • Indices of abundance (CPUE, surveys)
  • Length-composition data
  • Age-composition data
  • Discards
  • Mean body weight
  • Conditional age-at-length data
• Moreover, each data source may be available for more than one “fleet”.
(Johnson et al., PFMC Sablefish Assessment)

Objectives
• Outline alternative methods of weighting for:
  • Length- and age-composition data
  • Conditional age-at-length data
• Evaluate the performance of these methods given model mis-specification.

Methods for tuning length (and age) composition data-I
Let $p^{\mathrm{obs}}_{y,L}$ be the observed proportion of animals in length-class L during year y, and $\hat{p}_{y,L}$ be the model-predicted proportion of animals in length-class L during year y. Under the assumption that length samples are multinomial (as is the case in Synthesis, ASAP, etc.), the weight assigned to the data is the “effective sample size”, $N_y$:
$$-\ln \mathcal{L} = -\sum_y N_y \sum_L p^{\mathrm{obs}}_{y,L} \ln \hat{p}_{y,L}$$
where the $N_y$ are the input effective sample sizes.
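How the effective sample size acts as a weight in a multinomial likelihood can be sketched in a few lines of Python. This is an illustrative sketch only, not the Synthesis or ASAP implementation: the function name `multinomial_nll` and the list-of-lists layout are assumptions, and constant terms that do not depend on the predictions are dropped.

```python
import math

def multinomial_nll(obs, pred, n_input):
    """Negative log-likelihood (up to a constant) for composition data.

    obs[y][L]  : observed proportion in length-class L during year y
    pred[y][L] : model-predicted proportion (assumed > 0 where obs > 0)
    n_input[y] : input effective sample size for year y (the weight)
    """
    total = 0.0
    for p_obs_y, p_pred_y, n_y in zip(obs, pred, n_input):
        for p_o, p_p in zip(p_obs_y, p_pred_y):
            if p_o > 0.0:  # empty classes contribute nothing
                total -= n_y * p_o * math.log(p_p)
    return total
```

Doubling `n_input` doubles the contribution of that year's composition data to the objective function, which is why the choice of effective sample size matters.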

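Methods for tuning length (and age) composition data-II
McAllister-Ianelli: This method sets the effective sample size by comparing the residual variance with the variance expected under a multinomial distribution:
$$N^{\mathrm{eff}}_y = \frac{\sum_L \hat{p}_{y,L}\,(1-\hat{p}_{y,L})}{\sum_L \left(p^{\mathrm{obs}}_{y,L}-\hat{p}_{y,L}\right)^2}$$
where $p^{\mathrm{obs}}_{y,L}$ and $\hat{p}_{y,L}$ are the observed and model-predicted proportions in length-class L during year y. To compute an overall effective sample size, $\bar{N}$, it is necessary to average over the $N^{\mathrm{eff}}_y$. Two options are commonly used:
McAllister-Ianelli-1 (arithmetic mean): $\bar{N} = \frac{1}{n}\sum_y N^{\mathrm{eff}}_y$
McAllister-Ianelli-2 (harmonic mean): $\bar{N} = n \Big/ \sum_y \left(1/N^{\mathrm{eff}}_y\right)$
where $n$ is the number of years with data.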
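A small sketch of the McAllister-Ianelli calculation, assuming the usual form of the estimator (multinomial variance of the predictions divided by the squared residuals) and taking the two averaging options to be the arithmetic and harmonic means. The function names are hypothetical.

```python
def mi_effective_n(p_obs, p_pred):
    """Year-specific effective sample size from one composition.

    Ratio of the variance expected under a multinomial distribution to the
    observed squared residuals. Undefined (division by zero) for a perfect fit.
    """
    expected = sum(p * (1.0 - p) for p in p_pred)
    residual = sum((o - p) ** 2 for o, p in zip(p_obs, p_pred))
    return expected / residual

def arithmetic_mean(values):
    return sum(values) / len(values)

def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)

# Two years of toy data with the same prediction but different fits.
pred = [0.5, 0.5]
n_eff = [mi_effective_n([0.6, 0.4], pred),    # poorly fitted year
         mi_effective_n([0.55, 0.45], pred)]  # well fitted year
overall_1 = arithmetic_mean(n_eff)  # McAllister-Ianelli-1
overall_2 = harmonic_mean(n_eff)    # McAllister-Ianelli-2
```

The harmonic mean is pulled towards the poorly fitted years, so it gives a lower overall effective sample size than the arithmetic mean whenever the year-specific values differ.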

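Methods for tuning length (and age) composition data-III
But residuals for length-compositions are seldom uncorrelated between length-classes, which motivates “Francis weighting”. The idea behind Francis weighting is to base the effective sample size on the mean age or length, i.e. to multiply the input effective sample sizes $N_y$ by:
$$w = 1 \Big/ \mathrm{Var}_y\!\left[\frac{\bar{\ell}^{\mathrm{obs}}_y - \bar{\ell}^{\mathrm{pred}}_y}{\sigma_y/\sqrt{N_y}}\right]$$
$$\bar{\ell}^{\mathrm{obs}}_y = \sum_L \ell_L\, p^{\mathrm{obs}}_{y,L}, \qquad \bar{\ell}^{\mathrm{pred}}_y = \sum_L \ell_L\, \hat{p}_{y,L}, \qquad \sigma_y^2 = \sum_L \hat{p}_{y,L}\left(\ell_L - \bar{\ell}^{\mathrm{pred}}_y\right)^2$$
where $p^{\mathrm{obs}}_{y,L}$ and $\hat{p}_{y,L}$ are the observed and model-predicted proportions in length-class L during year y, and $\ell_L$ is the mid-point of length-class L.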
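The mean-length idea can be sketched as follows. This assumes the commonly used form of the Francis weight (the reciprocal of the variance, across years, of the standardised residuals of mean length); the function names and data layout are illustrative and not taken from the presentation.

```python
import math
import statistics

def mean_length(props, mids):
    """Mean length implied by proportions-at-length and class mid-points."""
    return sum(p * m for p, m in zip(props, mids))

def francis_weight(obs, pred, n_input, mids):
    """Multiplier for the input sample sizes based on mean-length residuals.

    obs[y][L], pred[y][L] : observed / predicted proportions by year
    n_input[y]            : input effective sample sizes
    mids[L]               : mid-point of each length-class
    Needs at least two years; statistics.variance is the sample variance.
    """
    z = []
    for p_o, p_p, n_y in zip(obs, pred, n_input):
        m_obs = mean_length(p_o, mids)
        m_pred = mean_length(p_p, mids)
        var_pred = sum(p * (m - m_pred) ** 2 for p, m in zip(p_p, mids))
        se = math.sqrt(var_pred / n_y)  # standard error of the mean length
        z.append((m_obs - m_pred) / se)
    return 1.0 / statistics.variance(z)

mids = [10.0, 20.0]
pred = [[0.5, 0.5]] * 3
obs = [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5]]
w = francis_weight(obs, pred, [100, 100, 100], mids)
```

A value of `w` below 1 indicates that the mean lengths vary more about their predictions than the input sample sizes imply, so the data would be down-weighted.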

Methods for tuning conditional age-at-length-I
Conditional age-at-length (CAL) data are (essentially) age-length keys. These data provide information on year-class strength and growth. CAL data are matrices by year, which makes application of standard weighting schemes difficult.

Methods for tuning conditional age-at-length-II
Let $p^{\mathrm{obs}}_{y,L,a}$ be the observed proportion of animals in length-class L during year y that are of age a, and $\hat{p}_{y,L,a}$ be the model-predicted proportion of animals in length-class L during year y that are of age a. Under the assumption that age samples are multinomial conditional on length, the negative log-likelihood is:
$$-\ln \mathcal{L} = -\sum_y \sum_L N_{y,L} \sum_a p^{\mathrm{obs}}_{y,L,a} \ln \hat{p}_{y,L,a}$$
where the $N_{y,L}$ are the input effective sample sizes.

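Methods for tuning conditional age-at-length-III
The McAllister-Ianelli and Francis methods can be extended (naively) to handle conditional age-at-length data by treating each year and length-class combination as a separate composition. With $p^{\mathrm{obs}}_{y,L,a}$ and $\hat{p}_{y,L,a}$ the observed and model-predicted proportions at age a in length-class L during year y, and $N_{y,L}$ the input effective sample sizes:
McAllister-Ianelli: $N^{\mathrm{eff}}_{y,L} = \sum_a \hat{p}_{y,L,a}\,(1-\hat{p}_{y,L,a}) \Big/ \sum_a \left(p^{\mathrm{obs}}_{y,L,a}-\hat{p}_{y,L,a}\right)^2$
McAllister-Ianelli-1: the arithmetic mean of the $N^{\mathrm{eff}}_{y,L}$ over all year and length-class combinations.
McAllister-Ianelli-2: the harmonic mean of the $N^{\mathrm{eff}}_{y,L}$ over all year and length-class combinations.
Francis-A: $w = 1 \Big/ \mathrm{Var}_{y,L}\!\left[\dfrac{\bar{a}^{\mathrm{obs}}_{y,L} - \bar{a}^{\mathrm{pred}}_{y,L}}{\sigma_{y,L}/\sqrt{N_{y,L}}}\right]$, with $\bar{a}^{\mathrm{obs}}_{y,L} = \sum_a a\, p^{\mathrm{obs}}_{y,L,a}$, $\bar{a}^{\mathrm{pred}}_{y,L} = \sum_a a\, \hat{p}_{y,L,a}$ and $\sigma^2_{y,L} = \sum_a \hat{p}_{y,L,a}\left(a - \bar{a}^{\mathrm{pred}}_{y,L}\right)^2$.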

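Methods for tuning conditional age-at-length-IV
Francis-A can be criticised because it treats each row of an age-length key as being independent. This is unlikely to be true. The Francis weighting method for length (and age) data can be generalized to age-length keys (Francis-B) by applying the basic algorithm to the mean age of the age-length key, i.e.:
$$\bar{a}^{\mathrm{obs}}_y = \sum_L \phi_{y,L} \sum_a a\, p^{\mathrm{obs}}_{y,L,a}, \qquad \bar{a}^{\mathrm{pred}}_y = \sum_L \phi_{y,L} \sum_a a\, \hat{p}_{y,L,a}$$
where $p^{\mathrm{obs}}_{y,L,a}$ and $\hat{p}_{y,L,a}$ are the observed and model-predicted proportions at age a in length-class L during year y, and $\phi_{y,L}$ is the fraction of animals during year y observed to be in length-class L.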
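The mean age of an age-length key, which is the quantity Francis-B works with, can be sketched as below. Only the mean-age step is shown; the variance needed to standardise the residuals is not reconstructed here. The function name and argument layout are assumptions made for illustration.

```python
def key_mean_age(phi, key, ages):
    """Mean age implied by one year's age-length key.

    phi[L]    : fraction of animals observed in length-class L (sums to 1)
    key[L][a] : proportion of animals in length-class L that are of age a
    ages[a]   : the age associated with each column of the key
    """
    return sum(
        f * sum(p * a for p, a in zip(row, ages))
        for f, row in zip(phi, key)
    )

# Toy key: two length-classes, two ages.
phi = [0.25, 0.75]
key = [[1.0, 0.0],   # small fish are all age 1
       [0.5, 0.5]]   # large fish are split between ages 1 and 2
mean_age = key_mean_age(phi, key, [1, 2])
```

Applying the same function to the observed and the model-predicted key (with the same `phi`) gives one pair of mean ages per year, rather than one pair per row of the key as in Francis-A.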

Simulation study evaluation

Simulation Details-I
• Spatial structure:
  • One zone OR
  • Two zones with spatial variation in F
• Fleet structure:
  • Non-trawl fleet
  • Trawl fleet
• Data (by fleet and zone):
  • CPUE series (all years; CV = 0.1)
  • Length frequencies (all years; sample size = 100)
  • Age-at-length data (50% of year-fleet-zone combinations; sample size = 500)
• Logistics:
  • 100 simulations
  • Single-area estimation method
  • Performance measure: spawning biomass (summed over zones).

Tuning algorithms
Each tuning algorithm (except Francis / Francis-A*) is applied five times.
• McAllister-Ianelli-1: Tune the residual variance for the CPUE data and use the McAllister-Ianelli-1 method for both length and CAL data.
• McAllister-Ianelli-2: As for McAllister-Ianelli-1 except use the McAllister-Ianelli-2 method for both length and CAL data.
• Francis / Francis-A: As for McAllister-Ianelli-1 except use Francis weighting for the length data and Francis-A weighting for the CAL data.
• Francis / Francis-B: As for McAllister-Ianelli-1 except use Francis weighting for the length data and Francis-B weighting for the CAL data.
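Applying a tuning algorithm several times is presumably an iterative re-weighting loop: fit the model, recompute the weights from the fit, and refit. A generic sketch under that assumption follows; `fit_model` and `update_weights` are hypothetical callables standing in for the assessment model and for whichever weighting method is chosen, and the tuning of the CPUE residual variance is omitted.

```python
def tune(fit_model, update_weights, n_input, iterations=5):
    """Iteratively re-weight composition data.

    fit_model(n)              -> model predictions given sample sizes n
    update_weights(pred, n)   -> new sample sizes given the fit
    Both callables are placeholders supplied by the caller.
    """
    n_current = list(n_input)
    for _ in range(iterations):
        pred = fit_model(n_current)           # refit with current weights
        n_current = update_weights(pred, n_current)
    return n_current

# Toy stand-ins: the "fit" echoes the weights and each update halves them.
tuned = tune(lambda n: n, lambda pred, n: [0.5 * v for v in n], [100.0])
```

In practice the loop would stop changing the weights once they stabilise; the fixed count of five mirrors the number of applications stated on the slide.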

Results: One-zone operating model
• The estimation model is not mis-specified so the correct effective sample sizes are known. This allows some questions about the “in principle” performance of the methods (and tuning algorithms) to be explored.
• Does estimation performance depend on the initial weights?
  • Yes (results not shown here)
• Does estimation performance depend on the tuning algorithm?
  • Yes (results not shown here)
• Which method for calculating weights performs best?

The one-zone operating model
McAllister-Ianelli-1 is biased for both length-frequency and conditional age-at-length data. McAllister-Ianelli-2 performs best at calculating effective sample sizes for length data (Francis is unbiased, but imprecise). McAllister-Ianelli-2 also performs best at calculating effective sample sizes for conditional age-at-length data (Francis-A and Francis-B are unbiased, but imprecise).

The two-zone operating model
The untuned method performs worse than when tuning is applied (except when McAllister-Ianelli-1 is applied). McAllister-Ianelli-1 leads to the poorest performance. Francis / Francis-B leads to the estimates with the least bias for final spawning biomass (and for final / initial spawning biomass), but not by much.

The two-zone operating model
• With model mis-specification:
  • Francis leads to lower weights than McAllister-Ianelli-1.
  • Francis-B leads to lower weights than Francis-A and McAllister-Ianelli-2.
  • Francis and Francis-B are imprecise (compared to McAllister-Ianelli-2 and Francis-A).
• We don’t know the correct effective sample size for this case.

Overall conclusions
• General
  • Avoid McAllister-Ianelli-1 (averaging of effective sample sizes).
  • McAllister-Ianelli-2 (harmonic mean) performs adequately over all cases (but was not optimal when there was model mis-specification).
  • Francis / Francis-B was the least biased tuning algorithm, but the estimates of effective sample size showed the highest between-simulation variation.

Questions & Acknowledgements
This work was partially supported by NOAA grant NA10OAR4320148. Chris Francis is thanked for discussions that led to the Francis-A and Francis-B methods.
