
Presentation Transcript


  1. Competent Use Of Actuarials Requires Understanding Sample-Wise Variations In Both Recidivism And Test Accuracy Richard Wollert, Ph.D. Diane Lytton, Ph.D. Jacqueline Waggoner, Ed.D. Marc Goulet, Ph.D. Available at http://richardwollert.com 2005 ATSA Convention, Nov 16-19, Salt Lake City

  2. In A 2004 Article In Sexual Abuse, Doren Compared The 5-Year Score-Wise Recidivism Rates For The Construction Samples Of The RRASOR And Static-99 With A Number Of Generalization Samples. Notes on Abbreviations: • Score-wise Recidivism = SWR = Rate for a given point total • Construction Sample = CS = Developmental Sample • Generalization Sample = GS = A Comparison Sample 2005 ATSA Convention, Nov 16-19, Salt Lake City

  3. The Purpose Of These Comparisons • “To discover the degree to which the risk percentages for each instrument score replicate across different samples and different underlying base rates” (p. 27). 2005 ATSA Convention, Nov 16-19, Salt Lake City

  4. As A First Step In This Study, Many Data Sets Were Obtained From Different Sources • The data sets, or generalization samples, reported the number of recidivists and non-recidivists at each test score. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  5. Two Procedures Were Used To Combine The Data From the GSs • 1. Recidivism data for all GSs were pooled into a single “mega-sample” that was stratified by test scores. • Samples with BRs below that of the CS were not differentiated from samples with higher BRs. • 2. The data from the GSs were combined to form 8 “semi-overlapping groups” that varied in their overall recidivism rates (from about 6% to 29%). • These were also stratified by test scores. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  6. The Table Below Shows How GSs Were Combined To Form 8 “Semi-Overlapping” Groups 2005 ATSA Convention, Nov 16-19, Salt Lake City

  7. The Data Were Analyzed Using Two Chi-Square Designs • 1. The recidivism rate for each test score in the CS was compared with the rate for the corresponding score that was derived from the mega-sample. • 2. The recidivism rate for each test score in the CS was compared with the rate for the corresponding score from each of the overlapping samples. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  8. Here Is A Format For Organizing RRASOR Data In The First Analysis. Two “Summary” Experience Tables Are Contrasted At Each Of Six Levels. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  9. Here Is A Format For Organizing RRASOR Data For The Second Analysis. Two Summary Experience Tables Were Again Contrasted. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  10. Seven Other Sets Of Tables Like The One Above Were Also Part Of The Second Analysis Because Data Were Combined To Make 8 Groups. Here Is The Last One. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  11. Findings • One significant difference in 13 tests was found when the SWR rates from the mega-sample were compared against those from the CS. • Relatively few significant differences were observed when the recidivism rates for the overlapping groups with overall base rates ranging from 9% to 21% were compared against those from the CS. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  12. A Number of Claims Were Based On These Patterns Of Non-significant Findings • Every 5-year SWR rate from each of the CSs was “replicated” (p. 33) in the GSs. • The SWR rates “remained essentially unchanged … through a range of plus or minus 6% around a center point” (p. 33). • For the RRASOR the center point was 13%. • For Static-99 it was 15%. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  13. Some Guidelines For Evaluators Who Administer The RRASOR and Static-99 Were Also Formulated On The Basis Of These Findings • When using the RRASOR, they can always assign the SWR rates for the CS because no meaningful differences in SWR rates were found between the CS and groups with differing BRs. • With Static-99 they should determine if an offender is from a parent population with a very high or low BR (because some differences were found in these regions). • It was recommended that the rate for the CS be assigned where the BR for the parent population ranges from 9-21%. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  14. The Author Also Claimed His Results Provided Empirical Evidence That SWR Rates Don’t Always Fluctuate When The BR For One Sample Differs From Another • In particular, he stated that “although it may have been believed that a sample’s underlying base rate could effect (sic) the interpretation of the actuarial instruments’ scores, that belief was found largely not supported (in my analysis) … the argument has become significantly weaker that an unknown sample recidivism base rate affects the interpretability of actuarial scores” (p. 34, Stability of the Interpretive Risk Percentages for the RRASOR and Static-99, 2004, Sexual Abuse, 16, 25-36). 2005 ATSA Convention, Nov 16-19, Salt Lake City

  15. Some Evaluators May Be Tempted To Justify SVP Civil Commitment Recommendations On The Basis Of This Article. Here Is One Possible Train Of Logic. • Defendant Jones has a high RRASOR score. • The BR for the parent population from which defendant Jones was drawn may be lower than that for the RRASOR CS. • Doren has shown that SWR rates for high RRASOR scores are the same even when the BR for one sample is lower than another. • The recidivism rate for high scorers in the CS sample is therefore applicable to Jones. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  16. Experts Who Rely On This Argument Run The Risk Of Providing Information That Is Misleading • Why? Because the research in question contains many methodological and conceptual flaws. • We will discuss only one of these flaws today, but we believe it is so fundamental and devastating that it invalidates the findings, conclusions, and interpretations reported in the article of concern. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  17. The Flaw Is This: The Original Research Question Was Posed Too Narrowly To Fully Address The Issue Of Replication • From the stated purpose and the article’s context, it is apparent that replication was conceived of as simply the stability of recidivism rates for each score over different summary experience tables. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  18. Score-Wise Recidivism Is Defined By A Math Formula, However. A Variation On Bayes’s Theorem, The Formula Is E = PT / (PT + QF) • P = The base rate for those with test scores that fall within a specified range of scores. • The range could include all scores (Case “A”: scores 0-6+ on Static-99) or a subset of scores (Case “B”: scores of only 4-6+). • Q = The non-recidivism rate, which is always 1 - P. • T = The true positive fraction: the % of recidivists with high scores in a specified range of scores. • Case A: # recidivists with 6+ scores / # recidivists with 0-6+ scores • Case B: # recidivists with 6+ scores / # recidivists with 4-6+ scores • F = The false positive fraction: the % of non-recidivists with high scores in a range of scores. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  19. Using Bayes’s Theorem To Calculate E For Case A (0-6+): E = (.180 x .256) / ((.180 x .256) + (.820 x .089)) = .39 2005 ATSA Convention, Nov 16-19, Salt Lake City

  20. Using Bayes’s Theorem To Calculate E For Case B (4-6+): E = (.315 x .371) / ((.315 x .371) + (.685 x .286)) = .374 2005 ATSA Convention, Nov 16-19, Salt Lake City
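
The arithmetic on slides 19 and 20 can be reproduced with a short Python sketch of the slide-18 formula. The function name and structure are ours; the P, T, and F inputs are the Case A and Case B figures quoted above.

    # Minimal sketch of the slide-18 formula E = PT / (PT + QF).
    # The inputs are the Case A and Case B values quoted on slides 19 and 20.

    def score_wise_recidivism(P, T, F):
        """E for a given score range: P = base rate, T = true positive
        fraction, F = false positive fraction, Q = 1 - P."""
        Q = 1.0 - P
        return (P * T) / (P * T + Q * F)

    # Case A (Static-99 scores 0-6+): prints 0.39
    print(round(score_wise_recidivism(P=0.180, T=0.256, F=0.089), 2))

    # Case B (Static-99 scores 4-6+): prints 0.374
    print(round(score_wise_recidivism(P=0.315, T=0.371, F=0.286), 3))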

  21. Several Principles May Be Deduced From E = PT / (PT + QF) • 1. Each score-wise rate reported in a summary experience table is the product of several variables (P, T, and F) that constitute an underlying (and rarely disseminated) “component” experience table. • 2. Samples may have similar score-wise recidivism rates, but differ with respect to P, T, or F (see slide 22). • 3. A score-wise recidivism rate is truly replicated only when the associated values of P, T, and F from different experience tables are replicated (also see slide 22). 2005 ATSA Convention, Nov 16-19, Salt Lake City

  22. Variations In P, T, and F May Be Found For Samples With Similar SWR Rates: An Example Using Static-99 Data (Note: E Is Obtained By Applying Bayes’s Theorem) 2005 ATSA Convention, Nov 16-19, Salt Lake City

  23. Other Principles • 4. If T and F are stable, the recidivism rate will change only if P changes. • 5. If P and Q are stable, the score-wise recidivism rate will change as a function of changes in the “likelihood” ratio of T/F. 2005 ATSA Convention, Nov 16-19, Salt Lake City
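
To make Principle 4 concrete, the sketch below (our illustration, not part of the original analysis) holds T and F at the Case A values from slide 19 and varies the base rate P across the 6%-29% range mentioned on slide 5; the score-wise rate E climbs from roughly .16 to .54.

    # Principle 4 illustrated: with T and F held fixed, E moves only because
    # P moves. T and F are borrowed from Case A (slide 19) for illustration;
    # the base rates echo the 6%-29% range on slide 5.

    def score_wise_recidivism(P, T, F):
        Q = 1.0 - P
        return (P * T) / (P * T + Q * F)

    T, F = 0.256, 0.089
    for P in (0.06, 0.13, 0.21, 0.29):
        print(f"P = {P:.2f} -> E = {score_wise_recidivism(P, T, F):.2f}")
    # prints E = 0.16, 0.30, 0.43, 0.54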

  24. These Mathematical Facts Hold Important Implications For The Research Being Analyzed • Recall that the author concluded that the score-wise recidivism rates for samples with different overall recidivism rates did not differ from one another. • Assuming that acceptance of the null hypothesis is justified, this can mean only one thing. • The likelihood (T/F) ratio changed from one sample to another. • Mossman, who has published many articles on ROC analysis and Bayes’s theorem, made the same point about Doren’s research in an article that has been accepted for publication in Sexual Abuse. 2005 ATSA Convention, Nov 16-19, Salt Lake City
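
The inference can also be made explicit algebraically: solving E = PT / (PT + QF) for the likelihood ratio gives T/F = E(1 - P) / (P(1 - E)). In the sketch below (ours, using E = .39 from Case A and the same hypothetical 6%-29% base-rate range), pinning E while P varies forces the implied T/F ratio to fall from about 10 to about 1.6.

    # If the score-wise rate E is held constant while the base rate P varies,
    # the implied likelihood ratio T/F = E(1 - P) / (P(1 - E)) cannot stay
    # constant. E = 0.39 echoes Case A; the base rates are hypothetical.

    def implied_likelihood_ratio(E, P):
        return (E * (1.0 - P)) / (P * (1.0 - E))

    E = 0.39
    for P in (0.06, 0.13, 0.21, 0.29):
        print(f"P = {P:.2f} -> implied T/F = {implied_likelihood_ratio(E, P):.2f}")
    # prints 10.02, 4.28, 2.40, 1.57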

  25. We Tested This Hypothesis After Obtaining The Frequency Data Analyzed In The Original Study • Adopting 5 as a high score on the RRASOR, likelihood ratios were calculated for the construction sample and for all generalization samples where this was possible. • It was impossible to define LRs for 3 of 10 samples. • Adopting 6 as a high score, equivalent calculations were undertaken for the Static-99 construction sample and for all generalization samples. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  26. Other Steps Of The Re-analysis • Upper and lower confidence intervals (p=.05) were established for the LRs from the RRASOR and Static-99. • The LRs for the generalization samples were plotted against the confidence intervals for the construction sample. • Data for other scores were not analyzed because recidivism rates for lower scores were correlated with recidivism rates for maximum scores. 2005 ATSA Convention, Nov 16-19, Salt Lake City
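
The transcript does not state which procedure was used to build the 95% confidence intervals around the likelihood ratios; one common choice is the log-transform interval for a ratio of two independent proportions, sketched below with hypothetical frequency counts rather than the study's data.

    # One standard way to put a 95% CI around a likelihood ratio T/F: the
    # log-transform interval for a ratio of two independent binomial
    # proportions. This is only an assumption about the method used; the
    # counts below are hypothetical placeholders, not the study's frequencies.
    import math

    def lr_with_ci(hi_recid, n_recid, hi_nonrecid, n_nonrecid, z=1.96):
        T = hi_recid / n_recid          # true positive fraction
        F = hi_nonrecid / n_nonrecid    # false positive fraction
        lr = T / F
        se_log = math.sqrt(1/hi_recid - 1/n_recid + 1/hi_nonrecid - 1/n_nonrecid)
        return lr, math.exp(math.log(lr) - z * se_log), math.exp(math.log(lr) + z * se_log)

    # Hypothetical: 20 of 80 recidivists and 15 of 320 non-recidivists
    # scored in the "high" range.
    lr, lo, hi = lr_with_ci(20, 80, 15, 320)
    print(f"LR = {lr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")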

  27. All Likelihood Ratios From The RRASOR Generalization Samples Were Significantly Different From The Likelihood Ratio For The RRASOR Construction Sample 2005 ATSA Convention, Nov 16-19, Salt Lake City

  28. The Likelihood Ratios In 6 Of 7 Generalization Samples Were Significantly Different From The Likelihood Ratio For The Static-99 Construction Sample 2005 ATSA Convention, Nov 16-19, Salt Lake City

  29. Correlational Analyses Indicated That Test Accuracy Decreased As Base Rates Increased • RRASOR LRs with sample-wise base rates: r = -.52 (n = 8; p = .17). • Static-99 LRs with sample-wise base rates: r = -.86 (n = 8; p < .01). 2005 ATSA Convention, Nov 16-19, Salt Lake City
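
For readers who want to run the same kind of check on local data, the correlation can be computed as below; the paired values are placeholders, since the sample-wise LRs and base rates are not listed in this transcript.

    # Sketch of the correlational step: Pearson r between sample-wise
    # likelihood ratios and sample-wise base rates. The arrays are
    # placeholders, not the study's values.
    from scipy.stats import pearsonr

    base_rates = [0.06, 0.09, 0.13, 0.16, 0.19, 0.22, 0.26, 0.29]  # hypothetical
    lrs        = [4.1, 3.8, 3.9, 3.2, 3.0, 2.6, 2.4, 2.1]          # hypothetical

    r, p = pearsonr(base_rates, lrs)
    print(f"r = {r:.2f}, p = {p:.3f}")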

  30. Implications Of This Re-analysis For The Research Under Consideration • Score-wise recidivism rates were not replicated in the criticized research because similarities in rates were an artifact of fluctuations in likelihood ratios. • Characterizing the principle that score-wise recidivism rates vary with base rate differences as a “belief” is misleading. As long as F and T are stable, it is a mathematical fact. • Proposing guidelines for evaluators to follow that conflict with Bayes’s theorem is potentially harmful because of the increase in prediction errors that this may occasion. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  31. Practice Implications • Variations in detection indicia and base rates raise doubts about the applicability of published SWR rates for RRASOR and Static-99 to local populations. • Agencies that use these tests should consider re-norming them on local populations. • The correlational analyses suggest that these tests are most inaccurate for populations that are of greatest concern because of their high recidivism rates. • When using these tests, examiners should disclose their assumptions about P, T, and F, and present data that support their assumptions. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  32. Research Implications • Current data on representative and large samples would facilitate meaningful replication research. • Test developers might improve accuracy by investigating factors that produce fluctuations in likelihood ratios. • Why is accuracy so diminished in groups with high base rates? • Component experience tables should be compiled to accompany summary tables. These tables should include frequency data for true positives, true negatives, false positives, and false negatives. Associated Bayesian values should also be included. Each table should describe subjects and sampling methods. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  33. Component Experience Tables Should Be Compiled To Accompany Summary Experience Tables 2005 ATSA Convention, Nov 16-19, Salt Lake City
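
The table on slide 33 is not reproduced in this transcript. As a rough sketch of what one row of a component experience table might hold, the snippet below combines raw frequencies with the Bayesian values defined on slide 18; the function, field names, and counts are ours and purely hypothetical.

    # Sketch of one row of a "component" experience table: raw frequencies
    # (TP, FP, FN, TN) for a given cut score plus the Bayesian values P, Q,
    # T, F, and E from slide 18. All counts are hypothetical.

    def component_row(cut_score, tp, fp, fn, tn):
        recidivists     = tp + fn
        non_recidivists = fp + tn
        P = recidivists / (recidivists + non_recidivists)  # base rate
        Q = 1.0 - P                                        # non-recidivism rate
        T = tp / recidivists                               # true positive fraction
        F = fp / non_recidivists                           # false positive fraction
        E = (P * T) / (P * T + Q * F)                      # score-wise recidivism rate
        return {"cut score": cut_score, "TP": tp, "FP": fp, "FN": fn, "TN": tn,
                "P": round(P, 3), "Q": round(Q, 3), "T": round(T, 3),
                "F": round(F, 3), "E": round(E, 3)}

    print(component_row("6+", tp=20, fp=15, fn=60, tn=305))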
