Space-Time Scan Statistics for Early Warning Systems

  1. Space-Time Scan Statistics for Early Warning Systems
     Martin Kulldorff
     Department of Ambulatory Care and Prevention, Harvard University Medical School and Harvard Pilgrim Health Care

  2. Content
     • Background on Disease Surveillance
     • Purely Spatial Scan Statistics: Brain Cancer in the United States
     • Early Warning System using a Space-Time Permutation Scan Statistic: Syndromic Surveillance in New York City
     • Various Extensions

  3. Collaborators
     • Harvard Medical School: Ken Kleinman, Richard Platt, Katherine Yih
     • New York City Department of Health: Jessica Hartman, Rick Heffernan, Farzad Mostashari
     • University of Connecticut: David Gregorio, Zixing Fang
     • Universidade Federal de Minas Gerais: Renato Assunção, Luiz Duczmal

  4. Importance of Early Disease Outbreak Detection
     • Eliminate health hazards
     • Warn about risk factors
     • Earlier diagnosis of new cases
     • Quarantine cases
     • Scientific research concerning treatments, vaccines, etc.
     • Early detection is especially critical for infectious diseases

  5. Disease Surveillance
     Data Sources
     • Disease Registries
     • Reportable Diseases
     • Electronic Health Records
     • Health Insurance Claims Data
     • Vital Statistics (Mortality)
     Types of Data
     • Diagnosed Diseases
     • Symptoms (Syndromic Surveillance)
     • Lab Test Results
     • Pharmaceutical Drug Sales

  6. Disease Surveillance: Frequency of Analyses
     • Daily
     • Weekly
     • Monthly
     • Yearly

  7. Purely Temporal Methods
     • Farrington CP, Andrews NJ, Beale AD, Catchpole MA (1996) A statistical algorithm for the early detection of outbreaks of infectious disease. J R Stat Soc A Stat Soc 159: 547–563.
     • Hutwagner LC, Maloney EK, Bean NH, Slutsker L, Martin SM (1997) Using laboratory-based surveillance data for prevention: An algorithm for detecting salmonella outbreaks. Emerg Infect Dis 3: 395–400.
     • Nobre FF, Stroup DF (1994) A monitoring system to detect changes in public health surveillance data. Int J Epidemiol 23: 408–418.
     • Reis B, Mandl K (2003) Time series modeling for syndromic surveillance. BMC Med Inform Decis Mak 3: 2.

  8. Three Important Issues
     • An outbreak may start locally.
     • Purely temporal methods can be used simultaneously for multiple geographical areas, but that leads to multiple testing.
     • Disease outbreaks may not conform to the pre-specified geographical areas.

  9. Why Use a Scan Statistic?
     With disease outbreaks:
     • We do not know where they will occur.
     • We do not know their geographical size.
     • We do not know when they will occur.
     • We do not know how rapidly they will emerge.

  10. One-Dimensional Scan Statistic

  11. The Spatial Scan Statistic
      • Create a regular or irregular grid of centroids covering the whole study region.
      • Create an infinite number of circles around each centroid, with the radius ranging from zero up to a maximum chosen so that at most 50 percent of the population is included.
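
Although the number of circles is infinite, with aggregated data only finitely many distinct sets of regions can fall inside them, because a circle's contents change only when its radius passes another centroid. A minimal sketch of that enumeration follows, assuming each region is represented by one centroid and one population count. The function name `candidate_zones` and its parameters are hypothetical, and ties in distance are not treated specially.

```python
import numpy as np

def candidate_zones(coords, population, max_fraction=0.5):
    """Return the distinct sets of regions captured by circles around each centroid.

    coords: (n, 2) array of centroid coordinates (illustrative planar units).
    population: length-n array of region populations.
    max_fraction: largest share of the total population a circle may contain.
    """
    coords = np.asarray(coords, dtype=float)
    population = np.asarray(population, dtype=float)
    limit = max_fraction * population.sum()
    zones = set()
    for center in coords:
        # Growing the radius adds regions one at a time, nearest first.
        distances = np.linalg.norm(coords - center, axis=1)
        order = np.argsort(distances, kind="stable")
        pop_in_circle = 0.0
        members = []
        for idx in order:
            pop_in_circle += population[idx]
            if pop_in_circle > limit:
                break  # circle would exceed the population cap
            members.append(int(idx))
            zones.add(frozenset(members))
    return zones
```

Each returned zone is a `frozenset` of region indices, so the same set of regions reached from two different centroids is stored only once.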

  12. For each circle:
      • Obtain actual and expected number of cases inside and outside the circle.
      • Calculate likelihood function.
      Compare circles:
      • Pick circle with highest likelihood function as Most Likely Cluster.
      Inference:
      • Generate random replicas of the data set under the null hypothesis of no clusters (Monte Carlo sampling).
      • Compare most likely clusters in real and random data sets (likelihood ratio test).
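
The three stages on this slide can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: candidate circles are given as sets of region indices, the expected counts sum to the total number of cases, and null replicas are drawn by distributing the observed total across regions in proportion to the expected counts. The names `log_lr`, `scan`, and `monte_carlo_p` are hypothetical.

```python
import math
import numpy as np

def _xlog_ratio(x, y):
    """x * ln(x / y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x / y)

def log_lr(c, mu, C):
    """Log of the Poisson likelihood ratio for one circle."""
    return _xlog_ratio(c, mu) + _xlog_ratio(C - c, C - mu)

def scan(cases, expected, zones):
    """Return (largest log likelihood ratio, circle that attains it)."""
    C = cases.sum()
    best_value, best_zone = 0.0, None
    for zone in zones:
        idx = sorted(zone)
        value = log_lr(cases[idx].sum(), expected[idx].sum(), C)
        if value > best_value:
            best_value, best_zone = value, zone
    return best_value, best_zone

def monte_carlo_p(cases, expected, zones, replicas=999, seed=0):
    """Most likely cluster and its Monte Carlo p-value."""
    rng = np.random.default_rng(seed)
    cases = np.asarray(cases)
    expected = np.asarray(expected, dtype=float)
    observed, best_zone = scan(cases, expected, zones)
    probs = expected / expected.sum()
    total = int(cases.sum())
    # Count replicas whose most likely cluster is at least as strong.
    exceed = sum(
        scan(rng.multinomial(total, probs), expected, zones)[0] >= observed
        for _ in range(replicas)
    )
    return best_zone, observed, (exceed + 1) / (replicas + 1)
```

Because only the maximum over all circles is compared between the real and the random data sets, the resulting p-value already accounts for the many circles examined.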

  13. Poisson Likelihood Function
      $$\left(\frac{c}{\mu}\right)^{c} \left(\frac{C-c}{C-\mu}\right)^{C-c}$$
      where $c$ = cases in circle, $\mu$ = expected cases in circle, and $C$ = total cases.
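
Taking logarithms turns the product into a sum, which is the form usually evaluated numerically. As an illustration only, the expression can be applied to the first children's cluster reported on slide 20 (Carolinas: $c = 86$, $\mu = 51$) together with the total of $C = 5{,}062$ childhood deaths from slide 15. This assumes the tabulated, covariate-adjusted values can be plugged into the formula directly.

$$
\ln \mathrm{LR} = c \ln\frac{c}{\mu} + (C - c)\ln\frac{C - c}{C - \mu}
= 86 \ln\frac{86}{51} + 4976 \ln\frac{4976}{5011}
\approx 44.94 - 34.88 \approx 10.06
$$

The first term rewards the excess of cases inside the circle; the second term is a small correction for the corresponding deficit outside it.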

  14. Spatial Scan Statistic: Properties
      • Adjusts for inhomogeneous population density.
      • Simultaneously tests for clusters of any size and any location, by using circular windows with continuously variable radius.
      • Accounts for multiple testing.
      • Possibility to include confounding variables, such as age, sex or socio-economic variables.
      • Aggregated or non-aggregated data (states, counties, census tracts, block groups, households, individuals).

  15. U.S. Brain Cancer Mortality, 1986-1995

      Group                  Deaths     Rate* (95% CI)
      Children (age <20)       5,062    0.75 (0.66-0.83)
      Adults (age 20+)       106,710    6.0 (5.8-6.2)
      Adult Women             48,650    4.9 (4.7-5.0)
      Adult Men               58,060    7.2 (7.0-7.5)

      * annual deaths / 100,000

  16. Brain Cancer
      • Known risk factors:
        • High dose ionizing radiation
        • Selected congenital and genetic disorders
        • Explains only a small percent of cases.
      • Potential risk factors:
        • N-nitroso compounds?, phenols?, pesticides?, polycyclic aromatic hydrocarbons?, organic solvents?

  17. Adjustments
      All subsequent analyses were adjusted for:
      • Age
      • Gender
      • Ethnicity (African-American, White, Other)

  18. Brain Cancer Mortality, Children 1986-1995

  19. Spatial Scan Statistic, Children

  20. Children: Seven Most Likely Clusters

      Cluster              Obs    Exp    RR     p
      1. Carolinas          86    51     1.7    0.24
      2. California         16    4.9    3.3    0.74
      3. Michigan          318    250    1.3    0.74
      4. S Carolina         24    10     2.5    0.79
      5. Kentucky-Tenn     127    88     1.4    0.79
      6. Wisconsin          10    2.4    4.1    0.98
      7. Nebraska           12    3.6    3.3    0.99

  21. Conclusions: Children
      • No statistically significant clusters detected.
      • Any part of the pattern seen on the original map may be due to chance.

  22. What About Adults?

  23. Brain Cancer Mortality, Adults 1986-1995

  24. Spatial Scan Statistic: Adults

  25. Spatial Scan Statistic, Women

  26. Women: Most Likely Clusters

      Cluster                   Obs     Exp     RR      p
      1. Arkansas et al.        2830    2328    1.22    0.0001
      2. Carolinas              1783    1518    1.17    0.0001
      3. Oklahoma et al.        1709    1496    1.14    0.003
      4. Minnesota et al.       2616    2369    1.10    0.01
      10. N.J. / N.Y.           1809    2300    0.79    0.0001
      11. S Texas                127     214    0.59    0.0001
      12. New Mexico et al.      849    1049    0.81    0.0001

  27. Spatial Scan Statistic: Men

  28. Men: Most Likely Clusters

      Cluster                    Obs     Exp     RR      p
      1. Kentucky et al.         3295    2860    1.15    0.0001
      2. Carolinas               1925    1658    1.16    0.0001
      3. Arkansas et al.         1143     964    1.19    0.001
      4. Washington et al.       1664    1455    1.14    0.003
      5. Michigan                1251    1074    1.17    0.005
      11. N.J. / N.Y.            2084    2615    0.80    0.0001
      12. S Texas                 157     262    0.60    0.0001
      13. New Mexico et al.      1418    1680    0.84    0.0001
      14. Upstate N.Y. et al.    1642    1895    0.87    0.0001

  29. Conclusions: Adults
      • It is possible to pinpoint specific areas with higher and lower rates that are statistically significant, and unlikely to be due to chance.
      • The exact borders of detected clusters are uncertain.
      • Similar patterns for men and women.

  30. Conclusion: General
      • The spatial scan statistic can be useful as an addition to disease maps, in order to determine whether the observed patterns are likely due to chance.
      • It is a complement to, rather than a replacement for, regular disease maps.

  31. Space-Time Scan Statistic
      • Use a cylindrical window, with the circular base representing space and the height representing time.
      • We will only consider cylinders that reach the present time.
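
Restricting attention to cylinders that reach the present keeps the search small: each circular base is paired only with time windows ending on the most recent day. A minimal sketch, assuming days are indexed 0 to `n_days - 1` with the last index as the present; the generator name `alive_cylinders` and the default 7-day maximum are illustrative choices.

```python
def alive_cylinders(zones, n_days, max_window=7):
    """Yield (zone, days) pairs for every cylinder that ends on the latest day.

    zones: iterable of sets of location indices (the circular bases).
    n_days: number of days of data; the last index is "the present".
    max_window: longest cylinder height, in days.
    """
    last = n_days - 1
    for zone in zones:
        for height in range(1, min(max_window, n_days) + 1):
            # Days covered: the most recent `height` days, inclusive of today.
            yield zone, range(last - height + 1, last + 1)
```

With two bases, 30 days of data, and the default maximum height, this produces 14 cylinders, every one of which includes the final day.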

  32. For each cylinder:
      • Obtain actual and expected number of cases inside and outside the cylinder.
      • Calculate likelihood function.
      Compare cylinders:
      • Pick cylinder with highest likelihood function as Most Likely Cluster.
      Inference:
      • Generate random replicas of the data set under the null hypothesis of no clusters (Monte Carlo sampling).
      • Compare most likely clusters in real and random data sets (likelihood ratio test).

  34. Space-Time Permutation Scan Statistic
      1. For each cylinder, calculate the expected number of cases, conditioning on the marginals:
         $$\mu_{st} = \frac{\left(\sum_{s} c_{st}\right)\left(\sum_{t} c_{st}\right)}{C}$$
         where $c_{st}$ = number of cases at time $t$ in location $s$, and $C$ = total number of cases.
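
Conditioning on the marginals means the expected count for a location and day is the product of that location's total and that day's total, divided by the grand total. A minimal sketch, assuming the counts are held in a locations-by-days array and that a cylinder's expected count is the sum over the cells it covers; `expected_counts` and `cylinder_expected` are hypothetical names.

```python
import numpy as np

def expected_counts(cases):
    """Expected cases per (location, day) cell, conditioning on the marginals.

    cases: (n_locations, n_days) array of observed counts.
    """
    cases = np.asarray(cases, dtype=float)
    C = cases.sum()
    per_location = cases.sum(axis=1)  # total over all days, per location
    per_day = cases.sum(axis=0)       # total over all locations, per day
    return np.outer(per_location, per_day) / C

def cylinder_expected(mu, zone, days):
    """Sum the cell-level expectations over a cylinder's locations and days."""
    return mu[np.ix_(sorted(zone), list(days))].sum()
```

Because the expectation is built from the data's own row and column totals, no population-at-risk figures are needed.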

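  35. Space-Time Permutation Scan Statistic
      2. For each cylinder, calculate
         $$T_{st} = \begin{cases} \left(\dfrac{c_{st}}{\mu_{st}}\right)^{c_{st}} \left(\dfrac{C-c_{st}}{C-\mu_{st}}\right)^{C-c_{st}} & \text{if } c_{st} > \mu_{st} \\ 1 & \text{otherwise} \end{cases}$$
      3. Test statistic: $T = \max_{st} T_{st}$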
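
Steps 2 and 3 can be sketched on the log scale, where the "otherwise" branch $T_{st} = 1$ becomes 0. This is an illustration only: it assumes a locations-by-days count array and cylinders given as (set of locations, iterable of days) pairs, and the names `log_T` and `max_log_T` are hypothetical.

```python
import math
import numpy as np

def log_T(c, mu, C):
    """Log of the cylinder statistic; 0 (that is, T = 1) unless c exceeds mu."""
    if c <= mu:
        return 0.0
    outside = (C - c) * math.log((C - c) / (C - mu)) if c < C else 0.0
    return c * math.log(c / mu) + outside

def max_log_T(cases, cylinders):
    """Return (largest log statistic, cylinder that attains it)."""
    cases = np.asarray(cases, dtype=float)
    C = cases.sum()
    # Expected counts conditioned on the location and day marginals.
    mu = np.outer(cases.sum(axis=1), cases.sum(axis=0)) / C
    best_value, best_cylinder = 0.0, None
    for zone, days in cylinders:
        block = np.ix_(sorted(zone), list(days))
        value = log_T(cases[block].sum(), mu[block].sum(), C)
        if value > best_value:
            best_value, best_cylinder = value, (zone, days)
    return best_value, best_cylinder
```

Maximizing the logarithm selects the same cylinder as maximizing $T_{st}$ itself, while avoiding overflow from large exponents.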

  36. Space-Time Permutation Scan Statistic
      4. Generate random replicas of the data set conditioned on the marginals, by permuting the pairs of spatial locations and times.
      5. Compare test statistic in real and random data sets using Monte Carlo hypothesis testing (Dwass, 1957):
         $$p = \frac{\operatorname{rank}(T_{\text{real}})}{1 + \#\text{replicas}}$$
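
Steps 4 and 5 can be sketched as follows, assuming one record per case holding a location index and a day index. Shuffling the day labels among the cases leaves both the per-location and per-day totals unchanged, which is one way to condition on the marginals. The names `permute_counts` and `monte_carlo_p_value` are hypothetical.

```python
import numpy as np

def permute_counts(locations, days, n_locations, n_days, rng):
    """One random replica: reassign the observed days to cases at random.

    locations, days: integer arrays with one entry per case.
    Returns an (n_locations, n_days) array of counts for the replica.
    """
    shuffled_days = rng.permutation(days)
    counts = np.zeros((n_locations, n_days), dtype=int)
    # Unbuffered add so that repeated (location, day) pairs accumulate.
    np.add.at(counts, (locations, shuffled_days), 1)
    return counts

def monte_carlo_p_value(t_real, t_replicas):
    """p = rank of the real statistic among real and replica values, over 1 + #replicas."""
    rank = 1 + sum(t >= t_real for t in t_replicas)
    return rank / (1 + len(t_replicas))
```

With 999 replicas the smallest attainable p-value is 1/1000, so the number of replicas sets the resolution of the test.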

  37. Space-Time Permutation Scan Statistic: Properties
      • Adjusts for purely geographical clusters.
      • Adjusts for purely temporal clusters.
      • Simultaneously tests for outbreaks of any size at any location, by using cylindrical windows with variable radius and height.
      • Accounts for multiple testing.
      • Aggregated or non-aggregated data (counties, zip-code areas, census tracts, individuals, etc.).

  38. Let’s Try It!
      • Historic data, Nov 15, 2001 – Nov 14, 2002
      • Diarrhea, all age groups
      • Use last 30 days of data.
      • Temporal window size: 1-7 days
      • Spatial window size: 0-5 kilometers
      • Residential zip code and hospital coordinates

  39. Results: Hospital Analyses

           Date      #days   #hosp   #cases   #exp    RR     p        Recurrence interval
      A    Nov 21    6       1       101      73.6    1.4    0.0008   1 / 3.4 years
      B    Jan 11    1       1       10       2.3     4.4    0.0007   1 / 3.9 years
      C    Feb 26    4       2       97       66.9    1.4    0.0018   1 / 1.5 years
      D    Mar 31    2       1       38       19.2    2.0    0.0017   1 / 1.6 years
      E    Nov 1     6       3       122      86.6    1.4    0.0017   1 / 1.6 years
      F    Nov 2     7       3       135      98.3    1.4    0.0008   1 / 3.4 years

  40. Results: Residential Analyses

           Date      #days   #zips   #cases   #exp    RR     p        Recurrence interval
      G    Feb 9     2       15      63       34.7    1.8    0.0005   1 / 5.5 years
      H    Mar 7     2       8       63       37.3    1.7    0.0027   1 / 1.0 years
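
The slides do not state how the recurrence interval is derived, but the values in the results tables on slides 39 and 40 are consistent with a simple reading: if one analysis is run per day, a signal with p-value p is expected by chance once every 1/p days. The sketch below rests on that assumption; the function name and the `analyses_per_day` parameter are hypothetical.

```python
def recurrence_interval_days(p_value, analyses_per_day=1):
    """Expected number of days between chance signals at least this strong.

    Assumes independent analyses, each with false-signal probability p_value.
    """
    return 1.0 / (p_value * analyses_per_day)

# Signal A on slide 39 has p = 0.0008, tabulated as once per 3.4 years.
years_a = recurrence_interval_days(0.0008) / 365.25
# Signal G on slide 40 has p = 0.0005, tabulated as once per 5.5 years.
years_g = recurrence_interval_days(0.0005) / 365.25
```

Expressing a signal this way shows how often a health department would see an alarm of that strength by chance alone, which is easier to act on than a bare p-value.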

  41. Real-Time Daily Analyses
      • Starting November 1, 2003.
      • Respiratory, Fever/Flu, Diarrhea, (+Vomiting)
      • Hospital (and Residential) Analyses
      • Spatial window size: 0-5 kilometers
      • Temporal window size: 1-7 days

  42. Real-Time Results, Nov 24, 2003: Hospital Analysis

      Syndrome      #days   #hosp   #cases   #exp     RR     p       Recurrence interval
      Respiratory   2       3       80       57.4     1.4    0.13    every 8 days
      Fever/Flu     3       1       24       14.8     1.6    0.68    every day
      Diarrhea      2       4       18       8.2      2.2    0.04    every 26 days

  43. Real-Time Results, Nov 25, 2003: Hospital Analysis

      Syndrome      #days   #hosp   #cases   #exp     RR     p       Recurrence interval
      Respiratory   7       1       45       30.4     1.5    0.46    every 2 days
      Fever/Flu     1       5       50       31.5     1.6    0.04    every 23 days
      Diarrhea      3       4       22       11.5     1.9    0.17    every 6 days

  44. Real-Time Results, Nov 26, 2003: Hospital Analysis

      Syndrome      #days   #hosp   #cases   #exp     RR     p       Recurrence interval
      Respiratory   5       2       233      199.4    1.1    0.63    every 2 days
      Fever/Flu     7       7       299      252.1    1.2    0.05    every 22 days
      Diarrhea      4       4       23       12.6     1.8    0.22    every 5 days

  45. Real-Time Results, Nov 27, 2003: Hospital Analysis

      Syndrome      #days   #hosp   #cases   #exp     RR     p       Recurrence interval
      Respiratory   1       4       41       26.9     1.5    0.45    every 2 days
      Fever/Flu     6       4       181      142.9    1.3    0.03    every 36 days
      Diarrhea      5       3       29       14.1     1.7    0.50    every 2 days

  46. Real-Time Results, Nov 28, 2003: Hospital Analysis

      Syndrome      #days   #hosp   #cases   #exp     RR     p       Recurrence interval
      Respiratory   2       4       98       78.8     1.2    0.82    every day
      Fever/Flu     7       5       228      178.0    1.3    0.001   every 1000 days
      Diarrhea      6       3       29       17.5     1.5    0.26    every 4 days
