Early detection of excesses in infectious disease occurrence
Chris Robertson, Richard Lawder
Objective
• To construct a robust exception reporting system that takes into account reporting delay.
• Health Protection Scotland (HPS) have a national system, ECOSS (Electronic Communication of Surveillance in Scotland), which holds all positive laboratory specimen results in Scotland that are significant for surveillance. This includes notifiable organisms as specified in the Public Health etc. (Scotland) Act 2008.
Delay
• The problem is that there is a reporting delay
• Delay is different for different organisms
• Delay is different in different years
• Delay is measured in weeks
• Delay = Week of DateFinalised – Week of SpecimenDate
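The delay definition above can be sketched in a few lines of R. This is a minimal illustration, not the HPS code: the example dates are made up, the helper `week.index` is hypothetical, and weeks are assumed to start on a Monday (the slides do not say which convention is used). Counting weeks from a fixed origin rather than using the week number within the year avoids negative delays at year boundaries.

```r
# Hypothetical example records; column names follow the slides.
ex <- data.frame(
  SpecimenDate  = as.Date(c("2014-03-03", "2014-03-05", "2014-12-29")),
  DateFinalised = as.Date(c("2014-03-07", "2014-03-12", "2015-01-06"))
)

# Index of the Monday-based week containing each date, counted from a
# fixed Monday (1970-01-05). Assumption: weeks start on Monday.
week.index <- function(d) {
  origin <- as.Date("1970-01-05")
  as.integer(floor(as.numeric(d - origin) / 7))
}

# Delay = Week of DateFinalised - Week of SpecimenDate
ex$Delay <- week.index(ex$DateFinalised) - week.index(ex$SpecimenDate)
print(ex)
```

The third record spans the turn of the year and still gets a delay of one week, which a plain week-of-year difference would not give.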
Structure of Workspace
• R Workspace
• Output Log File
• R Code Files (*.r)
• Batch File to Run System
• Results Directory
Structure of Results
• Excel file with summary results
• For each organism, an Excel file of important summary results
• PDF with graphs
R Code
#Run the system
library(surveillance)
library(RODBC)
library(XLConnect)

load("HPSview_Stats.RData")

#uncomment for batch processing
channel <- odbcDriverConnect("DSN=Ecossstats;UID=StatsUser;PWD=Astra_sri")
#odbcGetInfo(channel)
#sqlTables(channel)
#sqlColumns(channel,"HPSview#Stats")
Data <- sqlFetch(channel, "HPSview#Stats", colnames = FALSE, rownames = TRUE) #read in the Data

#Data <- read.csv("J:\\Statistics\\StatsSupport\\EcossOrganism\\HPSview_Stats_Full_Extract_(11032014).csv",as.is=T) #reads in a csv file - takes ages
#Data <- read.spss("HPSview_Stats_Full_Extract_(08102013).sav",use.value.labels = TRUE, to.data.frame = TRUE,
#                  max.value.labels = 20) #Need library(foreign) installed
#names(Data)[1] <- "EcossID"
R Code
Data$weekdate <- as.Date(Data$SpecimenDate, format="%Y-%m-%d") #NB CHANGE OF NAME
Data$DateFinalised <- as.Date(Data$DateFinalised, format="%Y-%m-%d")
Data$DateOB <- as.Date(Data$DateOB, format="%Y-%m-%d")
#age in whole years at specimen date; negative ages are set to missing
Data$Age <- round(as.numeric((Data$weekdate - Data$DateOB)/365.25))
Data$Age[Data$Age < 0 & !is.na(Data$Age)] <- NA
#levels are given explicitly so the labels still match if one category is absent from the data
Data$Sex <- factor(ifelse(Data$Sex %in% c("f","F"), 1, ifelse(Data$Sex %in% c("m","M"), 2, 3)),
                   levels=1:3, labels=c("F","M","U"))
#drop dob
Data <- Data[, !(names(Data) == "DateOB")]
#print and then remove any records with a specimen date in the future
z <- subset(Data, weekdate > Sys.Date())
if (nrow(z) >= 1) {
  print(z)
  Data <- subset(Data, weekdate <= Sys.Date())
}
R Code
source("Functions.r")
source("Print_Exceedance_Week_Data.r")
source("Organism_Codes.r") #Organism.List is a list of the organisms and their codes to match with organism in Data
source("Organism_Setup.r") #Organism.Setup is a data.frame of the organism codes and the values for delay distribution, and model fitting

#Excess holds the excess for each organism over the last few weeks
if (exists("Excess")) rm(Excess)
if (exists("Too.Few")) rm(Too.Few)

#Loop through the Organisms
R Code
#Loop through the Organisms
for (Organism.Name in names(Organism.List)) {
  Organism.Codes <- Organism.List[[Organism.Name]]
  #Org.SU holds the set up information for the organism. Items from this list are used in other scripts
  Org.SU <- subset(Organism.Setup, Name==Organism.Name)
  #get the data
  source("ecossextractv3.r")
  #get the weekly totals - in data frame z.df
  source("Get_Weekly_Data_Frame.r")
  z.delay.dist.df <- fun.get.delay.dist(analysis.df, Org.SU$Min.Year.Inc.Delay.Dist, Org.SU$Delay.Distribution.Cutoff,
                                        Type=Org.SU$Delay.Estimation.Type)
  #fit the model over a sequence of weeks - output is the data.frame z.out
  source("Fit_Model.r")
  #flag weeks where the observed count is outside the delay-adjusted limits
  z.out$Excess <- ifelse(z.out$Observed > z.out$UCL.Adj, 1, 0)
  z.out$Too.Few <- ifelse(z.out$Observed < z.out$LCL.Adj, 1, 0)
  #remainder of the loop body is not shown on this slide
}
Examples
• 95% of cases arrive at HPS within 2 weeks of collection of the sample. Campylobacter is a good example.
• Many organisms have a longer delay.
• Reporting delay has improved. From 2011, 90% of episodes were reported within 1 week.
• In 2015 nearly 40% were reported within the same week the sample was collected.
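Percentages such as "90% within 1 week" come from the empirical delay distribution. The sketch below shows one simple way to tabulate it in R; the delay values are invented for illustration, and this is not the `fun.get.delay.dist` function used in the system, whose internals are not shown in the slides.

```r
# Hypothetical weekly delays (in weeks) for one organism; not real ECOSS data.
delay <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 2)

# Count episodes at each delay, including delays with zero episodes.
delay.tab <- table(factor(delay, levels = 0:max(delay)))

# Proportion at each delay and cumulative proportion reported within k weeks.
delay.dist <- data.frame(
  Delay      = 0:max(delay),
  Proportion = as.numeric(delay.tab) / length(delay)
)
delay.dist$Cumulative <- cumsum(delay.dist$Proportion)
print(delay.dist)
```

A real implementation would also need to handle the most recent specimens, for which long delays cannot yet have been observed; the `Delay.Distribution.Cutoff` setting in the code slides suggests the system restricts the data for this reason, but that is an inference.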
Examples
• Step change in reporting delay in the current year: previously 60% were reported within 1 week, now 80%.
• "Predicted Regression" shows the reporting delay proportions used in the adjustments. This takes into account trends in improvements to reporting delay.
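The slides do not say which regression is used for the delay proportions. As one possible sketch, a binomial regression of the proportion reported within 1 week on calendar year captures a steady improvement and gives a predicted proportion for the coming year. The yearly counts and the data frame `yr` are hypothetical.

```r
# Hypothetical yearly counts: episodes reported within 1 week, and all episodes.
yr <- data.frame(
  Year   = 2010:2014,
  Within = c(55, 58, 62, 66, 70),
  Total  = rep(100, 5)
)

# Logistic regression of the within-1-week proportion on year (an assumed model).
fit <- glm(cbind(Within, Total - Within) ~ Year, family = binomial, data = yr)

# Predicted proportion reported within 1 week in the next year.
pred <- predict(fit, newdata = data.frame(Year = 2015), type = "response")
print(round(pred, 3))
```

A smooth trend like this would not follow a step change of the kind described on the slide, so a model used in practice would need some way to weight or isolate the current year.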
Modelling
• Based upon historical data
  • Up to 5 years
• Takes into account
  • Seasonality
  • Trend
• Also has to take into account the delay distribution
Illustration – Using Deaths
• To predict what is expected in week 1 of 2012 (blue)
• Historical data from the same period in the previous 5 years, plus or minus 3 weeks (red)
• Fit trend (green) and predict
• Compare observed with predicted
• Farrington model
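The steps in this illustration can be sketched in base R. This is a simplified, Farrington-style calculation on simulated counts, not the full Farrington algorithm (which also reweights past outbreaks and uses a transformed prediction interval) and not the `Fit_Model.r` script. The data frame `counts`, the 52-week year, and the normal-approximation limits are all assumptions made for the example.

```r
set.seed(1)
# Simulated weekly counts for 2007-2012, 52 weeks per year for simplicity.
counts <- data.frame(Year = rep(2007:2012, each = 52),
                     Week = rep(1:52, times = 6))
counts$t <- seq_len(nrow(counts))   # running week index, used for the trend
counts$Observed <- rpois(nrow(counts),
                         lambda = 20 + 5 * cos(2 * pi * counts$Week / 52))

# Target week: week 1 of 2012.
target <- subset(counts, Year == 2012 & Week == 1)

# Baseline: the same week +/- 3 weeks in each of the previous 5 years.
base.t <- as.vector(outer(-3:3, target$t - 52 * (1:5), "+"))
baseline <- counts[counts$t %in% base.t[base.t >= 1], ]

# Fit a log-linear trend with overdispersion and predict the target week.
fit  <- glm(Observed ~ t, family = quasipoisson, data = baseline)
pred <- predict(fit, newdata = target, type = "link", se.fit = TRUE)
mu   <- exp(pred$fit)
phi  <- max(1, summary(fit)$dispersion)

# Approximate 95% prediction limits (normal approximation, an assumption).
pred.sd <- sqrt(phi * mu + (mu * pred$se.fit)^2)
UCL <- mu + qnorm(0.975) * pred.sd
LCL <- max(0, mu - qnorm(0.975) * pred.sd)

# Compare observed with predicted.
excess <- target$Observed > UCL
print(c(Observed = target$Observed, Predicted = mu, LCL = LCL, UCL = UCL))
```

Restricting the baseline to the same weeks of earlier years is what handles seasonality here; the linear term in `t` handles trend.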
Routine Output
• 98% reported within 2 weeks
• 30% reported in the same week
Historic Data
• Brown – trend
• Blue – predicted
• Green – upper limit of prediction interval
• Light blue – lower limit of prediction interval
• Red – weeks when the observed count exceeds the upper limit
• Grey – observed weekly counts
Period of Adjustments
Figure labels: Upper Limit, Observed Data, Predicted, Lower Limit
Period of Adjustments
• Dotted lines: no correction for delay
• Solid lines: delay adjustment
Figure labels: Above Upper, Upper Limit, Observed Data, Predicted, Lower Limit, Below Lower
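The difference between the dotted and solid lines can be illustrated with a small R sketch. The slides do not give the adjustment formula, so the version below is an assumption: the predicted count and its limits for each recent week are scaled by the share of episodes expected to have been reported by now. The proportions in `cum.prop` and the counts in `z.ex` are illustrative only; the column names `UCL.Adj`, `LCL.Adj`, `Excess` and `Too.Few` follow the code slides.

```r
# Illustrative cumulative delay distribution: share of episodes reported
# in the same week, within 1 week, and within 2 weeks.
cum.prop <- c(0.30, 0.80, 0.98)

# Hypothetical model output for the 4 most recent weeks (oldest first).
z.ex <- data.frame(
  Weeks.Ago = 3:0,
  Observed  = c(24, 19, 25, 5),
  Predicted = 20,
  LCL       = 12,
  UCL       = 29
)

# Share expected to be reported so far; weeks older than the table
# are treated as complete (assumption).
p <- c(cum.prop, 1)[pmin(z.ex$Weeks.Ago + 1, length(cum.prop) + 1)]

# Simple proportional scaling of the prediction and its limits (assumption).
z.ex$Predicted.Adj <- z.ex$Predicted * p
z.ex$LCL.Adj <- z.ex$LCL * p
z.ex$UCL.Adj <- z.ex$UCL * p

z.ex$Excess  <- ifelse(z.ex$Observed > z.ex$UCL.Adj, 1, 0)
z.ex$Too.Few <- ifelse(z.ex$Observed < z.ex$LCL.Adj, 1, 0)
print(z.ex)
```

In this example the current week's count of 5 would be flagged as too low against the unadjusted limits but not against the adjusted ones, while last week's count of 25 is an excess only once the delay is taken into account. Scaling the limits in direct proportion is a simplification, since the width of a count-based interval does not shrink linearly with the mean.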
Summary
• Runs every Monday – can be changed to run over the weekend
• Automatic