Clinical-Data Acceptance Testing Procedure
E N D
Presentation Transcript
Clinical-Data Acceptance Testing Procedure Sunil Gupta Quintiles
What are examples of bad data? • Duplicate records exist • Missing values in required variables • Start dates are after stop dates • Invalid value for variable • Values are inconsistent across other variables or other data sets • Etc.. Across all industries, bad data costs companies more than $ 600 billion per year. For clinical trials, up to 5 % of raw data values may initially be erroneous.
Sources of Data Quality Problems across all Industries 2001 survey by the Data Warehousing Institute Check Data: Each Data Transfer, Data Conversion or Data Updates
What are Clinical Data Issues? • Incorrect or Incomplete clinical data • Critical variables may need to be: • Non-missing • Consist only of valid values • Be within a range • Or be consistent with other variables. • Process using Edit Check Macros • Unit Testing • Universal principles 21 CFR Part 11 Prevent confusion and frustration. Prevent incorrect clinical study conclusion: Safety, Efficacy
Benefits of High Quality Data across all Industries 2001 survey by the Data Warehousing Institute About $20 to $25 per case report form page or up to 15% of clinical research budget may be spent to ensure data quality.
Clinical Data Process Flow Raw Data Demog: Valid/Invalid Data Vitals: Valid/Invalid Data Labs: Valid/Invalid Data AE: Valid/Invalid Data As a result, in general: - Keep invalid data in data sets - Exclude invalid data in reports Edit Check Process 1. Identify Invalid Data based on DMP 2. Isolate Data Issue 3. Communicate Finding to CDM Outcome 1. MONTHLY: Monitor Improvements in Invalid Data with e-mail notification to all team members 2. FINAL: Use Valid Data in Analysis data sets, Tables, Lists and Graphs
What is Data Acceptance Testing • Create Data Management Plan • Generally a monthly process to refresh data • Identify, isolate and report clinical data issues • Make critical decision before database lock to accept or reject database Similar to User Acceptance Testing for software application. Data
Solution involves four steps beforeDatabase Lock • 1. Specifying Requirements in Data Management Plan (DMP) • 2. Developing and Testing Edit Check Macros • 3. Communicating Results with Clinical Data Management (CDM) (Partnership) • 4. Monitoring the Metrics of Data Issues Using Edit Check Macros standardized our approach to validate the quality of raw clinical data.
Specifying Requirements in Data Management Plan (DMP) • All unique key variables are required. • Patient ID variable is non-missing and unique. • Confirm minimum and maximum values. • Vitals data set: valid temperature and blood pressure values within lower and upper range values. • Display all unique values of selected variables. • Demog data set: valid treatment (active, placebo). continued … At a minimum, these types of data checks should be performed.
Specifying Requirements in Data Management Plan (DMP) 4. Confirm the logic between two variables. • Adverse Events data set: adverse event description, preferred term and system organ class need to be consistent. 5. Confirm the consistency between two clinical dates. • Adverse Events data set: Adverse start dates before or same day as adverse stop dates. continued … At a minimum, these types of data checks should be performed.
Specifying Requirements in Data Management Plan (DMP) • Are there any protocol violations that should be excluded from analysis? • Are the treatment groups randomly distributed based on safety subset population? • Have lab values been correctly converted from reported units to standard international units? • Are patient follow-up visit windows in compliance with the protocol? Ideally, these types of standard data checks should also be performed.
Developing and Testing Edit Check Macros • System Requirements • Unselect data checks • Easily modify data checks, Add new data checks • Display ‘No records found’ for no data issues • Display feedback from CDM on data issues • Limited Programming Resources • SAS’s ODS, Minimum SAS macro programming • Simple, task-oriented macros approach • Apply standard options to selected SAS Procedures A traditionally lengthy SAS program of over 1,000 lines is easier to read with only 75 lines containing 75 edit check macro calls. continued …
Edit Check Macros: Functional Requirements • Macros use basic macro programming techniques that are easy to understand • Quick development of new macros • Quick enhancements to existing macros • Macros provide informative feedback in titles • Input data set name • Variables checked • Any subset condition applied Data set name, variables checked and subset condition are important information for CDM to first confirm the data issue before taking any action. continued …
Edit Check Macros: Functional Requirements • Macros provide reference information in footnotes • Program name • Output file name • Date executed • Macros display data issues • Patient and visit identification • Data values of variable checked • Supporting variables (if any) • One data issue/page • Findings saved to one RTF file
One Edit Check Macro for each type of data issue (partial list) * May require extra programming step since most all edit check macros require single data set.
Scope of Edit Check Macros Involved Multiple Procedures %crt_comp (Proc Sort, Data Step, Proc Compare, etc.) %u_ecdup (5 SAS Procedures, 2 Data Steps ) Simple Single Procedure %u_ecprnt (Proc Print) %u_ecfreq (Proc Freq) %u_ecmens (Proc Means)
Basic Edit Check Macro call with standard macro parameters %let study = proto1; < libref of data sets – raw or analysis > %edit_check_macro( edsn = &study.XXX < data set name with optional data set option or if used with a SAS procedure then a SAS procedure option >, ecvr = < one or more edit check variables >, byvars = < patient for by-group processing >, ecvrl = < one or more extra supporting variables >, esub = < subset condition using “where statement” >, fnote = < user defined message to understand data check > );
SAS code for %u_ecfreq Edit Check Macro %macro u_ecfreq( edsn = &study.ptinfo, /* data set name */ ecvr = %str(_all_), /* variable or list of variables, can also pass cross tables with options */ esub = , /* subset where condition, can also be a format statement */ fnote = /* optional footnotes */ ); continued …
SAS code for %u_ecfreq Edit Check Macro title1 "Study: &study - Edit Check in &edsn dataset"; title2 "PROC FREQ of %upcase(&ecvr) variable with the condition: &esub"; %if %length(&fnote) > 0 %then %do; %if %index(&fnote,\) = 0 %then footnote3 &fnote.;; %if %index(&fnote,\) > 0 %then %do; %let fnote3 = %scan(&fnote,1,\); footnote3 &fnote3; %let fnote4 = %scan(&fnote,2,\); footnote4 &fnote4; %end; %end; continued …
SAS code for %u_ecfreq Edit Check Macro proc freq data = &edsn; tables &ecvr / list missing nocum; &esub; run; footnote3; %mend u_ecfreq; Temporary or permanent, raw or derived data set, with data set options Single or multiple crossed variables with standard statement options Independent single or multiple SAS statements, generally a WHERE statement to apply subset condition Edit Check macro parameters are directly applied to PROC FREQ.
%u_ecfreq Edit Check Macro call %u_ecfreq( edsn = &study..ptinfo, ecvr = sex race, esub =, fnote = %str(Note to CDM: Confirm valid gender \Note to CDM: Confirm valid race)); Aside from the footnote, the %u_ecfreq edit check macro call takes only 1 SAS line instead of 16 SAS lines.
Communicating Results with Clinical Data Management (CDM) - Output file Study: proto1 - Edit Check in proto1.ptinfo dataset PROC FREQ of SEX RACE variable with the condition: The FREQ Procedure sex Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Female 4 40.00 Mal 2 20.00 Male 4 40.00 Key Info Only continued … Macro parameters are directly applied to titles.
Communicating Results with Clinical Data Management (CDM) - Output file race Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Asian 1 10.00 Black 1 10.00 White 7 70.00 X 1 10.00 Program: t_data_acceptance_checks.sas Output: edit_check_results.rtf 12MAY2007 Note to CDM: Confirm valid gender Note to CDM: Confirm valid race
%crt_comp Edit Check Macro call %crt_comp( ec1dir =, /* prefix study directory data set 1 */ ec1dsn =, /* data set name 1 */ ec1var =, /* variable names from data set 1 */ ec2dir =, /* prefix study directory data set 2 */ ec2dsn =, /* data set name 2 */ eco2dsn =, /* output data set name 2 */ ecr2dsn = 0, /* remove duplicate records 0 = no, 1 = yes */ ec2var =, /* variable names from data set 2 */ ecvrlt =, /* common variable names to compare or _all_ */ ecid =, /* matching variable name for data set 1 */ ecid2 =, /* matching variable name for data set 2 */ ec1wst =, /* optional where condition on data set 1 */ ec2wst =, /* optional where condition on data set 2 */ ec1opt =, /* optional option condition on data set 1 */ ec2opt =, /* optional option condition on data set 2 */ cmpopt = %str(method=exact) /* proc compare method option */ );
%crt_comp Edit Check Macro call %crt_comp( ec1dir = a, ec1dsn = ptinfo, ecid = patient, ec2dir = a, ec2dsn = mhist, ecid2 = patient); Simple, one SAS line macro call provides comprehensive PROC COMPARE results to automatically identify and compare common variables between ptinfo and mhist data sets.
%crt_comp Edit Check Macro – Output file Study: proto1 - Edit Check in proto1.ptinfo dataset Compare aproto1: BASE Dataset ptinfo with aproto1: COMPARE Dataset mhist Compare common variables Column Name ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ patient age sex race ... Automatic %crt_comp macro assumes that common variables have same variable names. No hard coding of variable names is required.
%u_ecdup Edit Check Macro call * Check for duplicate records by all variables; %u_ecdup( edsn = &study..ptinfo, /* Data set name */ ecvr = _all_, /* Error check variables, default is all*/ ecvrl =, /* Extra variables to be kept in the output dataset */ fnote =, /* Optional footnote */ eoutdsn = /* Output dataset */ );
Communicating Results with Clinical Data Management (CDM) - Output file Study: proto1 - Edit Check in proto1.ptinfo dataset Indicate the result of checking for duplicated records Error Code No duplicated records found for _ALL_ Program: t_data_acceptance_checks.sas Output: edit_check_results.rtf 12MAY2007
Communicating Results with Clinical Data Management (CDM) - Output file * Check for duplicate records by key variables; %u_ecdup(edsn = &study..ptinfo, ecvr = patient); Study: proto1 - Edit Check in proto1.ptinfo dataset (Results on next page) Indicate the result of checking for duplicated records Error Code No duplicated records found for PATIENT Program: t_data_acceptance_checks.sas Output: edit_check_results.rtf 12MAY2007
%u_eccust Edit Check Macro call %u_eccust( edsn = c990736.vlabs, /* Dataset name */ ecvr = labdt, /* Variable for edit check*/ ecst = %str(if labnme = 'wbc' and not ((.1 < labv < 100))), /* Actual IF condition statement with NOT built in – expect zero records with clean data */ titles =, /* Extra Title */ fnote =, /* Optional footnote */ ); %u_eccust macro allows for almost any customized data check using a valid single ‘IF’ statement which is applied in a DATA step.
%u_eccust Edit Check Macro call %u_eccust( edsn = h&study..hematol, ecvr = hehgb hehgbf hehgbl hehgbh, ecst = if nmiss(hehgb, hehgbl, hehgbh) = 0 and ( (hehgb < hehgbl and hehgbf ne 'L' ) or (hehgb > hehgbh and hehgbf ne 'H' ) or (hehgbl <= hehgb <= hehgbh and hehgbf ne 'N' ) ) );
Flexibility with Edit Check Macros Parameters to Edit Check macros can be macro variables. - Can group macros by data set, variable, etc. - Can apply conditional execution based on non-zero obs %macro ec_run(dsetn =, whrcon =); %u_ecfreq(edsn= &dsetn,ecvr= race,esub= &whrcon); %u_ecfreq(edsn= &dsetn,ecvr= age,esub= &whrcon); %negval(ngdst= &dsetn,ngvr= age,ngwhr= &whrcon); %mend ec_run; %ec_run(dsetn= ptinfo, whrcon= %str(itt = 1));
Making multiple Edit Check Macro calls %u_ecfreq(edsn= &study..hematol, ecvr= hewbc, esub= where hewbc <= .); %u_ecfreq(edsn=&study..hematol, ecvr= heneut, esub= where heneut <= .); %u_ecfreq(edsn=&study..hematol, ecvr= hesegs, esub= where hesegs <= .); %u_ecfreq(edsn=&study..hematol, ecvr= heband, esub= where heband <= .); %u_ecfreq(edsn=&study..hematol, ecvr= hegran, esub= where hegran <= .); %u_ecfreq(edsn=&study..hematol, ecvr= hebaso, esub= where hebaso <= .); One SAS line macro call for each lab test instead of 14 SAS lines. Focus is generating output instead of writing SAS code.
Making multiple Edit Check Macro calls %*u_ecfreq(edsn= &study..hematol, ecvr= hewbc, esub= where hewbc <= .); %*u_ecfreq(edsn= &study..hematol, ecvr= heneut, esub= where heneut <= .); %*u_ecfreq(edsn= &study..hematol, ecvr= hesegs, esub= where hesegs <= .); %u_ecfreq(edsn= &study..hematol, ecvr= heband, esub= where heband <= .); %u_ecfreq(edsn= &study..hematol, ecvr= hegran, esub= where hegran <= .); %u_ecfreq(edsn= &study..hematol, ecvr= hebaso, esub= where hebaso <= .);
Communicating Results with Clinical Data Management (CDM) • Provide all important information: • Data set name, Patient Number, Visit Date, Variables checked, Supporting Variables, Subset condition • One page for each data issue • Confirm data issue exists in raw data sets • Identify discrepancy between biostats expectations and CDM’s service • CDM updates database design or new variable calculation as needed • CDM updates/adds CDM data checks
SOPs on Clinical Data Acceptance Testing • Data Standards: SOPs on accessing and writing to data sets • Input – Reading and archiving of data from various sources • Mapping – Documented system to convert data values • Output – Structuring and archiving of output data • Data Quality: SOPs on identify and resolving data issues • Standard Edit Checks (ex. Duplicate records) • Edit Checks based on Data Management Plan (DMP) • Cleaning: Screening of all data sets, all variables and all records (Some fails may be acceptable) • CDAT: Thorough checking of all critical (primary endpoint) variables (No fail is acceptable) • Query Resolution – Feedback loop to monitor data improvement
Metrics on Clinical Data Acceptance Testing • Unit: # of edit checks tested (based on DMP) • Summary level measurement (Scope of issue – at least one failed patient out of x edit checks) = # of failed edit checks/Total # of edit checks • Detail level measurement (Impact of issue – systematic or localized problem?) • Overall = # of failed records/(Total # of records x Total # of edit checks) • By edit check = # of failed records/Total # of records Goal: To capture and monitor the correction of unexpected data.
1. Ptno is a required variable – non-missing. %u_ecprnt(edsn= work.ptinfo (obs=5), ecvr= age sex race); Study: Test - Edit Check in work.ptinfo dataset PROC PRINT of age sex race variables, all observations Obs ptno age sex race 1 . 35 Mal X 2 102 0 Mal White 3 105 -2 Female White 4 107 30 Male Black 5 110 31 Male White
2. Age range acceptable (1 - 100)? %u_ecmens(edsn = work.ptinfo, ecvr = age); Study: Test - Edit Check in work.ptinfo dataset PROC MEANS of AGE variable with the condition: The MEANS Procedure Analysis Variable : age N Mean Std Dev ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 10 28.6000000 17.2832610 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ N Minimum Maximum ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 10 -2.0000000 54.0000000 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2. Age range acceptable (1 - 100)? proc format; value agegrp 1 - 100 = 'Acceptable'; run; %u_ecfreq(edsn= work.ptinfo, ecvr= age, esub= %str(format age agegrp.)); Study: Test - Edit Check in work.ptinfo dataset PROC FREQ of AGE variable with the condition: format age agegrp. The FREQ Procedure age Frequency Percent ------------------------------------------ -2 1 10.00 0 1 10.00 Acceptable 8 80.00
3. Sex values acceptable (Male, Female)? %u_ecfreq(edsn= work.ptinfo, ecvr= sex); Study: Test - Edit Check in work.ptinfo dataset PROC FREQ of SEX variable with the condition: The FREQ Procedure sex Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Female 4 40.00 Mal 2 20.00 Male 4 40.00
4. Race values acceptable (Asian, Black, White)? %u_ecfreq(edsn= work.ptinfo, ecvr= race); Study: Test - Edit Check in work.ptinfo dataset PROC FREQ of RACE variable with the condition: The FREQ Procedure race Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Asian 1 10.00 Black 1 10.00 White 7 70.00 X 1 10.00
5. Do duplicate records exist? %u_ecdup(edsn= work.ptinfo, ecvr= ptno); Error Code No duplicated records found for PTNO
Metrics on Clinical Data Acceptance Testing • Unit: 5 edit checks tested • Summary level measurement (Scope) 80% = 4 failed edit checks/5 edit checks • Detail level measurement (Impact) • Overall 12% = 6 failed records/(10 records x 5 edit checks) • By edit check: 1. Patno (10%), 2. Age (20%), 3. Sex (20%), 4. Race (10%), 5. Duplicates (0%) Maybe fail Data Acceptance Test since Patno is a unique critical variable?
Possible Reasons for Poor Clinical Data Summary: High Low Detail: Low High
Summary: What the Edit Check MacrosAre Not Designed to do • Totally replace traditional SAS programming in validation programs; they are designed to supplement them • Any data manipulation using the DATA STEP • Display user friendly messages to help diagnose warnings or error messages • Provide flexibility to use all SAS procedure options
Summary: What the Edit Check MacrosWere Designed to do • Enable quick startup with minimum skills so that any programmer could assist as needed • Improve productivity of standard checks by 80% so that more time can be spent on complex checks • As data is downloaded, reduce confusion within the team early since data issues are known/accounted for • Improve the quality and reliability of source clinical data so that final data can be accepted at database lock time