Data Mining of Blood Handling Incident Databases

Presentation Transcript


  1. Data Mining of Blood Handling Incident Databases Costas Tsatsoulis Information and Telecommunication Technology Center Dept. of Electrical Engineering and Computer Science University of Kansas tsatsoul@ittc.ku.edu

  2. Background • Incident reports collected on the handling of blood products • An initial database was collected to allow experimentation • Goals: • Allow the generation of intelligence from data • Unique events • Event clusters • Event trends • Frequencies • Simplify the job of the QA staff • Similar reports • Less need for in-depth causal analysis • Allow cross-institutional analysis

  3. Annual Accidental Deaths in U.S.A.

  4. Institute of Medicine Recommendations, November 1999 • Establish a national focus of research to enhance the knowledge base about patient safety • Identify and learn from errors through both mandatory and voluntary reporting systems • Raise standards and expectations through oversight organizations • Create safety systems through implementation of safe practices at the delivery level

  5. Near Miss Event Reporting • Useful database for studying a system’s failure points • Many more near misses than actual bad events • Source of data to study human recovery • Dynamic means of understanding system operations

  6. The Iceberg Model of Near-Miss Events • 1/2,000,000 fatalities • 1/38,000 ABO-incompatible transfusions • 1/14,000 incorrect units transfused • Near-miss events form the much larger base of the iceberg

  7. Intelligent Systems • Developed two separate systems: • Case-Based Reasoning (CBR) • Information Retrieval (IR) • Goal was to address most of the needs of the users: • Allow the generation of intelligence from data • Unique events • Event clusters • Event trends • Frequencies • Simplify the job of the QA staff • Similar reports • Less need for in-depth causal analysis • Allow cross-institutional analysis

  8. Case-Based Reasoning • Technique from Artificial Intelligence that solves problems based on previous experiences • Of significance to us: • CBR must identify a similar situation/problem to know what to do and how to solve the problem • Use CBR’s concept of “similarity” to identify: • similar reports • report clusters • frequencies

  9. What is a Case and how do we represent it? • An incident report is a “case” • Cases are represented by: • indexes • descriptive features of a situation • surface or in-depth or both • their values • symbolic “Technician” • numerical “103 rpm” • sets “{Monday, Tuesday, Wednesday}” • other (text, images, …) • weights • indicate the descriptive significance of the index
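One way to picture this representation is as a small data structure pairing each index with a value and a weight. This is a minimal sketch in Python; the class names, attribute names, and weight values are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Index:
    """One descriptive feature of an incident report: a value plus a weight."""
    value: Any            # symbolic ("Technician"), numerical (103), a set, text, ...
    weight: float = 1.0   # descriptive significance of this index

@dataclass
class Case:
    """An incident report represented as a collection of weighted indexes."""
    case_id: str
    indexes: Dict[str, Index] = field(default_factory=dict)

# Hypothetical example; the attribute names and weights are illustrative only.
report = Case(
    case_id="R-042",
    indexes={
        "reporter_role": Index("Technician", weight=0.8),
        "location": Index("OR", weight=0.6),
        "shift": Index("8am-12pm", weight=0.4),
        "centrifuge_rpm": Index(103, weight=0.2),
        "days_observed": Index({"Monday", "Tuesday", "Wednesday"}, weight=0.1),
    },
)
```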

  10. Finding Similarity • Define degrees of matching between attributes of an event report. For example: • “Resident” and “MD” are similar • “MLT,” “MT,” and “QA/QC” are similar • A value may match perfectly or partially • “MLT” to “MLT” • “MLT” to “MT” • Different attributes of the event report are weighted • The weighted sum of the per-attribute match degrees defines the overall similarity (see the sketch below) • Cases matching above some predefined degree of similarity are retrieved and considered similar
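A minimal sketch of this weighted matching, assuming a hand-built table of partial match degrees between values; the 0.5 partial score and the normalization by total weight are illustrative assumptions, not the project's actual figures. Here a case is simply a dict mapping attribute name to a (value, weight) pair.

```python
# Degree of match between attribute values; 1.0 = identical, 0.5 = "similar".
# The pairs come from the slide; the 0.5 figure is an illustrative assumption.
PARTIAL_MATCHES = {
    frozenset({"Resident", "MD"}): 0.5,
    frozenset({"MLT", "MT"}): 0.5,
    frozenset({"MLT", "QA/QC"}): 0.5,
    frozenset({"MT", "QA/QC"}): 0.5,
}

def value_match(a, b):
    """Perfect, partial, or no match between two attribute values."""
    if a == b:
        return 1.0
    return PARTIAL_MATCHES.get(frozenset({a, b}), 0.0)

def cbr_similarity(query, case):
    """Weighted sum of per-attribute match degrees, normalized to [0, 1].
    Both arguments map attribute name -> (value, weight)."""
    shared = set(query) & set(case)
    total_weight = sum(query[attr][1] for attr in shared)
    if total_weight == 0:
        return 0.0
    matched = sum(query[attr][1] * value_match(query[attr][0], case[attr][0])
                  for attr in shared)
    return matched / total_weight

# Example: partial match on the reporter role, exact match on the location.
query = {"reporter_role": ("MLT", 0.8), "location": ("OR", 0.6)}
case = {"reporter_role": ("MT", 0.8), "location": ("OR", 0.6)}
print(cbr_similarity(query, case))   # (0.8*0.5 + 0.6*1.0) / 1.4 ≈ 0.71
# Cases scoring above a predefined threshold (e.g. 0.66) are retrieved as similar.
```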

  11. Information Retrieval • Index, search and recall text without any domain information • Preprocess documents • remove stop words • stemming • Use some representation for documents • vector-space model • vector of terms, each with weight = tf * idf • tf = term frequency = (freq of word) / (freq of most frequent word in the document) • idf = inverse document frequency = log10((total docs) / (docs containing the term)) • Use some similarity metric between documents • vector algebra to find the cosine of the angle between document vectors (see the sketch below)
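A minimal sketch of the vector-space model using the tf and idf definitions from the slide, with cosine similarity between the resulting term-weight vectors; the whitespace tokenization and the sample report texts are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build a tf*idf weight vector per document, using the slide's definitions:
    tf = (freq of word) / (freq of most frequent word in the document),
    idf = log10((total docs) / (docs containing the term))."""
    tokenized = [doc.split() for doc in docs]   # naive whitespace tokenization
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))            # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        max_freq = max(counts.values())
        vectors.append({term: (count / max_freq) * math.log10(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical report texts, for illustration only.
docs = ["unit issued to OR mislabeled",
        "mislabeled unit returned to blood bank",
        "platelet order delayed in ICU"]
vectors = tf_idf_vectors(docs)
print(cosine(vectors[0], vectors[1]))   # > 0: the two mislabeling reports share terms
print(cosine(vectors[0], vectors[2]))   # 0.0: no terms in common
```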

  12. CBR for • Selected a subset of the incident report features as indexes • Semantic similarity defined (see the sketch below), e.g.: • (OR, ER, ICU, L&D) • (12-4am, 4-8am), (8am-12pm, 12-4pm), (4-8pm, 8pm-12am) • Domain-specific details defined • Weights assigned • fixed • conditional • the weight of some causal codes depends on whether they were established using a rough or an in-depth analysis
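A minimal sketch of how the semantic similarity groups and the conditional weight on causal codes could be encoded. The group membership follows the slide, while the 0.5 partial-match score and the 0.5 discount for rough analyses are illustrative assumptions.

```python
# Values in the same group match partially; the groups follow the slide.
SIMILARITY_GROUPS = [
    {"OR", "ER", "ICU", "L&D"},        # patient-care locations
    {"12-4am", "4-8am"},               # adjacent time-of-day blocks
    {"8am-12pm", "12-4pm"},
    {"4-8pm", "8pm-12am"},
]

def semantic_match(a, b, partial=0.5):
    """1.0 for identical values, a partial score for values in the same
    semantic group, 0.0 otherwise. The 0.5 partial score is an assumption."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in SIMILARITY_GROUPS):
        return partial
    return 0.0

def causal_code_weight(analysis_depth, base_weight=1.0):
    """Conditional weight: a causal code established by an in-depth analysis
    counts more than one from a rough analysis (the 0.5 discount is an assumption)."""
    return base_weight if analysis_depth == "in-depth" else 0.5 * base_weight

print(semantic_match("OR", "ICU"))            # 0.5: same location group
print(semantic_match("12-4am", "8am-12pm"))   # 0.0: different time groups
print(causal_code_weight("rough"))            # 0.5
```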

  13. IR for • No deletion of stop words • e.g. keep the stop word “or” distinct from “OR” (operating room) • No stemming • Use the vector space model and the cosine comparison measure (see the sketch below)
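A small sketch of this preprocessing choice: case-sensitive tokenization with no stop-word removal and no stemming, so the abbreviation “OR” is not discarded along with the stop word “or”. The exact tokenization rule shown is an assumption.

```python
import re

def tokenize(text):
    """Case-sensitive tokens, no stop-word removal, no stemming.
    The character class used here is an illustrative assumption."""
    return re.findall(r"[A-Za-z0-9&/-]+", text)

print(tokenize("Unit sent to OR or ER before crossmatch"))
# ['Unit', 'sent', 'to', 'OR', 'or', 'ER', 'before', 'crossmatch']
```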

  14. Experiments • Database of approx. 600 cases • Selected 24 reports to match against the case base • EXPERIMENT 1: CBR retrieval - CBR_match_value • EXPERIMENT 2: IR retrieval - IR_match_value • EXPERIMENTS 3-11: combined retrieval • W_CBR * CBR_match_value + W_IR * IR_match_value (see the sketch below) • weights range from 0.9 to 0.1 in steps of 0.1 • (0.9, 0.1), (0.8, 0.2), (0.7, 0.3), …, (0.2, 0.8), (0.1, 0.9) • EXPERIMENT 12: CBR retrieval with all weights set to 1 • No retrieval threshold set
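A minimal sketch of the combined score, sweeping the weight pairs listed on the slide; the two match values used here are hypothetical, for illustration only.

```python
def combined_score(cbr_match, ir_match, w_cbr, w_ir):
    """Experiments 3-11: W_CBR * CBR_match_value + W_IR * IR_match_value."""
    return w_cbr * cbr_match + w_ir * ir_match

# Weight pairs (0.9, 0.1), (0.8, 0.2), ..., (0.1, 0.9).
weight_pairs = [(round(0.1 * k, 1), round(1 - 0.1 * k, 1)) for k in range(9, 0, -1)]

# Hypothetical match values for one query/case pair, for illustration only.
cbr_match, ir_match = 0.72, 0.41
for w_cbr, w_ir in weight_pairs:
    score = combined_score(cbr_match, ir_match, w_cbr, w_ir)
    print(f"W_CBR={w_cbr}, W_IR={w_ir}: combined = {score:.3f}")
```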

  15. Evaluation • Collected top 5 cases for each report for each experiment • Because of duplication, each report had 10-20 cases retrieved for all 12 experiments • A random case was added to the set • Results sent to experts to evaluate • Almost Identical • Similar • Not Very Similar • Not Similar At All

  16. Preliminary Analysis • Determine agreement/disagreement with the experts’ analysis • is a case similar? • is a case dissimilar? • Establish accuracy (recall is more difficult to measure) • False positives vs. false negatives (see the sketch below) • What is the influence of the IR component? • Are the weights appropriate? • What is the influence of varying selection thresholds?
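As a rough illustration of the accuracy and false-positive/false-negative analysis, the sketch below tallies agreement with the expert ratings at a given selection threshold, treating “Almost Identical” and “Similar” as relevant; the counting scheme and the sample numbers are assumptions, not the study's actual method or data.

```python
def confusion_counts(results, threshold):
    """Compare retrieval decisions at `threshold` with expert judgments.
    `results` is a list of (match_value, expert_label) pairs."""
    relevant_labels = {"Almost Identical", "Similar"}
    tp = fp = fn = tn = 0
    for score, label in results:
        retrieved = score >= threshold
        relevant = label in relevant_labels
        if retrieved and relevant:
            tp += 1          # agreement: retrieved and judged similar
        elif retrieved and not relevant:
            fp += 1          # false positive: retrieved but not actually similar
        elif relevant:
            fn += 1          # false negative: a similar case was missed
        else:
            tn += 1          # agreement: correctly not retrieved
    accuracy = (tp + tn) / len(results) if results else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy}

# Hypothetical scores and expert labels, comparing the two selection thresholds.
sample = [(0.81, "Similar"), (0.69, "Not Very Similar"),
          (0.64, "Almost Identical"), (0.32, "Not Similar At All")]
print(confusion_counts(sample, 0.66))
print(confusion_counts(sample, 0.70))
```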

  17. Results with 0.66 threshold

  18. Results with 0.70 threshold

  19. Combined Results (with increasing selection threshold)

  20. Some preliminary conclusions • The weights used in CBR seem to be appropriate and definitely improve retrieval • In CBR, increasing the acceptance threshold improves selection of retrievable cases but also increases the false positives • IR does an excellent job in identifying non-retrievable cases • Even a 10% inclusion of IR to CBR greatly helps in identifying non-retrievable cases

  21. Future work • Plot performance versus acceptance threshold • identify the best case-selection threshold • Integrate the analysis of the second expert • Examine how CBR and IR can be combined to exploit each one’s strengths: • CBR performs the initial retrieval • IR eliminates bad cases retrieved • Look into the temporal distribution of retrieved reports and adjust their matching accordingly • Examine an NLU system for incident reports that have longer textual descriptions • Re-run on different datasets • Get our hands on large datasets and perform other types of data mining (rule induction, predictive models, probability networks, supervised and unsupervised clustering, etc.)
