Data Analytics for South West Area Health Pathology

Data Analytics for South West Area Health Pathology • Dr. David Davies • Jason Chen • Prof. Jon Patrick • Ritika Sharma • Yu Xia

Agenda • Motivation • Research questions • Appendix • Kidney • Vasculitis • Move SWAPS from MySQL to Postgres • Postgres and MySQL Performance Comparison • Conclusion • CLINIDAL Comparison • Future Work

Motivation • Background • SWAPS is a warehoused database of South West Area Health Pathology’s Anatomical Pathology Department, covering 1991 to 2005. • CLINIDAL is installed and ready to answer research questions. • CLINIDAL has certain limitations. • Aim • Answer the research questions using SQL. • Keep a record of the results so that the CLINIDAL interface can be modified and the results compared.

Research Questions • Appendix • Kidney Glomerular Disease • Vasculitis

Diagnosis Data Complexities • Appendix • Snomed RT Code based search • Kidney Glomerular Disease • Snomed RT Code & text based search • Vasculitis • Pure text based search

Appendix Objective • To establish whether the pathology of appendicitis is consistent with the surgically identified clinical findings. • Find prevalence by aggregations such as age group, year and sex.
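The aggregation in this objective can be sketched in a few lines. This is a minimal illustration only: the project itself used SQL, and the rows, the ten-year age bands and the names `CASES`, `age_group` and `proportion_by` are assumptions made for the example, not SWAPS data or schema.

```python
from collections import Counter

# Hypothetical records: (year, age in years, sex, category).
# These rows are invented for illustration and are not SWAPS data.
CASES = [
    (1995, 12, "F", "Inflamed"),
    (1995, 34, "M", "Normal"),
    (1996, 15, "M", "Inflamed"),
    (1996, 67, "F", "Other"),
    (1996, 8, "F", "Inflamed"),
]

def age_group(age, width=10):
    """Return a label such as '10-19' for the band containing `age`."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def proportion_by(cases, key, category="Inflamed"):
    """Share of cases in `category` within each group defined by `key`."""
    totals = Counter(key(c) for c in cases)
    hits = Counter(key(c) for c in cases if c[3] == category)
    return {group: hits[group] / totals[group] for group in totals}

by_year = proportion_by(CASES, key=lambda c: c[0])
by_sex = proportion_by(CASES, key=lambda c: c[2])
by_age = proportion_by(CASES, key=lambda c: age_group(c[1]))
```

The same grouping could be written as a SQL `GROUP BY` over year, sex and a computed age band; the width of the age band is a choice the slides do not specify.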

Appendix • Appendix is a common specimen both alone and included with other specimens. • In the database • specimen type designates primary organ • diagnosis code picks up incidental appendix • Required data set

Observations

Diagnosis Complexity • Diagnosis data set subdivided into • Inflamed • Normal • Other • Multiple Snomed RT codes gave overlapping results

Appendix category distribution

Group by Year

Appendix group by Sex

Appendix Summary • Wrong coding: Normal cases coded as Inflamed or vice versa. • Snomed RT codes were not grouped by organ, so a case coded inflamed for another organ and normal for the appendix is put in the Inflamed category. • Many cases in the Other category could fit into Normal, e.g. Faecolith, Pinworm etc.

Kidney Glomerular Disease • Objective • To identify the distribution of glomerular disease among its different subtypes. • Find the association of glomerular disease with Transplant and Tubulointerstitial disease.

Kidney Glomerular Disease • Glomerular disease is the main reason a kidney/renal biopsy is done. • There are three stages of renal biopsy: • Histology • Immunofluorescence • Electron microscopy • Some diagnoses require all three stages to reach a final diagnosis. • Histology is fast; electron microscopy takes time. • In many reports, the diagnosis after histology may remain as morphology only or list possibilities. • In this study we consider histology reports only.

Diagnosis Data set • The data set was extracted in three ways: • All cases with specimen type Renal Biop(sy) (86%) • All cases where the specimen type was not Renal Biop(sy) but the text contained ‘Renal Biopsy’ or ‘Renal Bx’ (13%) • Cases which do not fall under the above two conditions but have some common bigrams in the text report (1%).
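The three extraction routes can be read as a cascade in which each case is assigned to the first route that matches. The sketch below is one way to express that, not the project's actual queries: the regular expressions are assumed readings of the phrases on the slide, and the entries in `BIGRAMS` are hypothetical, since the slides do not list the bigrams that were used.

```python
import re

# Assumed patterns: the slide names 'Renal Biop(sy)' as the specimen type
# and 'Renal Biopsy' / 'Renal Bx' as the phrases searched for in the text.
SPECIMEN_RE = re.compile(r"^renal\s+biop", re.IGNORECASE)
TEXT_RE = re.compile(r"\brenal\s+(biopsy|bx)\b", re.IGNORECASE)

# Hypothetical bigrams: the slides do not list the ones actually used.
BIGRAMS = ("glomeruli present", "mesangial matrix")

def extraction_route(specimen_type, report_text):
    """Return 1, 2 or 3 for the route that picks up a case, else None."""
    if SPECIMEN_RE.search(specimen_type):
        return 1  # specimen type is Renal Biop(sy)
    if TEXT_RE.search(report_text):
        return 2  # phrase found in the report text
    lowered = " ".join(report_text.lower().split())
    if any(bigram in lowered for bigram in BIGRAMS):
        return 3  # only a common bigram identifies the case
    return None
```

Checking the routes in this order keeps the three sets disjoint, which matches the way the slide reports them as shares that sum to 100%.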

Diagnosis Complexity • Glomerular disease is of many types; we divided the data into 21 different groups. • It can be described two ways using Snomed RT codes: • Some cases could be assigned to specific groups, but many cases fell into the non-specific Glomerulonephritis group and the remainder group. • For these cases the text reports were read and the cases further distributed among groups 1-20.

Glomerular Sub Group Distribution

Glomerular group distribution

Kidney Summary • As this was a text and code based search, we would have picked up only those cases which were consistently well reported with correct codes. • 14% of samples were not entered under specimen type renal biopsy. • We did not consider negation in the text search. • Many diagnoses require clinical history to interpret, e.g. “No abnormality seen” is “normal”, but if there is protein in the urine it is consistent with “minimal change disease”.

Vasculitis • Objective • Identify the prevalence of Vasculitis. Which organs does it turn up in? • Distribution of Vasculitis among its subtypes.

Vasculitis • General term for a group of uncommon diseases that feature inflammation of the blood vessels. • There are no specific diagnosis codes for this disease. • The data set for this study includes all cases in the database, and the search is 100% text based. • NLP comes into play here, as we need to be able to handle negation.

Sub Group Distribution

Most common Specimen types

Vasculitis Summary • As this was a pure text based search, handling negation was a big challenge. • During the text search we found that whenever a sentence in the report text rolled over to the next line there was an extra newline character, which limited our results when the phrase we were looking for was split across two lines. • Some group names share a common term, so we have duplicate records, nearly 20%. • We started with 3466 unique cases and reduced them to 1536 unique cases using a few common negative terms, a reduction of about 56%. • We believe this number will go down further once we compute it using CLINIDAL.
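Two of the problems above, phrases split by a newline and negated mentions, can be illustrated with a short sketch. It is not the method used in the project: the entries in `NEGATION_CUES` are assumed, since the slides mention "common negative terms" without listing them, and the four-word look-back window is an arbitrary choice for the example.

```python
import re

# Assumed negative cues; the slides do not list the terms actually used.
NEGATION_CUES = ("no evidence of", "no", "without", "negative for")

def normalise(text):
    """Collapse line breaks and repeated spaces into single spaces."""
    return " ".join(text.split())

def mentions(term, report, window=4):
    """True if `term` occurs without a negation cue just before it."""
    text = normalise(report).lower()
    for match in re.finditer(re.escape(term.lower()), text):
        # Look only at the few words immediately before this occurrence.
        preceding = text[:match.start()].split()[-window:]
        context = " " + " ".join(preceding) + " "
        if not any(f" {cue} " in context for cue in NEGATION_CUES):
            return True
    return False
```

Normalising whitespace first lets a phrase such as "leukocytoclastic vasculitis" match even when the report breaks the line between the two words. The look-back check is deliberately crude: it misses negation that follows the term ("vasculitis is not seen") and ignores sentence boundaries, which is the kind of case the slides expect CLINIDAL to handle better.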

Move SWAPS from MySQL to PostgreSQL

Comparing two systems • Execute every query five times and record the times. • Calculate the average time and standard deviation. • Put the data into a spreadsheet, plot the times and see what the ratio is between the two systems.
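The measurement procedure can be sketched as a small timing harness. This is an illustration under assumptions rather than the script used for the slides: `run_query` stands for any callable that executes one query against one database, and the sample standard deviation is assumed because the slides do not say which form was calculated.

```python
import statistics
import time

def time_query(run_query, repeats=5):
    """Run `run_query` `repeats` times and return the elapsed seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        times.append(time.perf_counter() - start)
    return times

def summarise(times):
    """Return (mean, sample standard deviation) of the timings."""
    return statistics.mean(times), statistics.stdev(times)

def ratio(mysql_times, postgres_times):
    """Mean MySQL time divided by mean Postgres time."""
    return statistics.mean(mysql_times) / statistics.mean(postgres_times)
```

A ratio above 1 means the query was slower on MySQL. When the ratios span several orders of magnitude, a logarithmic axis keeps both the fast and the slow queries readable on one chart.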

Table of research Q2

Logarithmic Chart for Q2 • B: Time in MySQL • C: Time in Postgres

Table of research Q3

Chart for Q3 • B: Time in MySQL • C: Time in Postgres

Conclusion • MySQL takes longer than Postgres for most queries. • Queries which involve more than two tables take a very long time to execute in MySQL. • For queries which involve only one table, MySQL is usually faster than Postgres. • Most queries involve more than one table, so Postgres is usually more efficient than MySQL.

CLINIDAL and SQL • We have finished comparing the CLINIDAL and SQL results for the Appendix question. • Both data retrieval methods give the same results. • Kidney and Vasculitis are proving to be complex questions, and we are modifying the CLINIDAL interface at this stage.

Future Work • SQL has limited capability to handle text based search. • Comparing CLINIDAL results with the SQL findings for the text based research questions will give us a better idea of this; that comparison is in progress.

Thank You
