Automatic Transformation of Raw Clinical Data into Clean Data Using Decision Tree Learning

Jian Zhang
Supervised by: Karen Petrie

Background
• Cancer research has become an extremely data-rich environment.
• Many analysis packages are available for analyzing the data.
• Data preprocessing.

Rich data environment
• There are some factors about breast cancer

Raw clinical data sample
• Yes-No data:
  yes: yes, Yes, Ye, yed, yef …
  no: No, n, not …
  null: don’t know, no data, waiting for lab
• Positive-Negative data:
  Positive: +, ++, p, p++ …
  Negative: -, n, neg, n--- …
  Null: no data, ruined sample, waiting for lab
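For comparison, the variants listed above could be cleaned with hand-written rules. The following is a minimal sketch, not the method from the presentation: the function names, the null markers, and the prefix rules are all assumptions made for illustration.

```python
# Hypothetical null markers, copied from the examples on the slide.
NULL_MARKERS = {"don't know", "no data", "waiting for lab", "ruined sample"}


def _normalize(raw):
    """Lowercase, trim, and replace curly apostrophes with straight ones."""
    return raw.strip().lower().replace("\u2019", "'")


def clean_yes_no(raw):
    """Map a raw yes/no entry to 'yes', 'no', or None (null)."""
    value = _normalize(raw)
    if value == "" or value in NULL_MARKERS:
        return None
    # Assumption: entries starting with 'y' are variants of "yes",
    # and entries starting with 'n' are variants of "no".
    if value.startswith("y"):
        return "yes"
    if value.startswith("n"):
        return "no"
    return None


def clean_pos_neg(raw):
    """Map a raw positive/negative entry to 'positive', 'negative', or None."""
    value = _normalize(raw)
    if value == "" or value in NULL_MARKERS:
        return None
    # Assumption: '+' or 'p' prefixes mean positive; '-' or 'n' mean negative.
    if value.startswith(("+", "p")):
        return "positive"
    if value.startswith(("-", "n")):
        return "negative"
    return None
```

Rules like these have to be extended by hand whenever a new spelling appears, which is the maintenance cost that motivates an automated approach.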

Basic version

Question: Could the process be automated?

Introduction
• Decision tree learning
• Weka

Decision Tree Learning
• Decision tree learning, one of the most popular inductive learning methods, approximates discrete-valued functions.
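A decision tree learner builds the tree by repeatedly choosing the attribute that best separates the classes. The sketch below computes information gain, the split criterion used by ID3-style learners. The presentation does not state which criterion was used, and the attributes `first_char` and `length` are hypothetical features invented for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute.

    rows: list of dicts mapping attribute name to value.
    labels: class label for each row, in the same order.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data: hypothetical features derived from raw yes/no entries.
rows = [
    {"first_char": "y", "length": 3},
    {"first_char": "y", "length": 2},
    {"first_char": "n", "length": 2},
    {"first_char": "n", "length": 3},
]
labels = ["yes", "yes", "no", "no"]

gain_first_char = information_gain(rows, labels, "first_char")
gain_length = information_gain(rows, labels, "length")
```

In this toy data `first_char` separates the classes completely while `length` carries no information, so a learner using this criterion would split on `first_char` first.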

Decision tree sample

Weka
• Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java. It contains a collection of algorithms for data analysis and predictive modeling.
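Weka reads datasets in its ARFF text format, which consists of a `@relation` line, one `@attribute` line per column, and a `@data` section. The sketch below writes raw/clean pairs to ARFF text. It assumes the raw entry is declared as a nominal attribute and uses hypothetical attribute names (`raw`, `clean`); the presentation does not describe how its datasets were encoded.

```python
def arff_quote(value):
    """Quote a value for ARFF, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_arff(relation, pairs, classes):
    """Render (raw, clean) pairs as ARFF text with two nominal attributes.

    relation: name for the @relation line (no spaces assumed).
    pairs: list of (raw_value, clean_label) tuples.
    classes: allowed clean labels, assumed to need no quoting.
    """
    # Distinct raw values, in first-seen order, form the nominal value set.
    raw_values = list(dict.fromkeys(raw for raw, _ in pairs))
    lines = [
        "@relation " + relation,
        "",
        "@attribute raw {" + ",".join(arff_quote(v) for v in raw_values) + "}",
        "@attribute clean {" + ",".join(classes) + "}",
        "",
        "@data",
    ]
    for raw, clean in pairs:
        lines.append(arff_quote(raw) + "," + clean)
    return "\n".join(lines) + "\n"


# Toy rows based on the raw data sample; the 'null' label is an assumption.
pairs = [("Ye", "yes"), ("n", "no"), ("don't know", "null"), ("Ye", "yes")]
arff_text = to_arff("yes_no", pairs, ["yes", "no", "null"])
```

With a nominal declaration, every raw value has to be listed in the header, so a test file containing values the training file never listed needs a shared header before Weka can evaluate it.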

Experiment
• Data:
  Training dataset with 100 instances
  Test dataset with 100 instances, including 17 values that do not appear in the training dataset
• Tool: Weka

Experiment
• Experiment 1: training dataset
• Experiment 2: training dataset, test dataset

Experiment 1

Experiment 2

Result
• The results show that the decision tree classifies and predicts existing entries well, but its predictions for unknown entries are not as good as expected.
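One way to make the gap between existing and unknown entries measurable is to report accuracy separately for test values that did and did not occur in the training data. The sketch below does this for any prediction function. The lookup-based `predict` and the toy entries are stand-ins invented for illustration; they are not the decision tree or the data from the experiment, and the numbers they produce are not results from it.

```python
def split_accuracy(train_raw, test_pairs, predict):
    """Accuracy on test entries seen in training versus unseen ones.

    train_raw: iterable of raw values present in the training data.
    test_pairs: list of (raw_value, expected_label) tuples.
    predict: function mapping a raw value to a predicted label.
    Returns a dict with 'seen' and 'unseen' accuracies (None if a group is empty).
    """
    seen = set(train_raw)
    counts = {"seen": [0, 0], "unseen": [0, 0]}  # [correct, total]
    for raw, expected in test_pairs:
        key = "seen" if raw in seen else "unseen"
        counts[key][1] += 1
        if predict(raw) == expected:
            counts[key][0] += 1
    return {k: (c / t if t else None) for k, (c, t) in counts.items()}


# Hypothetical stand-in classifier: memorized labels, "yes" for anything new.
training = {"yes": "yes", "Ye": "yes", "No": "no", "n": "no"}


def predict(raw):
    return training.get(raw, "yes")


test_pairs = [("yes", "yes"), ("n", "no"), ("yed", "yes"), ("not", "no")]
scores = split_accuracy(training, test_pairs, predict)
```

Reporting the two figures side by side shows whether errors are concentrated in the values the model never saw during training.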

Future work
• Find and correct incorrect predictions in the process
• Automated transformation for unknown entries

Thank you!
