Machine Learning Analysis of SNPs to Predict Radiation Toxicity in Prostate Cancer Patients

www.polyomx.org

Analysis of Single Nucleotide Polymorphisms in Candidate Genes and Application of Machine Learning Techniques to Predict Radiation Toxicity in Prostate Cancer Patients Treated with Conformal Radiotherapy

Wang Y(1,2), Damaraju S(1,3,4), Cass CE(1,3,4), Murray D(3,4), Fallone G(3,4), Parliament M(3,4) and Greiner R(1,2)
(1) PolyomX Program; (2) Department of Computing Science and (3) Department of Oncology, U of A; (4) Cross Cancer Institute

Study Design
AIM: To explore the possible relationship between 51 single nucleotide polymorphisms (SNPs) in candidate genes involved in DNA damage recognition/repair/response and clinical radiation toxicity in a retrospective cohort of patients (n=82) treated with three-dimensional conformal radiotherapy (3DCRT) for prostate cancer. In this study, we tested Machine Learning (ML) techniques to build classifiers that predict toxicity in patients treated with radiation.

SNPs (Single Nucleotide Polymorphisms) are commonly occurring genetic variations. A SNP may affect an individual's susceptibility to disease or response to a particular treatment by altering the expression of the gene in which it occurs.

Results
Our initial analysis suggested 70-80% prediction accuracy using the following SNPs, in this rank order:
• XRCC3 (A>G, 5’ UTR Nt 4541)
• CYP2D6*4 (G>A, splicing defect)
• BRCA2 (A>G, K 1132 K)
• MLH1 (C>T, V 219 I)
• BRCA1 (A>G, R 356 Q)
• RAD51 (G>T, 5’ UTR Nt 172)
• BRCA2 (A>G, S 455 S)
• BRCA2 (C>A, N 289 H)
• BRCA2 (A>G, D 991 N)
The 4,000-trial permutation test showed significance at the p<0.05 level for both the J48 and KStar classifiers.

Methods
SNPs served as features (independent variables) and the patient's response to treatment as the class label (dependent variable). In this binary classification, the 28 patients with adverse reactions to radiation (rectal bleeding) more than 90 days after treatment were labeled negative, and the remaining 54 patients positive. We considered two types of classifiers: the "J48" decision tree and the "KStar" nearest-neighbor classifier.
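The poster names KStar as its nearest-neighbor classifier but gives no implementation details. As a much simpler stand-in (this is not KStar, whose distance function is transformation-based), the following minimal sketch shows the basic nearest-neighbor idea on SNP data. It assumes each SNP is coded as the number of variant alleles (0, 1 or 2) and that the distance between two patients is the number of SNPs at which their genotypes differ (Hamming distance). The function names and toy data are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of SNPs at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nn_predict(train_X, train_y, query, k=1):
    """Label `query` by majority vote of its k nearest training patients.

    Simplified stand-in for KStar: plain Hamming distance over SNP codes
    (assumed coding: 0/1/2 = number of variant alleles).
    """
    # Sort training patients by distance to the query; ties keep training order.
    ranked = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 3 SNPs per patient.
train_X = [[0, 1, 2], [0, 1, 1], [2, 0, 0], [2, 1, 0]]
train_y = ["positive", "positive", "negative", "negative"]
print(nn_predict(train_X, train_y, [0, 1, 2]))  # positive
print(nn_predict(train_X, train_y, [2, 0, 1]))  # negative
```

With k = 1 each new patient simply inherits the label of the most similar training patient; a larger k smooths the decision by voting.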
For each classifier, we also used information gain to rank the SNPs by quality and then built classifiers from the top k SNPs, for different values of k. We used ten-fold cross-validation to estimate the quality (predictive accuracy) of each classifier with each feature subset, as a way to identify the best classification system. We ran a permutation test (using 4,000 trials) to assess the significance of our results.

Decision tree: a tree-structured decision diagram built from the training data. It can be used to classify new data.

Information gain: a concept from information theory and decision-tree learning. It measures the increase in information obtained by adding a new attribute node to a rule or decision tree; an attribute with high information gain is usually preferred to other attributes. For attribute "a" and class "c", IG(c, a) = H(c) − H(c | a), where H denotes entropy, so that:
• IG = 0 if attribute "a" is NOT correlated with (i.e., is independent of) class "c"
• IG is positive if "a" is correlated with "c"

K-fold cross-validation: a common method for model checking. The data are split into K folds; each fold in turn is held out as the test set while the classifier is trained on the remaining K − 1 folds, and the test results are combined (for example, K = 3).

Radiation toxicity: Patients treated with conformal radiotherapy (3DCRT) were given an RTOG toxicity score from 0 to 5. We assigned each patient a positive or negative label based on this score: a score of 2 or higher during the course of the treatment was considered negative (an adverse reaction to radiation therapy), while all other patients were given a positive label.

Machine Learning: The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience [1]. Its techniques are designed to find patterns in training data and use them to classify new data.

Conclusion: Machine Learning techniques can be used for SNP data analysis and clinical treatment-outcome prediction. This preliminary analysis demonstrates the utility of Machine Learning for discriminating between patient populations on the basis of SNP data, a step toward identifying predictive SNPs for use in radiogenomics in the near future.
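The information-gain ranking and top-k selection described above can be sketched as follows. This is a minimal illustration, not the authors' code (the poster does not say how ranking was implemented): it computes IG(c, a) = H(c) − H(c | a) for each SNP column and keeps the k highest-scoring SNPs. The function names, SNP names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(c, a) = H(c) - H(c | a) for one SNP column `feature`."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Class labels of the patients carrying this genotype value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def top_k_snps(X, labels, names, k):
    """Rank SNP columns of X by information gain and return the k best names."""
    scores = [(information_gain([row[j] for row in X], labels), names[j])
              for j in range(len(names))]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scores[:k]]


# Hypothetical toy data: snp_a tracks the label, snp_b is independent of it.
labels = ["pos", "pos", "neg", "neg"]
X = [[0, 0, 0], [0, 2, 0], [2, 0, 0], [2, 2, 2]]
print(top_k_snps(X, labels, ["snp_a", "snp_b", "snp_c"], 2))  # ['snp_a', 'snp_c']
```

For the cohort's class split of 28 negative and 54 positive patients, `entropy(["neg"] * 28 + ["pos"] * 54)` is about 0.93 bits, which is the upper bound on the information gain any single SNP can achieve.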
KStar: a nearest-neighbor method with a generalized distance function based on transformations.

Permutation test: randomly rearrange the LABELS of the data and run them through the same algorithm. Repeating this many times gives the distribution of accuracies expected by chance, against which the observed accuracy is compared.

Acknowledgements: This work was funded by the Research Initiatives Program of the Alberta Cancer Board.

Reference
[1] Mitchell, T. Machine Learning. McGraw-Hill, Boston, 1997.
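A permutation test combined with ten-fold cross-validation can be sketched as below. This is a hedged illustration under stated assumptions, not the poster's procedure: it uses a simple 1-nearest-neighbor classifier with Hamming distance as a stand-in for J48 or KStar, keeps the same fold split for the real and the shuffled labels, and reports the common add-one estimate (hits + 1) / (trials + 1) as the p-value. All names and the toy data are hypothetical.

```python
import random


def nn_label(train_X, train_y, query):
    """1-nearest-neighbor prediction using Hamming distance over SNP codes."""
    best = min(range(len(train_X)),
               key=lambda i: sum(a != b for a, b in zip(train_X[i], query)))
    return train_y[best]


def cv_accuracy(X, y, folds=10, seed=0):
    """K-fold cross-validated accuracy of the 1-NN stand-in classifier."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)      # fixed seed: same folds every call
    correct = 0
    for f in range(folds):
        test = idx[f::folds]              # every folds-th patient joins this fold
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(nn_label(tX, ty, X[i]) == y[i] for i in test)
    return correct / len(X)


def permutation_test(X, y, trials=4000, folds=10, seed=0):
    """Return (observed accuracy, p-value) from label-permutation trials."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, folds)
    hits = 0
    for _ in range(trials):
        shuffled = y[:]
        rng.shuffle(shuffled)             # break any SNP-label association
        if cv_accuracy(X, shuffled, folds) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Hypothetical toy cohort: the first SNP fully determines the label.
X = [[0, 1, 1]] * 20 + [[2, 1, 1]] * 20
y = ["positive"] * 20 + ["negative"] * 20
print(permutation_test(X, y, trials=200))
```

A small p-value means that shuffled labels rarely reach the accuracy obtained with the true labels, so the observed accuracy is unlikely to be a chance result.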

polyomx

Analysis of Molecular and Clinical Data at PolyomX
