

Developing an Integrated Clinical Decision-making Scheme (ICDS) for Predicting SPCs in Women with Endometrial Cancer: A Retrospective Analysis in Taiwan and Thailand. Prof. Chi-Chang Chang, Chung-Shan Medical University, Taiwan





Presentation Transcript


  1. Developing an Integrated Clinical Decision-making Scheme (ICDS) for Predicting SPCs in Women with Endometrial Cancer: A Retrospective Analysis in Taiwan and Thailand. Prof. Chi-Chang Chang, Chung-Shan Medical University, Taiwan; Dr. Gin-Den Chen, Chung-Shan Medical University Hospital, Taiwan; Dr. Wen-Chien Ting, Chung-Shan Medical University Hospital, Taiwan. This study was a joint work with Dr. Chalong Cheewakriangkrai, Chiang Mai University, Thailand; Prof. Ssu-Han Chen, Ming Chi University of Technology, Taiwan; and Prof. Chi-Jie Lu, Chien Hsin University of Science and Technology, Taiwan.

  2. This work was supported by the Taiwan Ministry of Science and Technology, grant 106-2633-E-040-001 (2-year International Cooperation Project). The authors declare no conflict of interest.

  3. Primary Messages • Understand how the increasing burden of cancer poses a threat to human development. • Assess the trends of Second Primary Cancers (SPCs) in Taiwan. • Propose an Integrated Clinical Decision-making Scheme (ICDS) for predicting the risk factors of SPCs. • Demonstrate prediction of SPCs in women with endometrial cancer in Taiwan and Thailand.

  4. Cancer burden in the WORLD

  5. There were 17.2 million cancer cases and 8.9 million cancer deaths worldwide in 2016.

  6. The most common cancers were breast cancer (1.7 million cases) and prostate cancer (1.4 million cases).

  7. Cancer caused 213.2 million disability-adjusted life-years (DALYs) globally for both sexes combined.

  8. Cancer cases increased by 28% between 2006 and 2016.

  9. IMPORTANCE: The increasing burden due to cancer poses a threat to human development, as well as to the World Health Organization (WHO) Global Action Plan on Non-Communicable Diseases. To determine whether these commitments have resulted in improved cancer control, quantitative assessments of the cancer burden are required.

  10. Cancer burden in ASIA

  11. Asia is the most diverse and populous continent: 4.3 billion of the world's 7.1 billion people live here, and the population will increase by 1 billion by 2050.

  12. According to the WHO, Asia accounts for 60% of the world population and half of the global burden of cancer. The incidence of cancer is estimated to increase from 7.5 million cases in 2008 to 10.6 million in 2030.

  13. Cancer patterns and burden in Taiwan

  14. Cancer patterns and burden in Taiwan • The high effectiveness of cancer screening and therapies has resulted in increased diagnosis of SPCs in Taiwan. • In Taiwan, the 5-year survival rate among all cancer survivors is ~55.77%, and by 2020 an estimated 99,491 new cancer cases will be diagnosed. The 5-year relative survival rate by stage (selected sites), 2011-2015. Data Source: Taiwan Cancer Registry

  15. Cancer survivors: living longer, and now, better? (+1.41 million survivors)

  16. New paradigm: Comprehensive management of cancer Beyond prevention, diagnosis and treatment… Taking care of survivors

  17. Figure 1. The cancer treatment trajectory with special identification of the post-treatment survivorship care phase.



  20. JOURNAL PUBLICATIONS
• Wen-Chien Ting, Yen-Chiao (Angel) Lu, Chi-Jie Lu, Chalong Cheewakriangkrai, Chi-Chang Chang* (2018). Recurrence Impact of Primary Site and Pathologic Stage in Patients Diagnosed with Colorectal Cancer. Journal of Quality, Vol. 25, No. 3, pp. 166-184.
• Chih-Jen Tseng, Chi-Chang Chang, Chi-Jie Lu, Chalong Cheewakriangkrai (2017). Integration of Ensemble Learning and Data Mining Techniques to Predict Risk Factors for Recurrent Ovarian Cancer. Artificial Intelligence in Medicine, Vol. 78, pp. 47-54.
• Chien-Sheng Cheng, Pei-Wei Shueng, Chi-Chang Chang*, Chi-Wen Kuo (2018). Adapting an Evidence-based Diagnostic Model for Predicting Recurrence Risk Factors of Oral Cancer. Journal of Universal Computer Science, Vol. 24, No. 6, pp. 742-752.
• Chi-Chang Chang, Wen-Chien Ting, Ting Teng, Che-Hsin Hsu (2014). Evaluating the Accuracy of Ensemble Learning Approaches for Prediction of Recurrent Colorectal Cancer. International Journal of Engineering and Innovative Technology, Vol. 3, Issue 10, pp. 19-22.
• Chi-Chang Chang (2014). Bayesian Decision Analysis for Recurrent Cervical Cancer. Open Journal of Clinical Diagnostics, Vol. 4, No. 2, pp. 71-76.
• Chih-Kuang Chang, Chi-Chang Chang* (2014). Bayesian Imperfect Information Analysis for Clinical Recurrent Data. Therapeutics and Clinical Risk Management, Vol. 11, pp. 17-26.
• Chi-Chang Chang, Chih-Jen Tseng, Ting-Huan Chang*, Chiu-Hsiang Lee (2014). Bayesian Decision Analysis for Recurrent Ovarian Cancer. WSEAS Transactions on Systems and Control, Vol. 9, Art. 56, pp. 540-546.
• Chi-Chang Chang*, Chi-Jie Lu, Sun-Long Cheng, Kuo-Hsiung Liao (2013). Prediction of Recurrence in Patients with Cervical Cancer Using MARS and Classification. International Journal of Machine Learning and Computing, Vol. 3, No. 1, pp. 75-78.
• Chih-Jen Tseng, Chi-Jie Lu, Chi-Chang Chang*, Gin-Den Chen (2013). Application of Machine Learning to Predict the Recurrence-Proneness for Cervical Cancer. Neural Computing and Applications, Vol. 24, No. 6, pp. 1311-1316.


  22. Second Primary Cancers (SPCs) • SPCs can reflect the result of early detection, supportive care, and advanced radiological and chemical treatments. • Previous studies indicated that the prevalence of Second Primary Cancers (SPCs) ranged between 0.73% and 11.7%. • Nevertheless, the clinical correlation of SPCs has not yet been clarified in Taiwan. From Travis LB. Acta Oncologica 2002;41:323-333.

  23. Taiwan Cancer Registry (TCR) • Cancer registration provides core information for cancer surveillance and control. • The population-based Taiwan Cancer Registry was implemented in 1979. • After the Cancer Control Act was promulgated in 2003, the completeness (97%) and data quality of the cancer registry database reached an excellent level. • The Taiwan Cancer Registry has run smoothly for >30 years, providing an essential foundation for academic research and cancer control policy in Taiwan.

  24. A Nationwide Retrospective Analysis of Second Primary Cancers in Taiwan: 1996-2010 • From 1996 to 2010, a total of 994,734 patients were screened, with records obtained retrospectively from the Taiwan Cancer Registry. • We quantified the clinical characteristics and the most common cancer pairs of SPCs using statistical and epidemiological indicators.

  25. Summary information

  26. Summary information (chart: number of cases per year, for both sexes combined, males, and females).

  27. The Frequent Types of Secondary Cancer

  28. The Distribution of Cancer Pairs (Female)

  29. Synchronous versus Metachronous (both sexes)

  30. Due to early detection, effective therapies, and appropriate intervention, the probability of SPCs in the same patient has increased. • Indeed, cancer registries can help us understand the disease better and use our resources to best effect in the prevention and treatment of SPCs. • Based on these findings, further analysis of the risk factors underlying the relationships between cancer pairs is worthwhile.

  31. Preliminary Study • We quantify the clinical characteristics, develop a predictive model, and identify related risk factors for SPCs in patients with endometrial cancer. • The dataset is difficult to predict because the data in different classes are mixed together and there is a class imbalance problem. • We propose an Integrated Clinical Decision-making Scheme (ICDS), which introduces different strategies for increasing prediction performance and then automatically selects a strategy combination to improve classification performance using a Taguchi Design of Experiments (DOE).



  34. Integrated Clinical Decision-making Design (components of the ensemble-learning scheme): • Basic classification flowchart: training a classifier; exploring risk factors. • Classifiers: classification and regression tree (CART); eXtreme gradient boosting (XGBOOST). • Feature extraction: correspondence analysis (CA); original features. • Clustering: k-means or expectation maximization (EM); without clustering. • Resampling: upsampling; without resampling. • Balanced metric: accuracy; area under the curve (AUC). • Cross-validation: k-fold cross-validation; searching hyper-parameters. • Taguchi method: selecting the strategy combination; reducing the number of experiments.

  35. Basic Flowchart of Classification • In the proposed model, we first divide the original dataset into training data and testing data with a specific percentage.
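The split step described on this slide can be sketched with scikit-learn. This is a minimal illustration with toy data, not the authors' exact settings: the 70/30 split percentage and the use of stratification are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix (200 cases, 5 features) and an imbalanced label vector
# standing in for the registry data (~10% positive "SPC" cases).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.1).astype(int)

# Hold out 30% for testing; stratify so both sets keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
```

Stratification matters here because with so few positive cases a random split could leave the test set with almost no SPC examples.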

  36. Strategies for Improving Balanced Accuracy • Transformation: transform the original feature space into a lower-dimensional space. • Yu, Chum and Sim (2014); Nasution, Sitompul and Ramli (2018) • Resampling: balance the number of cases in each class. • Jishan, Rashu, Haque and Rahman (2015); Yan, Qian, Guan and Zheng (2016); Bennin, Keung, Phannachitta, Monden and Mensah (2017) • Clustering: group similar cases in advance. • Kyriakopoulou and Kalamboukis (2008); Yong, Youwen and Shixiong (2009); Trivedi, Pardos and Heffernan (2015); Alapati and Sindhu (2016) • Ensemble Learning: stack different kinds of classifiers. • Ozcift and Gulten (2011); Yerima, Sezer and Muttik (2015); Abouelnaga, Ali, Rady and Moustafa (2016)

  37. The Integrated Clinical Decision-making Scheme (ICDS). (Flowchart: the database is split into training and testing data, and features are extracted and preprocessed. Random search proposes m candidate hyper-parameter sets; each candidate model is scored by its average metric over k-fold cross-validation, and the best set of hyper-parameters is chosen. The resulting best model is then evaluated on the testing data, reporting (1) the confusion matrix, (2) variable importance, and (3) the extracted tree or rules.)
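The random-search-plus-cross-validation loop at the heart of this scheme can be sketched as follows. This is a hedged illustration on synthetic data; scikit-learn's `GradientBoostingClassifier` is used as a stand-in classifier, and the parameter grid and fold count are illustrative, not the study's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced dataset standing in for the registry extract.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

# Random search over candidate hyper-parameter sets; each candidate is
# scored by its average k-fold balanced accuracy, as in the flowchart.
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={"learning_rate": [0.05, 0.1, 0.3],
                         "max_depth": [2, 3, 4]},
    n_iter=5, cv=5, scoring="balanced_accuracy", random_state=0)
search.fit(X, y)
best_model = search.best_estimator_  # the "Best Model" in the flowchart
```

The best model would then be applied once to the held-out testing data to produce the confusion matrix and variable-importance outputs.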

  38. Variant: PCA transformation is introduced. (Flowchart: principal component analysis replaces the original features with components PC1 ... PCl before XGBOOST is trained. Hyper-parameters (eta, max_depth, gamma, ...) are tuned by random search with k-fold cross-validation on balanced accuracy (BalAcc); outputs are the confusion matrix and variable importance.)
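The PCA step in this variant can be sketched with a scikit-learn pipeline. A logistic regression stands in for the tree-based classifier here, and the number of retained components is an illustrative choice, not the study's.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# Project the original features x1..xp onto the leading principal
# components (PC1..PCl), then train the classifier on the components.
model = make_pipeline(PCA(n_components=4), LogisticRegression(max_iter=1000))
model.fit(X, y)
n_components_kept = model.named_steps["pca"].n_components_
```

Putting PCA inside the pipeline matters: the projection is fitted on the training folds only, so cross-validated scores are not contaminated by the test data.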

  39. Variant: oversampling is introduced. (Flowchart: the training data are resampled after feature extraction so that the classes are balanced; the testing data undergo feature extraction only. XGBOOST hyper-parameters are tuned by random search with k-fold cross-validation on balanced accuracy.)
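The upsampling step can be sketched with `sklearn.utils.resample`. The toy class ratio is illustrative; the key point, as on the slide, is that only the training data are resampled.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)  # 10 minority ("SPC") cases

# Upsample the minority class with replacement until both classes match.
X_maj, X_min = X[y == 0], X[y == 1]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj),
                    random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_min_up))
```

The balanced set (90 majority + 90 upsampled minority cases) is what the classifier is trained on; the test set keeps its natural imbalance.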

  40. Variant: clustering is introduced. (Flowchart: cluster labels c1, c2, ... are appended to the original features x1 ... xp before XGBOOST is trained; testing cases are allocated to the learned clusters at prediction time. Hyper-parameters are tuned by random search with k-fold cross-validation on balanced accuracy.)
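The clustering step can be sketched with k-means (one of the two clustering options named on slide 34). The cluster count is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))

# Group similar cases first; the cluster label becomes an extra feature
# column appended to the original feature matrix.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
X_aug = np.column_stack([X, km.labels_])

# Test cases are "allocated" to the nearest learned cluster.
X_new = rng.normal(size=(10, 5))
new_labels = km.predict(X_new)
```

Because the cluster model is fitted on training data only, allocating test cases via `predict` mirrors the flowchart's separate "Allocation" step for the testing branch.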

  41. Using XGBOOST as the Base Learner • The base learner we considered is eXtreme Gradient Boosting (XGBOOST). • XGBOOST is regarded as a masterpiece on Kaggle: in 2015, 17 out of 29 Kaggle champion teams used this classifier to win their titles. • There are many categorical or ordered variables in our dataset. • No assumption about the data distribution is needed. • Tree-based methods often perform well on imbalanced datasets because their hierarchical structure allows them to learn signals from both classes. • Implementations: package xgboost in Python; package caret in R.

  42. Setting of XGBOOST. (Flowchart: the hyper-parameters eta, max_depth, and gamma are tuned by random search with k-fold cross-validation on balanced accuracy, and the best model is evaluated on the testing data.)

  43. Variant: stacking ensemble is introduced. (Flowchart: several base learners (XGBOOST, MLP, MDA, ...) are each tuned by random search with k-fold cross-validation on balanced accuracy over their own hyper-parameters, e.g., eta/max_depth/gamma for XGBOOST and size/l2reg/lambda for the MLP. The resulting best models, Best Model1 ... Best Modelq, are stacked, with a GLM combining their predictions, and the ensemble is evaluated on the testing data via the confusion matrix.)
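The stacking idea can be sketched with scikit-learn's `StackingClassifier`. As assumptions: a gradient-boosted tree stands in for XGBOOST, MDA is omitted, and a logistic regression plays the GLM combiner role from the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Stack two base learners; the final GLM is trained on their
# out-of-fold predictions, mirroring the flowchart's combiner step.
stack = StackingClassifier(
    estimators=[("gbt", GradientBoostingClassifier(random_state=0)),
                ("mlp", MLPClassifier(hidden_layer_sizes=(8,),
                                      max_iter=500, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000), cv=3)
stack.fit(X, y)
acc = stack.score(X, y)
```

Using out-of-fold predictions to train the combiner (the `cv=3` argument) is what keeps the meta-learner from simply memorizing the base learners' training-set fits.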

  44. Variant: a balanced metric is introduced. • Should the best model be selected by a balanced metric, or not? • Evaluation metrics compared: accuracy, area under the curve (AUC). (Flowchart: a tree-based classifier is tuned by grid search with k-fold cross-validation, selecting the best model by balanced accuracy.)
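Why a balanced metric matters for this question can be shown in a few lines. The toy class ratio is illustrative: a degenerate classifier that always predicts the majority class looks strong under plain accuracy but scores at chance level under balanced accuracy (the mean of the per-class recalls).

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Imbalanced test labels: 95 negatives, 5 positives ("SPC" cases).
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)  # classifier that ignores the minority

acc = accuracy_score(y_true, y_pred)               # 0.95, looks good
bal_acc = balanced_accuracy_score(y_true, y_pred)  # 0.5, chance level
```

This is exactly the failure mode the slide's "balanced metric or not?" question probes: selecting models by plain accuracy can favor classifiers that never detect an SPC.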
