This project predicts who survived the Titanic disaster using decision trees, starting from the hypothesis that "women and children first" shaped survival. The data, read into tools such as Excel or R, has missing age values (177 of 891 records), so the first analysis uses gender only. In that data, 74% of women survived compared with 19% of men, and the gender-only predictions scored 320 / 418 = 76.5%. Decision trees are then used to examine how the independent variables (age, gender) relate to survival.
Goal: Predict who survived the Titanic disaster
Hypothesis: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some age data missing; analyze gender only
Statistics & Analysis: 74% of women and 19% of men survived
Submit Predictions: 320 / 418 = 76.5%
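The gender-only step above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the `passengers` records are made up, and the helper names (`survival_rate`, `predict_gender_only`, `accuracy`) are hypothetical.

```python
# Toy passenger records, made up for illustration (NOT the real Titanic data).
passengers = [
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
]


def survival_rate(rows, sex):
    """Fraction of passengers of the given sex who survived."""
    group = [r["survived"] for r in rows if r["sex"] == sex]
    return sum(group) / len(group)


def predict_gender_only(row):
    """Gender-only rule: predict survival for women, death for men."""
    return 1 if row["sex"] == "female" else 0


def accuracy(rows, predict):
    """Share of rows where the prediction matches the recorded outcome."""
    hits = sum(predict(r) == r["survived"] for r in rows)
    return hits / len(rows)


female_rate = survival_rate(passengers, "female")
male_rate = survival_rate(passengers, "male")
baseline_accuracy = accuracy(passengers, predict_gender_only)

print(f"women survived: {female_rate:.0%}, men survived: {male_rate:.0%}")
print(f"gender-only accuracy: {baseline_accuracy:.1%}")
```

On the real data the same two steps would be run on separate files: survival rates on the labeled training records, and the rule applied to the 418 records submitted for scoring.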
Age
• All: N = 891
• With age data: N = 714
• Missing: N = 177
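A count like the one above could be produced as follows. This is a small sketch on a made-up `ages` list, where `None` stands in for a missing value; only the three totals in the final check come from the slide.

```python
# Made-up ages for illustration; None marks a missing value.
ages = [22.0, 38.0, None, 35.0, None, 54.0, 2.0, None]

known_ages = [a for a in ages if a is not None]

n_all = len(ages)
n_known = len(known_ages)
n_missing = n_all - n_known

print(f"All N = {n_all}, Data N = {n_known}, Missing N = {n_missing}")

# Consistency check on the slide's own figures: 714 known + 177 missing = 891.
slide_counts_consistent = (714 + 177 == 891)
print(f"missing share on the slide: {177 / 891:.1%}")
```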
Decision Trees
• Dependent variable (Y)
  • Continuous
  • Categorical
• Independent variables (X’s)
  • Continuous
  • Categorical
At each node, the decision tree looks for the split of the sample that leads to the most differentiation on Y.
Decision Trees
• Splits are chosen to maximize the data likelihood (equivalently, to minimize the deviance).
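To make the deviance criterion concrete, here is a minimal sketch of a split search for a categorical Y. It assumes the usual multinomial deviance of a node, D = -2 Σ n_k ln(n_k / n), and binary splits only; the toy rows and the names `node_deviance`, `candidate_splits`, and `best_split` are hypothetical and are not taken from any particular package.

```python
import math
from collections import Counter


def node_deviance(labels):
    """Multinomial deviance of one node: -2 * sum_k n_k * ln(n_k / n)."""
    n = len(labels)
    return -2.0 * sum(c * math.log(c / n) for c in Counter(labels).values())


def candidate_splits(rows, feature):
    """Numeric X: midpoints between sorted values. Categorical X: each level."""
    values = sorted({r[feature] for r in rows})
    if all(isinstance(v, (int, float)) for v in values):
        return [("<=", (a + b) / 2) for a, b in zip(values, values[1:])]
    return [("==", v) for v in values]


def goes_left(row, feature, op, value):
    return row[feature] <= value if op == "<=" else row[feature] == value


def best_split(rows, target, features):
    """Return (feature, op, value, deviance reduction) of the best binary split."""
    parent = node_deviance([r[target] for r in rows])
    best = None
    for f in features:
        for op, v in candidate_splits(rows, f):
            left = [r[target] for r in rows if goes_left(r, f, op, v)]
            right = [r[target] for r in rows if not goes_left(r, f, op, v)]
            if not left or not right:
                continue  # a split must send rows to both children
            gain = parent - node_deviance(left) - node_deviance(right)
            if best is None or gain > best[3]:
                best = (f, op, v, gain)
    return best


# Toy rows, made up for illustration (NOT the real Titanic data).
rows = [
    {"sex": "female", "age": 29, "survived": 1},
    {"sex": "female", "age": 4, "survived": 1},
    {"sex": "female", "age": 40, "survived": 1},
    {"sex": "male", "age": 30, "survived": 0},
    {"sex": "male", "age": 45, "survived": 0},
    {"sex": "male", "age": 6, "survived": 1},
    {"sex": "male", "age": 25, "survived": 0},
    {"sex": "male", "age": 60, "survived": 0},
]

best = best_split(rows, "survived", ["sex", "age"])
print(best)
```

A pure node has deviance 0, so the split that removes the most deviance is the one whose children are closest to pure, which is the "most differentiation on Y" idea.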
Prediction and Missing Values
• Correlation or association of Age with other variables?
Gender and Age
• The tree grows by optimizing only the split at the current node rather than optimizing the entire tree
• The tree stops growing when a further split becomes ineffective
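The two points above (greedy, node-by-node growth and a stopping rule) can be sketched as a short recursive function. This is an illustration under stated assumptions, not the deck's method: it uses deviance reduction as the split score, and `min_gain` and `max_depth` are hypothetical stopping thresholds. The rows are made up.

```python
import math
from collections import Counter


def deviance(labels):
    """Multinomial deviance of a node: -2 * sum_k n_k * ln(n_k / n)."""
    n = len(labels)
    return -2.0 * sum(c * math.log(c / n) for c in Counter(labels).values())


def goes_left(row, feature, value):
    """Numeric features split on <= value, categorical ones on == value."""
    if isinstance(value, (int, float)):
        return row[feature] <= value
    return row[feature] == value


def grow(rows, features, target="survived", min_gain=1.0, depth=0, max_depth=3):
    """Greedy growth: pick the best split for THIS node only, then recurse."""
    labels = [r[target] for r in rows]
    best = None
    if depth < max_depth:
        for f in features:
            for v in sorted({r[f] for r in rows}):
                left = [r for r in rows if goes_left(r, f, v)]
                right = [r for r in rows if not goes_left(r, f, v)]
                if not left or not right:
                    continue
                gain = (deviance(labels)
                        - deviance([r[target] for r in left])
                        - deviance([r[target] for r in right]))
                if best is None or gain > best[0]:
                    best = (gain, f, v, left, right)
    # Stop when no split exists or the best one is ineffective.
    if best is None or best[0] < min_gain:
        return {"leaf": Counter(labels).most_common(1)[0][0], "n": len(rows)}
    _, f, v, left, right = best
    return {
        "feature": f,
        "value": v,
        "left": grow(left, features, target, min_gain, depth + 1, max_depth),
        "right": grow(right, features, target, min_gain, depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's majority class."""
    while "leaf" not in tree:
        side = "left" if goes_left(row, tree["feature"], tree["value"]) else "right"
        tree = tree[side]
    return tree["leaf"]


# Toy rows, made up for illustration (NOT the real Titanic data).
rows = [
    {"sex": "female", "age": 29, "survived": 1},
    {"sex": "female", "age": 4, "survived": 1},
    {"sex": "female", "age": 40, "survived": 1},
    {"sex": "male", "age": 30, "survived": 0},
    {"sex": "male", "age": 45, "survived": 0},
    {"sex": "male", "age": 6, "survived": 1},
    {"sex": "male", "age": 25, "survived": 0},
    {"sex": "male", "age": 60, "survived": 0},
]

tree = grow(rows, ["sex", "age"])
print(tree)
```

Because each call scores only its own node, an early split is never revisited, even if a different first split would have produced a better tree overall.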
Goal: Predict who survived the Titanic disaster
Hypothesis: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Age + Gender
Statistics & Analysis
Submit Predictions
Decision Trees
• Popular implementations
  • CART: Classification And Regression Tree
  • CHAID: Chi-squared Automatic Interaction Detector
• CHAID allows multiway splits, giving a wider tree
• CART uses binary splits
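As a rough illustration of the statistic behind CHAID's name, the sketch below computes a Pearson chi-squared statistic for a three-level predictor against survival. The age bands and counts are made up, and `chi_squared` is a hypothetical helper. A full CHAID implementation also merges similar categories and adjusts p-values, which this sketch does not attempt.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts for illustration (NOT the real Titanic data).
# Rows: age band (child, adult, senior). Columns: (died, survived).
age_band_table = [
    [10, 30],
    [60, 40],
    [30, 10],
]

stat = chi_squared(age_band_table)
degrees_of_freedom = (len(age_band_table) - 1) * (len(age_band_table[0]) - 1)
print(f"chi-squared = {stat:.2f} on {degrees_of_freedom} degrees of freedom")
```

A multiway split could keep all three age bands as separate branches of one node, whereas a binary-split tree would need two successive splits to separate them.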