Submit Predictions
This project aims to predict who survived the Titanic disaster from passenger age and gender, testing the hypothesis of "Women and Children First." The dataset can be read into tools such as Excel or R; because age is missing for some passengers, the first analysis uses gender alone. In the data, 74% of women and 19% of men survived. Decision trees are then used as a predictive model to examine how gender and age together relate to survival, with the aim of improving prediction accuracy and clarifying which factors determined survival.
Presentation Transcript
Goal: Predict who survived the Titanic disaster
Hypothesis: "Women and Children First"
Get Data: Read the dataset into Excel, R, etc.
Data Management: Some Age data missing; analyze Gender only
Statistics & Analysis: 74% of women and 19% of men survived
Submit Predictions: 320 / 418 correct ≈ 76.6%
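The gender-only step can be sketched in Python as below. This is a minimal sketch, assuming Kaggle-style CSV files read with `csv.DictReader`, with columns `PassengerId`, `Sex` (`"female"`/`"male"`) and `Survived` (0/1); the function names are hypothetical, and the `PassengerId,Survived` output layout is an assumption about the submission format.

```python
import csv
from collections import defaultdict


def survival_rate_by_sex(rows):
    """Fraction of passengers who survived, for each value of the Sex column."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row["Sex"]] += 1
        survivors[row["Sex"]] += int(row["Survived"])
    return {sex: survivors[sex] / totals[sex] for sex in totals}


def predict_gender_only(row):
    """Gender-only model: predict survival (1) for women and death (0) for men."""
    return 1 if row["Sex"] == "female" else 0


def write_submission(test_rows, out):
    """Write one 'PassengerId,Survived' line per test passenger to an open text file."""
    writer = csv.writer(out)
    writer.writerow(["PassengerId", "Survived"])
    for row in test_rows:
        writer.writerow([row["PassengerId"], predict_gender_only(row)])
```

Passing the training rows to `survival_rate_by_sex` should reproduce the 74% / 19% split if it is the same data. The reported score of 320 correct out of 418 test passengers corresponds to 320 / 418 ≈ 0.766.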
Age
  All passengers:  N = 891
  Age recorded:    N = 714
  Age missing:     N = 177
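Counts like these (891 = 714 + 177) can be reproduced with a small Python sketch, assuming the data is read with `csv.DictReader` so that a missing age arrives as an empty `Age` field; the function name is hypothetical.

```python
def age_coverage(rows):
    """Count all rows, rows with an Age value, and rows where Age is blank."""
    total = 0
    missing = 0
    for row in rows:
        total += 1
        # csv.DictReader yields '' for an empty field; treat '' or a missing key as missing.
        if not (row.get("Age") or "").strip():
            missing += 1
    return {"all": total, "with_age": total - missing, "missing": missing}
```

For example, `age_coverage(csv.DictReader(f))` on an opened training file should agree with the table above if it is the same file.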
Decision Trees
• Dependent variable (Y)
  • Continuous
  • Categorical
• Independent variables (X's)
  • Continuous
  • Categorical
At each node, the decision tree looks for the split of the sample that produces the greatest differentiation in Y.
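How candidate splits differ for continuous and categorical X can be sketched in Python as follows. This is a minimal sketch with hypothetical function names: it assumes missing values have already been removed, uses midpoint thresholds for a continuous X and one-category-versus-the-rest groupings for a categorical X (full implementations can consider other groupings of categories), and only enumerates and applies splits without scoring them.

```python
def candidate_splits(values):
    """Candidate splits for one predictor X (missing values assumed already removed).

    Continuous X: thresholds t halfway between consecutive distinct values,
    each meaning "X <= t goes left".
    Categorical X: "X == c goes left" for each category c (one versus the rest).
    """
    if all(isinstance(v, (int, float)) for v in values):
        distinct = sorted(set(values))
        return [("<=", (a + b) / 2) for a, b in zip(distinct, distinct[1:])]
    categories = sorted(set(values))
    if len(categories) == 2:
        # With two categories, "== first" and "== second" describe the same split.
        return [("==", categories[0])]
    return [("==", c) for c in categories]


def partition(xs, ys, split):
    """Send each outcome y to the left or right child according to its x value."""
    op, point = split
    left, right = [], []
    for x, y in zip(xs, ys):
        goes_left = x <= point if op == "<=" else x == point
        (left if goes_left else right).append(y)
    return left, right
```

Each candidate's `(left, right)` outcome lists are what a split criterion would then compare to judge how well the split differentiates Y.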
Decision Trees
• Splits are chosen to maximize the data likelihood (equivalently, to minimize the deviance).
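For a 0/1 outcome such as survival, the deviance of a node t with n_t passengers, n_{t1} of whom survived, can be written in the standard binomial form below (a sketch, not taken from the slides; 0 log 0 is taken as 0). Because D_t is -2 times the node's maximized log-likelihood, the split of t into children L and R with the largest deviance reduction is also the split with the largest likelihood gain.

```latex
\begin{aligned}
\hat p_t &= \frac{n_{t1}}{n_t} \\
D_t &= -2\left[\, n_{t1}\log \hat p_t + \left(n_t - n_{t1}\right)\log\left(1-\hat p_t\right) \right] \\
\Delta D &= D_t - \left(D_L + D_R\right)
\end{aligned}
```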
Prediction and Missing Values: Is Age correlated or associated with other variables?
Gender and Age
• The tree grows greedily: each split optimizes only the current node rather than the entire tree.
• The tree stops growing when further splits no longer improve the fit enough to be worthwhile.
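The greedy growth and stopping rule can be sketched in Python as below, assuming passengers are dicts with a string `Sex` and a numeric `Age` (rows with missing Age dropped), outcomes are 0/1, and deviance is the split criterion. The stopping thresholds `min_gain` and `min_split` and all function names are hypothetical; real tree libraries offer similar minimum-node-size and minimum-improvement controls, often with pruning on top.

```python
import math


def deviance(ys):
    """Binomial deviance of a node that predicts its own survival rate (0*log 0 = 0)."""
    n, pos = len(ys), sum(ys)
    return -2.0 * sum(c * math.log(c / n) for c in (pos, n - pos) if c > 0)


def goes_left(x, test):
    """Categorical tests compare for equality; numeric tests are thresholds."""
    return x == test if isinstance(test, str) else x <= test


def best_split(rows, ys, features):
    """Greedy step: return (gain, feature, test) for the single split of THIS node
    that lowers deviance the most, or None if no split separates the rows."""
    parent = deviance(ys)
    best = None
    for feature in features:
        values = sorted({row[feature] for row in rows})
        if isinstance(values[0], str):
            # Categorical: one category vs the rest (two categories give one split).
            tests = values[:1] if len(values) == 2 else values
        else:
            # Continuous: midpoints between consecutive distinct values.
            tests = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for test in tests:
            left = [y for row, y in zip(rows, ys) if goes_left(row[feature], test)]
            right = [y for row, y in zip(rows, ys) if not goes_left(row[feature], test)]
            if not left or not right:
                continue
            gain = parent - deviance(left) - deviance(right)
            if best is None or gain > best[0]:
                best = (gain, feature, test)
    return best


def grow(rows, ys, features, min_gain=1.0, min_split=10):
    """Grow the tree one greedy split at a time. Stop at a node with fewer than
    min_split rows, or whose best split lowers deviance by less than min_gain."""
    leaf = {"predict": int(sum(ys) / len(ys) >= 0.5), "n": len(ys)}
    if len(ys) < min_split:
        return leaf
    split = best_split(rows, ys, features)
    if split is None or split[0] < min_gain:
        return leaf
    _, feature, test = split
    sides = {True: ([], []), False: ([], [])}
    for row, y in zip(rows, ys):
        side_rows, side_ys = sides[goes_left(row[feature], test)]
        side_rows.append(row)
        side_ys.append(y)
    return {
        "feature": feature,
        "test": test,
        "left": grow(*sides[True], features, min_gain, min_split),
        "right": grow(*sides[False], features, min_gain, min_split),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf and return its prediction."""
    while "feature" in tree:
        branch = "left" if goes_left(row[tree["feature"]], tree["test"]) else "right"
        tree = tree[branch]
    return tree["predict"]
```

Usage: `tree = grow(rows, ys, ["Sex", "Age"])`, then `predict(tree, {"Sex": "male", "Age": 6.0})`. Each call to `best_split` looks only one split ahead, which is exactly the greedy, node-by-node behavior described on the slide.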
Goal: Predict who survived the Titanic disaster
Hypothesis: "Women and Children First"
Get Data: Read the dataset into Excel, R, etc.
Data Management: Some Age data missing; analyze Gender only
Statistics & Analysis
Submit Predictions
Goal: Predict who survived the Titanic disaster
Hypothesis: "Women and Children First"
Get Data: Read the dataset into Excel, R, etc.
Data Management: Age + Gender
Statistics & Analysis
Submit Predictions
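An Age + Gender version of the prediction rule can be sketched in Python as below, assuming the same Kaggle-style string columns (`Sex`, `Age`, `Survived`) read with `csv.DictReader`. The child-age cutoff of 10 years and treating a blank Age as adult are hypothetical assumptions, not values from the slides; in practice the cutoff would come from the analysis, for example a tree's Age split.

```python
CHILD_AGE_CUTOFF = 10.0  # hypothetical cutoff, not taken from the slides


def predict_age_gender(row, cutoff=CHILD_AGE_CUTOFF):
    """'Women and Children First': predict survival for females and for children.
    A blank Age is treated as adult (an assumption)."""
    if row["Sex"] == "female":
        return 1
    age = (row.get("Age") or "").strip()
    return 1 if age and float(age) < cutoff else 0


def accuracy(rows, predict):
    """Fraction of labeled rows where predict(row) equals the Survived column."""
    hits = sum(predict(r) == int(r["Survived"]) for r in rows)
    return hits / len(rows)
```

Since the test file has no `Survived` labels, `accuracy` can only be used on labeled rows, for example a held-out part of the training data, to compare this rule with the gender-only model before submitting.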