Titanic - Machine Learning from Disaster
This project applies supervised machine learning to the Titanic dataset to predict whether a passenger survived, based on factors such as demographics, ticket class, and family relationships. Models are trained and evaluated on labeled data, with the goal of building a classifier that predicts survival reliably. The datasets are sourced from Kaggle.
Exploratory data analysis (EDA) examined passenger demographics, survival rates, and correlations between variables. Preprocessing consisted of handling missing values, encoding categorical variables, and engineering new features from the existing columns. Several machine learning models, including Random Forest and Decision Tree classifiers, were then trained and compared.
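The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names follow the Kaggle Titanic schema, but the specific choices (median imputation for Age, most frequent value for Embarked, a FamilySize feature) and the helper name preprocess are assumptions, and the four-row sample is made up for demonstration.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: impute, encode, and add one engineered feature."""
    out = df.copy()

    # Missing values (assumed strategy): median for Age, mode for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])

    # Categorical encoding: binary map for Sex, one-hot columns for Embarked.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out = pd.get_dummies(out, columns=["Embarked"], dtype=int)

    # Feature engineering (assumed): siblings/spouses + parents/children + self.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out


# Made-up rows in the Kaggle column layout, with two missing values.
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 54.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Embarked": ["S", "C", None, "S"],
})

features = preprocess(sample)
print(features)
```

Other imputation or encoding choices would fit the same description; the point is only that every column reaching the classifier is numeric and free of missing values.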
Cross-validation showed that Random Forest was more accurate and more stable across folds than the other models compared. Hyperparameter tuning further improved the Random Forest model's performance. On the test data, the tuned model reached an accuracy of 85%, a precision of 88%, and an F1-score of 78%. Since F1 is the harmonic mean of precision and recall, these figures imply a recall of roughly 70%, which leaves room for improvement. Overall, the results show that machine learning techniques can be applied effectively to historical data, and the analysis sheds light on survival patterns during the Titanic disaster.
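The comparison, tuning, and evaluation workflow described above might look like the following scikit-learn sketch. It is an illustration under stated assumptions, not the project's code: the data is a synthetic stand-in generated with make_classification rather than the Titanic features, and the fold count, parameter grid, and split ratio are arbitrary choices, so the printed numbers will not match the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features (NOT the Titanic data).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Compare the two classifiers with 5-fold cross-validation on the training set.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[name] = (scores.mean(), scores.std())  # mean = accuracy, std = stability
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")

# Tune the Random Forest over a small, assumed parameter grid.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, 5, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)

# Evaluate the refitted best model on the held-out test split.
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(metrics)
```

Reporting the standard deviation of the fold scores alongside the mean is one simple way to quantify the stability mentioned above; a lower spread indicates that the model's accuracy depends less on which rows land in each fold.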