Supervised Learning With the Kaggle Titanic Dataset

"I'm The King of The World! ... Whoops, Nevermind."

Background

Kaggle.com hosts an introductory supervised learning competition built on the Titanic dataset. The goal is to build an accurate classification model that predicts whether a given passenger survived the sinking of the Titanic. It is a helpful exercise for reinforcing the fundamentals of machine learning.
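To make the task concrete, here is a minimal sketch in Python with pandas. It assumes the column names from Kaggle's `train.csv` (`PassengerId`, `Sex`, `Survived`); the six rows themselves are invented. It scores the simplest rule-based classifier, "predict that every female passenger survived," which is a common starting baseline, not one of the models built in this project.

```python
import pandas as pd

# Toy rows that reuse column names from Kaggle's Titanic train.csv;
# the values are invented for illustration, not real passengers.
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Survived": [0, 1, 1, 0, 0, 1],
})

# Rule-based baseline classifier: predict survival (1) for every female passenger.
predictions = (train["Sex"] == "female").astype(int)

# Accuracy is the share of passengers whose prediction matches the label.
accuracy = (predictions == train["Survived"]).mean()
print(f"Baseline accuracy: {accuracy:.3f}")
```

Any learned model is only worth keeping if it beats a baseline like this on data it has not seen.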

There are plenty of resources available to help fill in the gaps and deepen understanding of the fundamentals. I had no experience in programming or advanced mathematics before starting the Data Science Bootcamp, so I chose to stay focused on the basics, and this competition was the best fit for my needs.

Process

I created a Jupyter Notebook split into two distinct parts: the first is an overview of key machine learning concepts, and the second applies those concepts to the Titanic dataset.

You can read the original blog post on the New York Data Science Academy Blog here.

You can find the scripts & further details about the project on Github here.

Over 5 Algorithms Analyzed
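The post does not list which algorithms were compared, so the seven below are my assumption (only gradient boosting is named later on). This sketch shows one common way to compare candidates fairly with scikit-learn: the same 5-fold cross-validated accuracy for each. The data come from `make_classification` as a synthetic stand-in for a cleaned, numeric Titanic feature matrix (891 rows, the size of Kaggle's training file), so the scores it prints say nothing about the real passengers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for engineered Titanic features (assumption, not the real data).
X, y = make_classification(n_samples=891, n_features=6, n_informative=4, random_state=0)

# Hypothetical candidate set; scale-sensitive models get a StandardScaler in a pipeline.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "svc": make_pipeline(StandardScaler(), SVC()),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over 5 stratified folds, the same protocol for every model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```

Wrapping the scaler in a pipeline means it is refit on each training fold only, so statistics from the validation fold never leak into the comparison.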



Documentation

Created a notebook that outlines key machine learning concepts, then applied those concepts to the Titanic dataset.



Modeling

Applied feature engineering, feature selection, and model evaluation techniques.
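As an illustration of the feature engineering step, here is a minimal pandas sketch. It assumes the raw Kaggle columns `Name`, `Sex`, `Age`, `SibSp`, and `Parch`; the rows are invented, and the derived features (`Title`, `FamilySize`, `IsAlone`, `IsFemale`) and the helper `engineer_features` are hypothetical examples of common Titanic transformations, not necessarily the ones used in this project.

```python
import pandas as pd

# Invented rows that reuse column names from Kaggle's Titanic train.csv.
raw = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Doe, Mrs. Jane", "Roe, Miss. Anna",
             "Roe, Miss. Beth", "Brown, Master. Tom"],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [30.0, 40.0, 18.0, None, 4.0],
    "SibSp": [0, 1, 1, 1, 1],
    "Parch": [0, 0, 1, 1, 2],
})


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build a numeric feature table from raw passenger columns."""
    out = df.copy()
    # Title is the honorific between the comma and the next period in Name.
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
    # Family size counts siblings/spouses, parents/children, and the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)
    # Impute missing ages with the median age for the same title,
    # falling back to the overall median if a title has no known ages.
    title_median = out.groupby("Title")["Age"].transform("median")
    out["Age"] = out["Age"].fillna(title_median).fillna(out["Age"].median())
    # Encode Sex as a single 0/1 column.
    out["IsFemale"] = (out["Sex"] == "female").astype(int)
    # One-hot encode Title, then drop the raw text columns.
    out = pd.get_dummies(out, columns=["Title"], prefix="Title", dtype=int)
    return out.drop(columns=["Name", "Sex"])


features = engineer_features(raw)
print(features)
```

Imputing age by title rather than by a single global median uses the fact that honorifics such as "Master" and "Miss" carry age information.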



Insights Gleaned

The features with the greatest impact on survival were age, fare, and sex. The gradient boosting model achieved the best results on my held-out test set and earned the highest score among my Kaggle submissions.
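Here is a sketch of that final step using scikit-learn's `GradientBoostingClassifier`: evaluate on a held-out split, then inspect which features the fitted model relied on. The data come from `make_classification` as a synthetic stand-in for the engineered Titanic features, so the printed accuracy and importances say nothing about the real passengers. The 80/20 split, the generic column names, and the `random_state` values are my assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered Titanic features; the numbers mean nothing.
X_arr, y = make_classification(n_samples=891, n_features=5, n_informative=3, random_state=42)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])

# Hold out 20% of rows as a local test set, stratified to preserve the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# Accuracy on rows the model never saw during training.
test_accuracy = accuracy_score(y_test, model.predict(X_test))

# Impurity-based importances sum to 1; larger values mean the feature drove more splits.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)

print(f"Hold-out accuracy: {test_accuracy:.3f}")
print(importances)
```

Impurity-based importances can favor features with many distinct values (such as fare), so `sklearn.inspection.permutation_importance` on the held-out set is a useful cross-check before drawing conclusions about which features matter.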
