Index ¦ Archives

Airport Delays Analysis using Clusters

Logo

Overview

In this project, we will use three different datasets related to airport operations. These include a dataset detailing the arrival and departure delays/diversions by airport, a dataset that provides metrics related to arrivals and departures for each airport, and a dataset that details names and characteristics for each airport code.

Executive Summary

The purpose of this report is to investigate flight delays for FAA regulated airports in the US.There are significant delays with certain airports.Over the years these delays have not improved or have worsened. The data provided include a dataset detailing the arrival and departure delays/diversions by airport, a dataset that provides metrics related to arrivals and departures for each airport , and a dataset that details names and characteristics for each airport code.

I used Principal Component Analysis and clustering to divide the airport/year pairs into two groups by an aggregated delay measure. 91% of the pairs fell into the first group with better delay metrics, and approximately 9% fell into the second group, which are assumed to have a worse delay metrics.

We have data for 74 airports, five of them fell into the worse group for all the years. These airports are: Chicago O’Hare Int’l Airport, Atlanta Hartsfield, Dallas/Fort Worth Intl, La Guardia, and Newark Liberty Intl. The FAA should focus on improving the performance on these ones first. In the other hand we have the following airports topping the list with the higher percentage of "on time" departures: Westchester county, Newark liberty intl, John f kennedy intl, and Philadelphia intl. The FAA should take a closer look on the factors that lead to these higher percentages and replicate some of that for the worse group.

Risks and Assumptions of our data

We only have 799 records in total. Only 74 airports are included in the dataset. We do not have other factors for delays like the ones related to weather which can be quite common, or mechanical problems with the airplanes...etc. Due to this our analysis will be based on the delay and diversion information present in the data set.

The following airports do not have information for all the years OXR-Oxnard, PSP-Palm Springs Intl, RFD- Chicago/Rockford Intl, and SMF-Sacramento Intl

Background

The FAA wants to cut down on delays nationwide, and wants to investigate the locations with the worst delays. They have provided datasets that have delay information for a 10 year range between 2004 and 2014.

Key statistics table.

genre heatmap

We can see below that departure cancellations is highly correlated to arrival cancellations as well as departure diversions to arrival diversions

genre heatmap

genre heatmap

This graphics below show the average od departure cancellation and departure diversion per year

genre heatmap

genre heatmap

Histograms

The dataset is somehow normal but skewed to the right

genre heatmap

proj07_siluette_scores

Methods

To analyze the airport delays I used Principal Component Analysis to reduce the delay features. There are 14 features in total.

Click here to see Feature correlation here

We can see in the PCA analysis that 85% of the variation in the original data could be captured by three principal components.

pc(s) and Variance Ratio

We also implemeted k-means clustering and plotted the siluetes scores to determine the optimal K or number of clutters. We can see below the 2 clusters give us the higher siluette score.

siluette scores

Hierarchical clustering algorithms was also used verified they are there are two clusters that you can see plotted in the PCA space in the dynamic graphic below:

PCA

Results & Recommendations

The analysis show that the busiest airports have the most delayed as mentioned in the executive summary above, they should focus on improving the performance on these ones first. The FAA should take a look at other factors like weather, mechanical problems with the airplanes, and staff operation.

They are at the top of the list of the 10 world's busiest airports by aircraft movements:

Busiest Airports

At the same time the FAA should take a closer look a thoses airports that are also busy but were able to have more on time departure rate and see their model to see if the same can me implemeted on the most delayed ones. See list below:

Max perc on time air departures

Share on: Diaspora*TwitterFacebookGoogle+Email

© 2016 Maria Pichardo. Built using Pelican. Theme by Giulio Fidente on github.