This content is taken from Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science.

Skip to 0 minutes and 12 seconds This is another question we might wanna ask: on which day of the year are the expected arrival delays the worst, or the best? It might help us answer this question more succinctly if we had a vector that told us exactly what day of the year each flight was on. If we go look at our data here, we’ve got the year, the month, and the day of the month, and we could put those together. For instance, I could take the month, the day of the month, and the year (that’s a little silly cuz they’re all 2008, but no problem), and I could paste those together.

Skip to 0 minutes and 52 seconds I don’t wanna see all 7 million of those, so I’ll just go look at the head. And I see that the first six flights all occurred on January 3rd, 2008. Now, inside the paste, there’s something called the separator. Often we just want to glue things together with no space at all, but the default glue is one single space. Here it might make sense to have a forward slash as our glue between the pieces. So we’ll put the month and then a slash, the day of the month and then a slash, and then the year. Okay, so now that we’ve done that, we could call this, for instance, MyDates. Or you can call it anything you want.
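The pasting step might look roughly like the sketch below. The data frame name `myDF`, the column names `Month`, `DayofMonth`, and `Year`, and the three toy rows are assumptions made for illustration; the real data has about 7 million rows.

```r
# Toy stand-in for the flight data. The data frame name, the column
# names, and the values are assumptions, not the course's actual data.
myDF <- data.frame(
  Month      = c(1, 1, 12),
  DayofMonth = c(3, 3, 19),
  Year       = c(2008, 2008, 2008)
)

# With no sep argument, paste() glues the pieces with a single space.
head(paste(myDF$Month, myDF$DayofMonth, myDF$Year))

# Use a forward slash as the glue and save the result as MyDates.
MyDates <- paste(myDF$Month, myDF$DayofMonth, myDF$Year, sep = "/")
head(MyDates)
```

Setting `sep = ""` would glue the pieces together with no space at all.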

Skip to 1 minute and 32 seconds But I’m gonna take that information and save it into a vector called MyDates. Now again, in the interest of making sure that we’ve got the same length for each of the vectors we wanna work on, I can take a look at the arrival delays, and the length of those altogether is 7 million, and for MyDates the length should also be that same 7 million. So now I break up the arrival delays according to the date. Let’s say one more time that we take the mean as our function here and we remove the NAs. I find out, for all 365 days of the year, what the expected arrival delay is. And I can sort the results.
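A minimal sketch of the length check and the break-up by date, using `tapply` on a small invented data frame. The names `myDF`, `ArrDelay`, and `DailyDelays`, and all of the values, are assumptions for illustration only.

```r
# Toy data: five flights on three dates, one with a missing delay.
# Names and values are invented for this sketch.
myDF <- data.frame(
  Month      = c(1, 1, 1, 12, 12),
  DayofMonth = c(3, 3, 4, 19, 19),
  Year       = 2008,
  ArrDelay   = c(10, NA, -5, 30, 50)
)
MyDates <- paste(myDF$Month, myDF$DayofMonth, myDF$Year, sep = "/")

# The data and the grouping vector must have the same length.
length(myDF$ArrDelay)
length(MyDates)

# Break the arrival delays up by date and take the mean of each piece;
# na.rm = TRUE is passed along to mean() so missing delays are dropped.
DailyDelays <- tapply(myDF$ArrDelay, MyDates, mean, na.rm = TRUE)
DailyDelays
```

Without `na.rm = TRUE`, any date with even one missing arrival delay would come back as `NA`.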

Skip to 2 minutes and 18 seconds You can see that on December 19th, for instance, there was a 42-minute expected arrival delay. You can look through the data; it’s a manageable number, there are only 365 pieces of data there, maybe 366 since it was a leap year. For instance, on November 27, the expected arrival delay was six and a half minutes early. Was this a leap year? Let’s just check. Indeed, 366 days there. I will leave that so that people can see what I did. 2008 was a leap year, so we get 366 values for the result. I’m just double checking that the vectors we are working with have the same length. Let’s go sort the results there.
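Sorting and counting the per-day results could be sketched as follows. The named vector `DailyDelays` is a hypothetical stand-in for the output of the break-up by date, with invented values; the last line is one independent way to confirm that 2008 has 366 days.

```r
# Hypothetical per-day means, standing in for the real result.
DailyDelays <- c("1/3/2008" = 10, "1/4/2008" = -5, "12/19/2008" = 40)

# sort() orders by value and keeps the date names attached.
sort(DailyDelays)

# One value per distinct date; on the full 2008 data this would be 366.
length(DailyDelays)

# Count the calendar days in 2008 directly.
length(seq(as.Date("2008-01-01"), as.Date("2008-12-31"), by = "day"))
```

Passing `decreasing = TRUE` to `sort` would list the worst days first.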

Skip to 3 minutes and 19 seconds Here are all 366 days in 2008, sorted according to the expected arrival delay on that day. And again, if you wanna go in and limit things to a certain airport, I can say, let’s only look at the flights for which the destination was Indy. But I’ve got to make sure I put that same restriction onto the way I’m breaking my data up as well. Otherwise, they won’t have the same length. Okay, you’ve got to make sure the data you’re working on has the same length as the data you’re using to break it up into pieces. So if I now go and take a look at that.

Skip to 4 minutes and 1 second This is the expected arrival delay for each date, but only for the flights that are arriving in Indianapolis.
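The restriction to Indianapolis might be sketched like this, applying the same logical index to both the delays and the dates. The column name `Dest`, the airport code `"IND"` as it appears in the data, the helper name `ToIndy`, and the toy values are all assumptions.

```r
# Toy data with a destination column; names and values are invented.
myDF <- data.frame(
  Month      = c(1, 1, 1, 12),
  DayofMonth = c(3, 3, 4, 19),
  Year       = 2008,
  ArrDelay   = c(10, 20, -5, 40),
  Dest       = c("IND", "ORD", "IND", "IND")
)
MyDates <- paste(myDF$Month, myDF$DayofMonth, myDF$Year, sep = "/")

# TRUE for flights whose destination is Indianapolis.
ToIndy <- myDF$Dest == "IND"

# Apply the same restriction to the delays AND to the grouping vector,
# so both still have the same length.
IndyDelays <- tapply(myDF$ArrDelay[ToIndy], MyDates[ToIndy], mean, na.rm = TRUE)
sort(IndyDelays)
```

If only `myDF$ArrDelay` were subset, `tapply` would stop with an error because its arguments would no longer have the same length.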

Arrival Delays by Day of the Year
