
Skip to 0 minutes and 12 seconds Continuing to think about these most popular airports, why don't we go look and see how often the flights are on time at these airports? If I go in and take a look at the data here, what can I see? I've got departure delays, arrival delays, things like this. And the departure delay, for instance, is zero if the flight leaves exactly on time, and negative if the flight leaves just a little bit early. Why don't we go take a look at that?

Skip to 0 minutes and 43 seconds So if I go look at the departure delays and find out which ones are negative, this is gonna give me a vector of TRUEs and FALSEs. Amongst the first six, the first three are FALSE, the fourth one's TRUE, the next couple are FALSE.
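
The comparison described here can be sketched as follows. This is a minimal illustration on made-up data, not the course's actual data set: the data frame `flights` and the column names `Origin` and `DepDelay` are assumptions chosen for the example.

```r
# Toy stand-in for the flight data (hypothetical values and column names)
flights <- data.frame(
  Origin   = c("ORD", "ATL", "ORD", "ATL", "DFW", "ORD", "ATL", "DFW"),
  DepDelay = c(5, 0, 12, -3, 7, 20, -1, 0)
)

# The comparison is vectorized: it returns one TRUE/FALSE per flight
left_early <- flights$DepDelay < 0

# head() shows the first six values by default
head(left_early)
```

With these toy values, only the fourth of the first six flights left early, mirroring the pattern described in the video.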

Skip to 0 minutes and 59 seconds If the six that head gives you aren't enough and you want more of those, well, one thing you can do is say you wanna have 20 of them, for instance. Or equally well, you could skip head and just ask for the first 20 things in that vector.

Skip to 1 minute and 16 seconds Those are two ways to check the first 20 flights and see which ones departed early.
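
A small sketch of those two approaches, again on invented numbers. The vector `dep_delay` is hypothetical; only `head()` with its `n` argument and the `1:20` index are standard R.

```r
# Hypothetical departure delays in minutes (25 made-up values)
dep_delay <- c(5, 0, 12, -3, 7, 20, -1, 0, 3, -2, 15, -4, 8,
               1, -6, 0, 9, -1, 2, 30, -5, 4, 11, -2, 6)
left_early <- dep_delay < 0

# Way 1: ask head() for 20 values instead of the default six
first_20_head <- head(left_early, n = 20)

# Way 2: index the vector directly with positions 1 through 20
first_20_index <- left_early[1:20]

# Both give the same 20 values here
identical(first_20_head, first_20_index)
```

One difference worth knowing: if the vector had fewer than 20 elements, `head()` would return just what is there, while indexing with `1:20` would pad the result with `NA`.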

Skip to 1 minute and 25 seconds The ones in which the departure delay was negative, or indeed zero, would be on time, so you could make it less than or equal to zero. That's not gonna change things. I mean, unless you had one that was exactly zero, which is a little bit unlikely. So let's go back and use our tapply command to go and see how many of the flights meet this criterion. Okay, I'm gonna go look at the departure delays that are less than or equal to zero. That's a vector of TRUEs and FALSEs. And I'm gonna break that data up according to the origin airport. And I'm just gonna sum up the results.

Skip to 2 minutes and 2 seconds So this is gonna tell me for each of these origin airports how many of the flights departed on time, in other words, had departure delay less than or equal to zero. And I get a table of how many of them did. For instance, I can see at O'Hare there were 176,000 flights that departed on time or early.
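
Here is a sketch of that tapply call on toy data. The data frame `flights` and its column names are assumptions, and `na.rm = TRUE` is added on the assumption that some delays may be missing (for example, for cancelled flights); the video does not show the exact arguments used.

```r
# Toy stand-in for the flight data (hypothetical values and column names)
flights <- data.frame(
  Origin   = c("ORD", "ATL", "ORD", "ATL", "DFW",
               "ORD", "ATL", "DFW", "ORD", "ATL"),
  DepDelay = c(5, 0, -2, -3, 7, 20, NA, 0, -1, 14)
)

# Split the TRUE/FALSE vector by origin airport and sum within each group.
# Summing logicals counts the TRUEs; na.rm = TRUE skips missing delays.
on_time_by_origin <- tapply(flights$DepDelay <= 0, flights$Origin,
                            sum, na.rm = TRUE)
on_time_by_origin
```

In this toy example the result is a named vector with one count per airport: 2 for ATL, 1 for DFW, and 2 for ORD.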

Skip to 2 minutes and 24 seconds And I also know, when I go look at all the origin airports, how many flights there were altogether. Okay, so this tells us how many flights at each airport departed on time or early.

Skip to 2 minutes and 40 seconds And this is the total number of flights that departed from each airport.
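
One way to get the total number of flights per airport is `table()`. This is a sketch on the same kind of toy data; the video does not say which function was used for the totals, so treat the choice of `table()` and the names below as assumptions.

```r
# Toy stand-in for the flight data (hypothetical values and column names)
flights <- data.frame(
  Origin   = c("ORD", "ATL", "ORD", "ATL", "DFW",
               "ORD", "ATL", "DFW", "ORD", "ATL"),
  DepDelay = c(5, 0, -2, -3, 7, 20, NA, 0, -1, 14)
)

# Count every row for each origin airport, whatever its delay
flights_per_origin <- table(flights$Origin)
flights_per_origin
```

Note that this counts every row, including the one whose delay is missing, so the totals can be larger than the number of flights with a recorded delay.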

Skip to 2 minutes and 49 seconds Now I could go into each of these and extract the information for the most popular airports. Okay, I could take this table of how many flights departed on time or a little bit early and look at the ones that are most popular. And I can use as an index into that vector the names of these ten most popular airports and just get the data for those. And similarly I can do that for the number of all of the flights, regardless of whether they were departing on time or not. Okay, so we can restrict attention to only the ten most popular airports. And each of these is a count.
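
Indexing by name can be sketched like this. All counts and the vector names `on_time`, `totals`, and `most_popular` are invented for illustration, and the example keeps the top two airports rather than the top ten to stay small.

```r
# Hypothetical counts per airport (made-up numbers)
on_time <- c(ATL = 3, DFW = 1, ORD = 2, SEA = 1)
totals  <- c(ATL = 5, DFW = 2, ORD = 4, SEA = 1)

# Names of the busiest airports, here the top two by total flights
most_popular <- names(sort(totals, decreasing = TRUE))[1:2]

# A character vector of names works as an index into a named vector
on_time[most_popular]
totals[most_popular]
```

Because both vectors are indexed with the same names in the same order, the two results line up airport by airport.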

Skip to 3 minutes and 29 seconds Of course, all the numbers in the first one are smaller than in the second one. Because in this first result here, we've got all the flights that departed on time or early. And in the second one we've got all of the flights. And the neat thing about R, again, is that everything is vectorized. So if I take the first result and divide by the second result, R knows to go element by element and divide.

Skip to 3 minutes and 50 seconds So, one way I can do that is I can just take the first vector there and divide by the second vector. And again, if you wanna run more than one line at a time, you can just highlight them and then command return on the Mac, control R on Windows. And there you get the analogous percentages. You can check and make sure it did the right thing. For instance, for Atlanta here, there were 233,718 flights that departed on time or early and 414,513 flights altogether. So for Atlanta it would be 56% of the flights. But I don't wanna have to go and do ten divisions like that, right? It's much easier to do things in a vectorized manner.
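
The element-by-element division can be sketched as follows. The Atlanta counts are the ones quoted in the video; the airports `AAA` and `BBB` and their counts are hypothetical placeholders added only so the vectors have more than one element.

```r
# Atlanta counts are from the video; AAA and BBB are made-up placeholders
on_time <- c(ATL = 233718, AAA = 50, BBB = 30)
totals  <- c(ATL = 414513, AAA = 200, BBB = 40)

# One expression divides each element by the matching element
fraction_on_time <- on_time / totals

# Round to two decimal places for easier reading
round(fraction_on_time, 2)
```

For Atlanta this gives about 0.56, which agrees with the 56% worked out by hand.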

Skip to 4 minutes and 39 seconds Okay, so let's double check the result for Atlanta. That looks right.

Skip to 4 minutes and 46 seconds So we divide each element in the first vector by the analogous element in the second vector.

Skip to 4 minutes and 55 seconds This gives us basically ten divisions in a row.

Skip to 5 minutes and 2 seconds And as a result, we know the percentage of the flights at each of the ten most popular airports that departed on time or early.

Analysis of On-Time and Early Departures


This video is from the free online course:

Introduction to R for Data Science

Purdue University
