This content is taken from Purdue University and The Center for Science of Information's online course, Introduction to R for Data Science.

Skip to 0 minutes and 11 seconds Let’s go just a little bit further here into our study of these Indianapolis flights. We’ll go look at the head of our data frame we created with the 42,750 flights that have Indy as an origin. Let’s look at all of these variables we’ve got.

Skip to 0 minutes and 28 seconds Looks like there’s a departure time 6:28 in the morning, 9:26 in the morning, 18:29.

Skip to 0 minutes and 36 seconds That means 6:29 in the evening for Americans. 19:40, again, is 7:40 in the evening for Americans. You’ve got these departure times that are between 0 and 2400. So we could go look and find out when flights are departing out of Indianapolis. That would be one possibility. We could go look at my Indy origins and see what’s the departure time. You notice as I’m typing that departure time, I could just go click here. R is prompting me for which things I might be thinking about. And some of you might prefer to just go click on those as R is prompting. But you could go see which ones are less than 600. Before I sum those up, I’ll show you the head.
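As a small illustration of how these departure times are encoded, here is a sketch on a toy vector. The name `dep_time` is hypothetical and stands in for the departure time column of the Indy data frame; the four values are the ones read off in the video.

```r
# Toy stand-in for the departure time column (hypothetical name).
# Times are stored as numbers in hhmm form, so 1829 means 18:29.
dep_time <- c(628, 926, 1829, 1940)

# Integer division and remainder by 100 split a time into hour and minute.
hours   <- dep_time %/% 100
minutes <- dep_time %% 100
print(hours)
print(minutes)

# Comparing against 600 asks, element by element, "before 6 in the morning?"
before_six <- dep_time < 600
print(before_six)
```

Because the times are plain numbers, an ordinary `<` comparison is enough to pick out early departures.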

Skip to 1 minute and 18 seconds This is gonna be trues and falses here. None of those first six depart before 6 o’clock in the morning. If I change that to 10 o’clock, the first two ought to be true. Let’s double check and make sure it is. Indeed, it is. The first couple are true. Let’s find out how many depart before 6 o’clock altogether. Since we get trues and falses, we can again use this idea that the trues get converted to 1s and the falses to 0s. So again, we sum up TRUEs and FALSEs as 1s and 0s respectively. This time R doesn’t know. There must be some values within our column where R doesn’t know what the value is.
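The behavior described here can be reproduced on a toy vector. This is only a sketch: `dep_time` is a hypothetical name and the values are made up, with one `NA` standing in for an unknown departure time.

```r
# Toy departure times with one unknown value (made-up data).
dep_time <- c(628, 926, 1829, NA, 545)

# The comparison gives TRUE/FALSE, and NA wherever the time is unknown.
before_six <- dep_time < 600
print(before_six)

# Summing a logical vector counts the TRUEs (TRUE is 1, FALSE is 0).
count_known <- sum(c(TRUE, FALSE, TRUE))
print(count_known)

# With an NA in the vector, sum() returns NA by default.
count_with_na <- sum(before_six)
print(count_with_na)
```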

Skip to 1 minute and 58 seconds It doesn’t know if the value is less than 600 or more than 600 and so on. There’s a way to deal with that. This happens to us a lot. Let’s go look at the help file for sum. By default, the help says that if there are some values that are not known, that are NA, the sum ought to come back as an NA as well. Because R doesn’t know how many there were all together, and R wants you to be careful about that. But if you’re willing to accept that sometimes R doesn’t know what the value was, and you just want to sum up the ones remaining that R can deal with, you can change this na.rm to TRUE.

Skip to 2 minutes and 32 seconds Which means, any time we come to a value that’s NA, that’s not known, just remove those. Just don’t consider those as part of the sum. So let’s try again. Let’s take our command here and add on the parameter na.rm = TRUE. And now R finds there are 692 flights in the early morning departing out of Indy. What about flights that departed before noon? 18,000 of them, almost half of the flights departed before noon. Okay, what about the flights that departed before six in the evening? 35,000 of them. What about the flights that departed before midnight? 42,011 of those 42,750 departed before midnight.
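Here is a minimal sketch of the same idea on made-up data. The vector `dep_time` is hypothetical, so the counts below are for the toy values only, not for the real Indianapolis flights.

```r
# Toy departure times, two of them unknown (made-up data).
dep_time <- c(545, 628, 926, 1829, 1940, NA, NA)

# na.rm = TRUE tells sum() to drop the NAs before adding up the TRUEs.
early           <- sum(dep_time < 600,  na.rm = TRUE)
before_noon     <- sum(dep_time < 1200, na.rm = TRUE)
before_six_pm   <- sum(dep_time < 1800, na.rm = TRUE)
before_midnight <- sum(dep_time < 2400, na.rm = TRUE)

print(c(early, before_noon, before_six_pm, before_midnight))
```

Only the cutoff changes from one line to the next, which is the same pattern used in the video.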

Skip to 3 minutes and 21 seconds What if we changed this to before midnight or equal to midnight? Still get the same value. Okay, so there’s got to be some that are NAs. Let’s go check and see how many of these values are NAs. There’s a function called is.na, this is gonna give you trues and falses. We’ll just look at the head of this. The first six of them aren’t NAs, but lots of them are NAs. So we could sum them up. Indeed, 739 of the flights, we don’t know what the departure time is. So all together now, just as another means of sort of checking ourselves, you’ll see I do that quite a bit.
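Counting the unknown values can be sketched the same way. Again, `dep_time` is a hypothetical toy vector, not the course data.

```r
# Toy departure times, two of them unknown (made-up data).
dep_time <- c(545, 628, 926, 1829, 1940, NA, NA)

# is.na() returns TRUE where the value is unknown and FALSE otherwise.
missing <- is.na(dep_time)
print(head(missing))

# Summing the TRUEs counts how many departure times are unknown.
n_missing <- sum(missing)
print(n_missing)
```

No `na.rm` is needed here, because `is.na()` itself never returns `NA`.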

Skip to 3 minutes and 56 seconds There were 42,011 flights that departed by midnight and there were 739 flights for which the departure time was an NA, was unknown. Let’s see if that adds up to 42,750 and indeed it does. It’s sort of just a way of double checking ourselves, a little bit of a reality check. I think reality checks like that are really important as we’re doing data analysis. It’s so important to just check and check and check and make sure we’re doing things properly and carefully as we work. I also haven’t been pointing out the need to save our files as we go. You see our file name is in red with a little star.
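The reality check itself can be written down as a short sketch. The three totals are the ones quoted in the video; the toy vector `dep_time` is hypothetical and only shows the general pattern of known plus unknown equals total.

```r
# Counts quoted in the video.
departed_by_midnight <- 42011
unknown_departure    <- 739
total_flights        <- 42750

# The known and unknown counts should account for every flight.
check_video <- departed_by_midnight + unknown_departure == total_flights
print(check_video)

# Same pattern on made-up data: known + unknown == length of the column.
dep_time <- c(545, 628, 926, 1829, 1940, NA, NA)
known   <- sum(dep_time <= 2400, na.rm = TRUE)
unknown <- sum(is.na(dep_time))
check_toy <- known + unknown == length(dep_time)
print(check_toy)
```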

Skip to 4 minutes and 32 seconds I’m gonna hit Cmd+S to save the file there, make sure I don’t lose my work. You should only go as long without saving your work as you are willing to lose the output of what you’ve done so far. All right, I hope you’re having fun so far. We’re certainly learning a lot about airline flights right off the bat.

Analyzing the Departure Times of Flights
