This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science.

Skip to 0 minutes and 13 seconds >> Okay, let’s dive a little more deeply into all of the places where we can fly from Indiana, okay. Not just from the Indianapolis airport. So if I go look at the head of the airports data frame, remember we’ve got the airport codes and we’ve got the airport names, and the city and the state and so on, okay? So for instance, if I go look at just the state column, and I go make a table, I can find out how many airports every state has. And I see for instance, okay, Ohio has 100, California has 245 airports, Indiana has 65 airports altogether.
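The counting step described here can be sketched in R as follows. This is a minimal sketch on a tiny made-up data frame, not the course's real airports file; the column names `iata`, `city`, and `state` are assumptions, and `stringsAsFactors = TRUE` is set only to mimic the factor columns seen in the video.

```r
# Toy stand-in for the airports data frame (made-up rows, assumed column names)
airports <- data.frame(
  iata  = c("IND", "FWA", "LAF", "ORD", "CMH"),
  city  = c("Indianapolis", "Fort Wayne", "Lafayette", "Chicago", "Columbus"),
  state = c("IN", "IN", "IN", "IL", "OH"),
  stringsAsFactors = TRUE
)

# Look at the first few rows
head(airports)

# Count how many airports each state has
state_counts <- table(airports$state)
state_counts

# Pull out the count for a single state by name
state_counts["IN"]
```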

Skip to 0 minutes and 51 seconds The Indianapolis airport is just one of those; there may be some others that I want to find out about. Okay, so let’s go take a subset of the airports data frame, for which the state is Indiana. Okay, now it prints the last column, the longitude, separately down there, because it didn’t fit over on the side of the window, but it’s really just a data frame. Okay, it’s a data frame with the airport codes, the airport names, city, state, country, latitude and then, as I said, longitude, but it just doesn’t fit over there. And for instance one of those is the Indianapolis airport, which we’ve looked at quite a bit. But there are others. There’s an airport right here at Purdue University.
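Taking the Indiana subset might look like the sketch below. The data frame is again a small made-up stand-in, and the column names (`iata`, `airport`, `city`, `state`) and the state code `"IN"` are assumptions about how the real file is laid out.

```r
# Toy stand-in for the airports data frame (made-up rows, assumed column names)
airports <- data.frame(
  iata    = c("IND", "LAF", "ORD"),
  airport = c("Indianapolis International", "Purdue University",
              "Chicago O'Hare International"),
  city    = c("Indianapolis", "Lafayette", "Chicago"),
  state   = c("IN", "IN", "IL"),
  stringsAsFactors = TRUE
)

# Keep only the rows whose state is Indiana; the result is still a data frame
indiana_rows <- subset(airports, state == "IN")
indiana_rows
class(indiana_rows)
```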

Skip to 1 minute and 29 seconds There’s an airport out in White County. There are airports all over the state. They haven’t all had commercial traffic in the last few years, okay? Lots of them are just for private planes and may not show up in our database, okay? For instance, the Purdue airport doesn’t have commercial flights on major carriers right now. So let’s go dive a little bit more into all of the airports across the state of Indiana and see what we can learn. Okay, so I’m gonna take this subset of the data frame here and I’m gonna save it as Indy airports just so I can refer to it more easily.

Skip to 2 minutes and 3 seconds Okay, and now if I go make a table of all of the origins in the entire data, what do I find? These are all of the airports that show up in our actual data set. Okay, there are only about 300 of them across the whole country. And for instance I could go look at the one for Indianapolis, I could go look at the one for O’Hare, things like that.
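A table of origins, with lookups by airport code, can be sketched like this. The `flights` data frame and its `Origin` column are hypothetical stand-ins for the full 2006 to 2008 data, and the rows are made up.

```r
# Toy stand-in for the flight data (hypothetical name and column, made-up rows)
flights <- data.frame(
  Origin = c("IND", "ORD", "ORD", "IND", "FWA", "ORD", "ATL"),
  stringsAsFactors = TRUE
)

# Number of flights departing from each origin airport
origin_counts <- table(flights$Origin)
origin_counts

# How many distinct origin airports appear in the data
length(origin_counts)

# Look up individual airports by typing their codes
origin_counts["IND"]
origin_counts["ORD"]
```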

Skip to 2 minutes and 32 seconds So we can make a table that shows all of the flight counts as origins, for all airports in the full data set, from 2006 to 2008. And I’ll say, not just Indiana airports. Okay, and that’s what I did at the beginning here. Okay, that’s the entire listing of all of them, that’s all the airports in the whole country and how many flights there were from each of those as an origin airport. Some of them, as you can see, have hundreds of thousands of flights, okay? What I would like to know now is which of these airports are in Indiana.

Skip to 3 minutes and 7 seconds I could go look at the list of Indiana airports and go use my eyeball back and forth to find which ones are in here. But that’s what I want to avoid with data analysis. With data analysis I want to avoid manual work that might be prone to errors, prone to missing things, prone to inaccuracy, and not reproducible, okay. I want to use things that are systematic, reproducible and bound to be correct because the computer is doing them instead of my eyeball. Let’s go in and look at the Indy airports IATA codes. These are the ones I would be looking for, these 60-some codes. I want to go look in the main data set and see if I find them.

Skip to 3 minutes and 44 seconds For instance, we know that we find IND. There have got to be other airports in Indiana that show up in our data set. Okay, but if I go look at the class of those codes, it’s a factor. It’s showing you that those are the possible levels, the possible values of airport codes. So I’m going to convert that to make it into a character vector instead. It looks very similar, but it puts double quotes around all of them, just the way I did here. Okay? So now I can use these character codes as an index into the table, just like I manually typed in IND or ORD (O’Hare) before.
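The factor-to-character conversion can be sketched as below. The name `indyairports` is a guess at how "Indy airports" is spelled in the video, the `iata` column name is an assumption, and the three codes are just a toy sample.

```r
# Toy stand-in for the Indiana subset (assumed names, made-up sample of codes)
indyairports <- data.frame(
  iata = c("IND", "FWA", "LAF"),
  stringsAsFactors = TRUE   # mimic the factor column seen in the video
)

class(indyairports$iata)    # "factor"
levels(indyairports$iata)   # the possible values of the codes

# Convert the factor to a plain character vector
codes <- as.character(indyairports$iata)
class(codes)                # "character"
codes                       # printed with double quotes
```

The conversion matters because R treats a factor used as an index as its underlying integer codes rather than its printed labels, so indexing with the factor directly would not look entries up by airport code.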

Skip to 4 minutes and 18 seconds I can go check and see if these character codes show up in my table of my data anywhere. And many of them won’t, because many of them don’t have commercial flights. [LAUGH] Look at that, Indianapolis has 123,000 flights out of IND between 2006 and 2008. There are a couple of other Indiana airports that do, but most Indiana airports don’t have commercial flights. They’re so small, they’re just for private planes. Let’s see if we can find the information about these airports. We know this one is Indianapolis, but I don’t know off the top of my head what these other airports are. So I’m gonna go take that result and temporarily store it in a vector I’m gonna call V. Again, just a temporary vector there.
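Indexing the table of origins by a vector of codes can be sketched as follows. The `flights` data and the `codes` vector are made-up toy values, and the lowercase name `v` is an assumption about how the temporary vector is written in the video.

```r
# Toy flight data (hypothetical name and column, made-up rows)
flights <- data.frame(
  Origin = c("IND", "ORD", "ORD", "IND", "FWA", "ORD", "ATL"),
  stringsAsFactors = TRUE
)
origin_counts <- table(flights$Origin)

# A few Indiana codes as a character vector; LAF has no flights in the toy data
codes <- c("IND", "FWA", "LAF")

# Use the codes as an index into the table and store the result temporarily;
# codes that never appear as an origin come back as NA
v <- origin_counts[codes]
v
```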

Skip to 4 minutes and 58 seconds And I can go look at V, and say which ones are not NAs. The exclamation mark means not. So I first look at V and find out which of those entries are NAs. And then those are the ones I don’t want. So I put an exclamation mark to get the ones that I do want, the ones that aren’t NAs. And that’s the same as what we saw up above; it just picks them out from up there. I can get the names of those. I don’t have to go write them down or anything like that. You notice I’m trying really hard not to go write something externally or in another file. I’m doing everything inside R.
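Dropping the NA entries and collecting the remaining codes might look like this sketch. It rebuilds the same made-up toy table so it runs on its own; `v` and `found_codes` are illustrative names, not necessarily the ones used in the video.

```r
# Rebuild the toy lookup (made-up data, hypothetical names)
flights <- data.frame(
  Origin = c("IND", "ORD", "ORD", "IND", "FWA", "ORD", "ATL"),
  stringsAsFactors = TRUE
)
v <- table(flights$Origin)[c("IND", "FWA", "LAF")]

# TRUE where an entry is missing
is.na(v)

# The exclamation mark negates, so this keeps only the non-missing entries
v[!is.na(v)]

# The names of those entries are the airport codes that do have flights
found_codes <- names(v[!is.na(v)])
found_codes
```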

Skip to 5 minutes and 31 seconds Those are the airport codes themselves, and now I can go find out where those are located. Okay, I can go look into the airports data frame, and see which of those IATA codes are in this new little vector I just built. And there they are: Indianapolis International, there’s one in Fort Wayne, one in Evansville down in the southern part of the state, and one in South Bend; that’s where Notre Dame is located. They have an airport with commercial flights. So that’s one way to find out which of these 60-some airports in Indiana actually have commercial flights, and then go and look in our airports data frame to find out the information about where they’re located and everything.
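Looking the surviving codes back up in the airports data frame can be sketched with `%in%`. The rows, the column names, and the `found_codes` vector are toy assumptions; whether the video uses `subset` with `%in%` or another form of indexing is not stated in the transcript.

```r
# Toy stand-in for the airports data frame (made-up rows, assumed column names)
airports <- data.frame(
  iata    = c("IND", "FWA", "LAF", "ORD"),
  airport = c("Indianapolis International", "Fort Wayne International",
              "Purdue University", "Chicago O'Hare International"),
  city    = c("Indianapolis", "Fort Wayne", "Lafayette", "Chicago"),
  state   = c("IN", "IN", "IN", "IL"),
  stringsAsFactors = TRUE
)

# Codes that turned out to have flights in the toy example
found_codes <- c("IND", "FWA")

# Keep the rows whose IATA code appears in that vector
commercial <- subset(airports, iata %in% found_codes)
commercial
```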

Skip to 6 minutes and 12 seconds Without having to do anything manually at all.

Identifying Airports with Commercial Flights
