This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science. Join the course to learn more.

Skip to 0 minutes and 12 seconds Another thing we commonly wanna do when we’re doing data analysis is bring in another source of data. So I’m gonna go back to the Data Expo 2009 website. You’ll see the same link we used before to download the data. But there are supplemental data sources too, and it’s often a great idea when you’re doing data analysis to see what other kinds of data you can bring into the picture to give you a little more insight into your data. So there’s an airports.csv file. I’m gonna hold down Ctrl, since I’m on the Mac, and click on it. And then it’s gonna let me download the linked file.

Skip to 0 minutes and 44 seconds Now if you are on Windows, you probably wanna right-click on airports.csv and then you can decide where to save it. I’m gonna save it in my Downloads folder because that’s where I’ve got everything else. So, I’m gonna save that in there. Now if I come back into R, I should be able to make an airportsDF data frame by doing a read.csv and referencing my /users/mark/Downloads/airports.csv file. And I’ve brought it into memory. You’ll see there are 3,376 rows and 7 columns, okay, and you can see that here too. Okay, so here we’re importing the data about the airports themselves: locations, et cetera.
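A minimal sketch of this import step, for readers following along without the downloaded file. It writes a tiny stand-in CSV to a temporary file rather than reading the real airports.csv, so the 3 rows and 4 columns here are illustrative only. The name airportsDF and the column names (iata, airport, city, state) are assumptions and are not confirmed by the transcript.

```r
# Stand-in for the downloaded file: a tiny CSV written to a temporary location.
# The real airports.csv is reported to have 3,376 rows and 7 columns;
# this illustrative copy has 3 rows and 4 columns (assumed column names).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "iata,airport,city,state",
  "IND,Indianapolis International,Indianapolis,IN",
  "MDW,Chicago Midway,Chicago,IL",
  "CMH,Port Columbus International,Columbus,OH"
), tmp)

# In the video, the path to the Downloads folder is used in place of tmp.
airportsDF <- read.csv(tmp, stringsAsFactors = FALSE)

dim(airportsDF)   # number of rows, then number of columns
head(airportsDF)  # first few rows of the data frame
```

With the real file, the same dim call is where the row and column counts mentioned in the video would show up.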

Skip to 1 minute and 33 seconds Let’s go take a look at the head of that data frame. So I’ve got the airport code, the name of the airport, the city and state, the country, the latitude and longitude. And if these IATA codes don’t look familiar here, let’s go look at, say, the first 100 of them. Okay, maybe we wanna dive further into the list to find the ones that you’re familiar with. There are more than 3,000 of them altogether. Let’s go look in the airports data frame and see if we can find, for instance, where the IATA code is equal to IND. Okay, I’ll put that in there for the row and then I’ll leave the column blank, right.

Skip to 2 minutes and 13 seconds So I have to have two indices to go into the airports data frame. Okay, there’s the information about Indianapolis. Okay, or if I wanna see IND and O’Hare and Midway, which I always like, cuz it’s got my initials as the airport code, there you go. There’s the information for those three airports. Okay, that gives you a sense of things. So, what I’m gonna do now is I’m gonna take the airport names and the city and the state and I’m gonna paste them together into a new vector here, okay? I’ll just call it w because it’s just a temporary vector, I don’t need it in the long run.
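The row lookups described here can be sketched as follows. The data frame is a small hand-built stand-in, and its name and column names (airportsDF, iata, airport, city, state) are assumptions. Using %in% for the three-airport lookup is one possible approach; the video may have written the condition differently.

```r
# Hypothetical stand-in for the airports data frame (assumed column names).
airportsDF <- data.frame(
  iata    = c("CMH", "IND", "MDW", "ORD"),
  airport = c("Port Columbus International", "Indianapolis International",
              "Chicago Midway", "Chicago O'Hare International"),
  city    = c("Columbus", "Indianapolis", "Chicago", "Chicago"),
  state   = c("OH", "IN", "IL", "IL"),
  stringsAsFactors = FALSE
)

# A data frame takes two indices: rows first, then columns.
# The logical condition picks the rows; the blank after the comma keeps all columns.
ind <- airportsDF[airportsDF$iata == "IND", ]
ind

# Several airports at once: keep the rows whose code is in the given set.
three <- airportsDF[airportsDF$iata %in% c("IND", "ORD", "MDW"), ]
three
```

Leaving out the comma would ask for columns rather than rows, which is why the second index is left blank rather than dropped.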

Skip to 2 minutes and 50 seconds I’m gonna go look at the airport name and at the city and the state, and I’m gonna set the separator here. Now, instead of just the default space, I’m gonna put a comma in there, a comma and a space for the separator. Okay, so how does the head of w look? It looks like that, that’s kind of nice. Okay, so we made a vector to store the airport name, city and state. Okay, and I’m gonna make the names of w be those airport codes, like IND, MDW, and the rest. Okay, so I’m gonna make that airportsDF$iata. So now if I go look at the head of w, each entry has its airport code as the name.
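A short sketch of the paste-and-name step, again on a small hand-built stand-in. The data frame name and column names (airportsDF, iata, airport, city, state) are assumptions; paste, its sep argument, and names are standard base R.

```r
# Hypothetical stand-in for the airports data frame (assumed column names).
airportsDF <- data.frame(
  iata    = c("CMH", "IND", "MDW"),
  airport = c("Port Columbus International", "Indianapolis International",
              "Chicago Midway"),
  city    = c("Columbus", "Indianapolis", "Chicago"),
  state   = c("OH", "IN", "IL"),
  stringsAsFactors = FALSE
)

# paste() works element by element; sep replaces the default single space.
w <- paste(airportsDF$airport, airportsDF$city, airportsDF$state, sep = ", ")

# Label each entry of w with its airport code.
names(w) <- airportsDF$iata

head(w)
```

Because paste is vectorized, one call builds the description for every airport, and the names line up with the entries because both come from the same rows of the data frame.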

Skip to 3 minutes and 43 seconds So I can now go look at w and ask it for IND, and ORD and MDW, and any others that I want. And I’ll get the same information, I’ll get the name and the city and the state, but I’ll also have the code there, and I can go use the codes as indices into that vector. Okay, because the names of the entries that are stored there are now the airport codes. That’s the neat thing that I did here. We made the name of each entry in the vector be the three-letter airport code itself, okay?

Skip to 4 minutes and 22 seconds So, if I wanna know, for instance, just the airport that’s got the abbreviation CMH, I can just go put that in there and find out that that’s Port Columbus International in Columbus, Ohio. It’s kind of a neat trick. That’s one way that indexing in R is very powerful. I encourage you to try it, see if you can understand what’s going on there and how we’ve used indices to our advantage in R.
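The lookup by airport code can be sketched like this. The vector w is built by hand here as a stand-in for the one made in the video, so its four entries are illustrative. The code "ZZZ" is a hypothetical placeholder for a code that is not in the vector.

```r
# Hand-built stand-in for w: descriptions named by airport code.
w <- c(
  CMH = "Port Columbus International, Columbus, OH",
  IND = "Indianapolis International, Indianapolis, IN",
  MDW = "Chicago Midway, Chicago, IL",
  ORD = "Chicago O'Hare International, Chicago, IL"
)

# A character index is matched against the names of the vector,
# and results come back in the order requested.
several <- w[c("IND", "ORD", "MDW")]
several

# A single code works the same way.
one <- w["CMH"]
one

# A code that is not among the names gives NA rather than an error.
missing_code <- w["ZZZ"]
missing_code
```

This is what makes the named vector work as a lookup table: no search condition is needed, only the code itself.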

Incorporating Auxiliary Data about Airports
