This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science. Join the course to learn more.

Skip to 0 minutes and 12 seconds We’ve learned a lot about how you index different vectors. For instance we saw you could ask for the first 100 elements of a vector, the first 10n months of a vector. You can use trues and falses as indices into a vector. Let’s explore that a little bit more. Let’s make a table of how many flights originate at each airport. We’ve done something like this before.

Skip to 0 minutes and 39 seconds Inside here I’ve got data. These are the numbers, and I’ve got a name for each piece of data. Along with each number in the vector, I’ve got a name. In fact, if I only look at the names of the vector, I’m not even gonna see the data any more. I’m just gonna see those airport codes. So I can use those airport codes, those names, as indices into my vector. If, for instance, I wanna just extract the information about Indianapolis and get out this 42,750 without retyping it, I can just put "IND" in there, and I’m gonna get the information just for Indianapolis.
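As a minimal sketch of the step just described: the tiny `origins` vector below is a made-up stand-in for the course's flight data, and the names `origins` and `origin_counts` are assumptions for illustration, not the names used in the video.

```r
# Hypothetical stand-in for the flight data: one origin code per flight.
origins <- c("IND", "ORD", "ORD", "JFK", "IND", "ORD")

# table() counts the flights per origin; the airport codes become the names.
origin_counts <- table(origins)

# Looking only at the names shows the airport codes, not the counts.
names(origin_counts)

# Indexing by name extracts the count for Indianapolis without retyping it.
origin_counts["IND"]
```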

Skip to 1 minute and 24 seconds So we can index a vector according to the names of the elements in the vector.

Skip to 1 minute and 36 seconds Now, for instance, if I want both Indy and O’Hare, this won’t work for me. I can’t ask it for two separate things. I have to send the index into the vector as just one object, and this is one thing and this is another thing. So I’ve got to wrap them up together into one common object, and the way that you do that is with c, for concatenating things together. So I can concatenate together IND and ORD.
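A small sketch of why the two names have to be wrapped with `c()`. The vector `flight_counts` is hypothetical: only the IND figure comes from the video, and the ORD and JFK counts are invented for illustration.

```r
# Named vector of counts. Only IND = 42750 is from the video;
# the ORD and JFK values are made up for this example.
flight_counts <- c(IND = 42750, ORD = 350000, JFK = 118000)

# Two separate indices: R treats this as indexing along two dimensions,
# which a plain vector does not have, so it signals an error.
attempt <- tryCatch(
  flight_counts["IND", "ORD"],
  error = function(e) conditionMessage(e)
)
attempt  # the error message, captured as a string

# One index object built with c(): both elements come back.
both <- flight_counts[c("IND", "ORD")]
both
```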

Skip to 2 minutes and 6 seconds Say, well, how many of each of those are there? If I wanna put some other airports in there, I can. I can also throw in there JFK, EWR, put IAD in there. I can put any kind of names of airports that I want to in there. In fact, I can take a whole vector of names, like most popular, and make that be the index into the vector. So one thing we can do is we can manually type the names of elements we want to extract.

Skip to 2 minutes and 40 seconds Or we can save the indices of elements we want to extract into a vector such as the most popular vector that we made.

Skip to 2 minutes and 51 seconds And use that whole vector as a set of indices into another vector.

Skip to 2 minutes and 59 seconds So, for instance, I can take the table of all of the origins there and ask it specifically to extract the corresponding elements in the order of most popular, in the order of whatever I put in here, and there they are. All right, the indices are just the names. And I’ve used those to go into this table and grab the corresponding data, grab these counts. Okay, those counts aren’t stored themselves in most popular. I’ve gone into the origin vector there, the table I made of all the origins and their counts, and grabbed the ones that were most popular.
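To make that concrete, here is a short sketch with invented counts. The names `origin_counts` and `most_popular` and every number below are assumptions for illustration. The point is that the saved vector holds only names, and the result comes back in the order of that vector.

```r
# Made-up counts, stored in alphabetical order of airport code.
origin_counts <- c(ATL = 5, IND = 2, JFK = 3, ORD = 4)

# A saved vector of names only; it holds no counts itself.
most_popular <- c("ORD", "ATL", "JFK")

# Use the saved vector as the index: the counts are pulled from
# origin_counts, in the order given by most_popular.
popular_counts <- origin_counts[most_popular]
popular_counts
```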

Skip to 3 minutes and 42 seconds And I can do this on the fly too, for instance. What could I do? I could take this vector of airports that I like, my airports. I could make myself another vector of those. If you want to see what that looks like, it’s just got the names. It doesn’t have any counts at all.

Skip to 4 minutes and 2 seconds And use this as an index into the table of counts.

Skip to 4 minutes and 8 seconds In other words, put that in there where most popular was.

Skip to 4 minutes and 15 seconds There you go. So it’s gonna go look at this vector, my airports and use that as a batch of indices into this table that I’ve built. That enables you to go in and index things in a powerful way, to go in and look for pieces of data that you really want and extract the data without having to loop through or manually write some kind of for loops or anything like that at all. And certainly not having to go and look through with your eyeball and your finger and retyping things. Any time you retype stuff you’re liable to make an error. See even I just did. It happens all the time.
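A sketch of the same idea with a vector built on the fly, plus one possible way to catch a mistyped code before indexing. The names `my_airports` and `origin_counts`, the counts, and the deliberate typo "ORDD" are all hypothetical. The `%in%` check is a suggestion, not something shown in the video.

```r
# Made-up counts for illustration.
origin_counts <- c(EWR = 6, IAD = 1, IND = 2, JFK = 3, ORD = 4)

# A vector of names typed by hand, including a deliberate typo ("ORDD").
my_airports <- c("IND", "ORDD", "JFK")

# Check which requested names actually exist before indexing.
known <- my_airports %in% names(origin_counts)
my_airports[!known]  # the names that do not match anything

# Index with only the names that exist; no loop is needed.
my_counts <- origin_counts[my_airports[known]]
my_counts
```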

Skip to 4 minutes and 52 seconds The more that you can automate the process, the more that you can figure out how to do things systematically, the more that’ll lead to more accurate data analysis, an easier way of thinking about the data, and an easier way of explaining it to other folks. So I urge you to try and make things as automatic and as systematic as possible. This leads to the whole idea of reproducibility in your data analysis, so others can go back and see what you’ve done in your own analysis when you hand the code to someone else. It’s a good thing to be thinking about.

Using Airport Codes
