This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science. Join the course to learn more.

Skip to 0 minutes and 12 seconds All right, let’s dive a little bit deeper into those Indy flights. We saw that among the first six flights, the third, fourth, fifth, and sixth of them started in Indy. Okay, let’s see how many flights started in Indy altogether. The neat thing about R as compared to lots of other computing platforms is you don’t have to write a loop to look through every single element in some kind of vector or some kind of data structure that you wanna study. R is vectorized, meaning R is naturally really good at running functions that operate on every single element of some kind of vector or some other kind of data structure. R has very powerful functions.
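As a minimal sketch of what "vectorized" means here, the comparison below is written twice: once as an explicit loop and once as a single vectorized expression. The vector `origins` and its values are made up for illustration and are not the course's data.

```r
# Toy vector of origin airport codes (hypothetical values, not the real data)
origins <- c("IAD", "IAD", "IND", "IND", "IND", "IND")

# Loop version: check each element one at a time
is_indy_loop <- logical(length(origins))
for (i in seq_along(origins)) {
  is_indy_loop[i] <- origins[i] == "IND"
}

# Vectorized version: one comparison applied to every element at once
is_indy_vec <- origins == "IND"

# Both approaches give the same logical vector
print(identical(is_indy_loop, is_indy_vec))
```

The vectorized form is shorter and is the style used throughout the rest of this walkthrough.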

Skip to 0 minutes and 52 seconds So for instance, let’s look at the origin cities and see which ones of them are equal to Indy. I don’t wanna see all of the resulting values. I just wanna see the first six things that come from this command. So what am I gonna go do? I’m gonna take the origin column out of the data frame, take that whole column of origins, and see which ones are equal to Indy, to Indianapolis. I use a == to check and see if they’re equal or not. And I just go through, is the first flight coming from Indy? Is the second flight coming from Indy? Is the third flight coming from Indy?

Skip to 1 minute and 26 seconds And so on, and I go through all 7 million flights and check. And you notice R ran it almost immediately, almost no delay. Indeed, the first two are not coming from Indy, right? They were coming from Dulles. And the third, fourth, fifth, and sixth ones were coming from Indy. Now if I try and sum up those values, you say to yourself, well, how could you sum trues and falses? Very commonly, when we’re computing in many different environments, not just in R, falses get converted to 0s and trues get converted to 1s, if we try and add something up. Let’s take this previous command, and instead of doing the head, we’ll do a sum.
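The command described above might look roughly like the following. This is a sketch under assumptions: the data frame name `flights`, the column name `Origin`, and the toy rows are stand-ins, since the transcript does not show the actual names. The first six rows mirror what the video reports (two flights from Dulles, IAD, then four from IND).

```r
# Toy stand-in for the flight data (hypothetical name and values)
flights <- data.frame(
  Origin = c("IAD", "IAD", "IND", "IND", "IND", "IND", "ORD", "IND"),
  stringsAsFactors = FALSE
)

# Compare every origin to "IND", then keep only the first six results
# (head() returns the first six elements by default)
first_six <- head(flights$Origin == "IND")
print(first_six)
```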

Skip to 2 minutes and 2 seconds And I’m gonna make myself a note here before I run that. I’m gonna make myself a note here that false values are converted to 0s. True values are converted to 1s. So sum just adds up the total, which yields the number of flights departing from Indy in 2008. So now I go back up to this line, Cmd+Return to run it. 42,000 of our 7 million flights departed from Indianapolis in 2008. That is neat and that didn’t take hardly any time. I didn’t have to write a loop, I just asked R, and R let me know. You would think that there would be a similar number of destinations that would be Indy as well.
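To illustrate the note about FALSE becoming 0 and TRUE becoming 1, here is a small sketch. The data frame `flights` and its `Origin` column are hypothetical toy data, so the count it produces is not the 2008 figure from the video.

```r
# Logical values are converted to numbers when added up:
# FALSE values are converted to 0s, TRUE values are converted to 1s
converted <- as.integer(c(TRUE, FALSE, TRUE))
print(converted)

# Toy stand-in for the flight data (hypothetical name and values)
flights <- data.frame(
  Origin = c("IAD", "IAD", "IND", "IND", "IND", "IND", "ORD", "IND"),
  stringsAsFactors = FALSE
)

# sum() adds up the 1s, which yields the number of flights departing from IND
indy_departures <- sum(flights$Origin == "IND")
print(indy_departures)
```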

Skip to 2 minutes and 46 seconds So, we could go check that very easily too. You’ll notice I’m just copying and pasting my code. When I highlight a line of code, I’m holding Shift and hitting the down arrow to highlight. And then I’m using Cmd+C to copy and Cmd+V to paste. You can see that under the Edit menu here, that copy is Cmd+C and paste is Cmd+V. I tend to use shortcuts when I’m typing, so I don’t have to use my mouse too much. Let’s go look at how many of the destination cities are Indy. So I just changed the origin to the destination there. I go ask R, and it’s almost exactly the same number.

Skip to 3 minutes and 22 seconds I mean, we fully expect that the number is slightly different because a few airplanes may or may not still be sitting in Indianapolis. A few more of them at the beginning of the year versus at the end of the year. And sometimes airplanes are taken in and out of service, but a really similar number of them landed in Indianapolis, as compared to the number of them that departed from Indianapolis. That’s pretty neat. So we’re already getting our minds wrapped around how R works, and our hands kind of used to diving into the data and answering some questions about the data set that we’ve got.
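The origin-versus-destination comparison can be sketched as below. The data frame `flights`, the column names `Origin` and `Dest`, and every row are assumptions made for illustration; the real counts come from the full 2008 data, not from this toy table.

```r
# Toy stand-in for the flight data (hypothetical name, columns, and values)
flights <- data.frame(
  Origin = c("IAD", "IND", "IND", "ORD", "IND", "ATL"),
  Dest   = c("IND", "ORD", "ATL", "IND", "IAD", "ORD"),
  stringsAsFactors = FALSE
)

# Same command twice; only the column changes
departures <- sum(flights$Origin == "IND")
arrivals   <- sum(flights$Dest == "IND")

# A small gap between the two counts is expected
print(departures - arrivals)
```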

Identifying Properties

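Note: R is case-sensitive. If you capitalize a command that is written in lowercase (for example, Sum instead of sum), you will receive an error. Avoid starting a new line with a capital letter unless the name itself is capitalized.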

Notice the number of flights that departed from Indianapolis (IND) in 2008. Does the number surprise you? Search for the number of departing flights from another US city. First, try to guess how many there will be. Then, see what the actual number was. What was the difference between your guess and the actual total? Refer to http://www.airportcodes.org/ for a list of airport codes.
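One possible way to set up this exercise is sketched below. The helper `count_departures`, the data frame `flights`, the column `Origin`, and the guess are all hypothetical; swap in the real data frame and an airport code of your choice.

```r
# Hypothetical helper: count the flights departing from a given airport code
count_departures <- function(df, code) {
  sum(df$Origin == code)
}

# Toy stand-in for the flight data (made-up values)
flights <- data.frame(
  Origin = c("IAD", "IAD", "IND", "ORD", "ORD", "ORD"),
  stringsAsFactors = FALSE
)

my_guess <- 2                                # write your guess down first
actual   <- count_departures(flights, "ORD") # then ask R
print(abs(my_guess - actual))                # how far off was the guess?

# table() counts the departures for every airport in one step
origin_counts <- table(flights$Origin)
print(origin_counts)
```

Remember that the airport code must match the capitalization used in the data, since "ord" and "ORD" are different strings.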
