This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science.

Skip to 0 minutes and 11 seconds All right, let’s explore a little bit further which flights fly in and out of Indianapolis. It’s possible to make smaller subsets so that we don’t have to deal with all 7 million rows in this really large data frame. I’m gonna make a smaller data frame, which I’ll initially call, for instance, MyIndyOrigins. You can call it anything you like. It doesn’t matter. But I’m gonna take a subset of my original data frame and just extract the rows that meet a certain condition. And the condition I’ll use is the one from earlier, that the origin is Indy. So I’m gonna copy that and paste it in here.

Skip to 0 minutes and 48 seconds So if I run that command, R runs through all 7 million rows in the original data frame and finds just the 42,750 flights that had the origin as Indianapolis. And it builds me a new data frame called MyIndyOrigins with those 42,750 flights. And similarly, I could make another data frame with all the flights whose destination was Indianapolis. It can be called anything you like. Just to emphasize that it doesn’t have to follow the same naming pattern, I’ll call it IndianapolisDestinations. And the rest of the command is gonna look pretty similar, except I’m gonna look for the flights that have Indy as the destination. And we expect there to be 42,732 of these, because we already saw there were 42,732 such flights.
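The two subsetting commands described above might look roughly like the following. This is only a sketch on a tiny made-up data frame: the name myDF, the column names Month, Origin and Dest, and the airport code "IND" for Indy are assumptions for illustration, and the row counts here are not the ones from the video.

```r
# Hypothetical toy stand-in for the full 7-million-row flights data frame.
# The name myDF and the column names are assumptions, not taken from the video.
myDF <- data.frame(
  Month  = c(1, 1, 2, 3, 12, 12),
  Origin = c("IND", "ORD", "IND", "BWI", "IND", "LAS"),
  Dest   = c("BWI", "IND", "JAX", "IND", "LAS", "IND")
)

# Keep only the rows whose origin is Indianapolis (assumed code "IND").
MyIndyOrigins <- subset(myDF, Origin == "IND")

# Keep only the rows whose destination is Indianapolis.
IndianapolisDestinations <- subset(myDF, Dest == "IND")

# Number of flights in each smaller data frame.
nrow(MyIndyOrigins)
nrow(IndianapolisDestinations)
```

The new data frames can be given any names; only the condition inside subset() decides which rows are kept.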

Skip to 1 minute and 38 seconds And indeed here, in the variables now, we’ve got a new variable called IndianapolisDestinations with, just as we thought, 42,732 such flights. Let’s dive a little bit deeper here. If I go inside MyIndyOrigins and I look at the head, inside the Origin column, all of the origins had better be Indy. Let’s check. Indeed, all the origins are Indy. And you see flights to BWI and JAX and LAS and so on. And similarly, if you look at the head of IndianapolisDestinations, all the destinations in that data frame are Indianapolis as well. Okay, there are more things we might wanna know about these Indianapolis flights.
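A check like the one just described could be sketched as follows, again on a small made-up data frame (myDF, the column names and the code "IND" are assumptions). The all() line is an extra that is not shown in the video; it tests every row rather than only the first six.

```r
# Hypothetical toy data; names and values are illustrative assumptions.
myDF <- data.frame(
  Month  = c(1, 1, 2, 3, 12, 12),
  Origin = c("IND", "ORD", "IND", "BWI", "IND", "LAS"),
  Dest   = c("BWI", "IND", "JAX", "IND", "LAS", "IND")
)
MyIndyOrigins <- subset(myDF, Origin == "IND")
IndianapolisDestinations <- subset(myDF, Dest == "IND")

# Look at the first few entries of each column of interest.
head(MyIndyOrigins$Origin)
head(IndianapolisDestinations$Dest)

# Extra check (not in the video): is every origin really Indianapolis?
allOriginsAreIndy <- all(MyIndyOrigins$Origin == "IND")
allOriginsAreIndy
```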

Skip to 2 minutes and 23 seconds For instance, what if we wanted to know how many Indianapolis flights departed during each month? We could look at MyIndyOrigins. That’s a data frame with 29 columns, and one of the columns is MONTH. These first six all have MONTH 1. But the months in general are all going to be between 1 and 12, corresponding to the months of January through December. So I could look at the month there. The first six are all going to be 1s, the last six are all going to be 12s. There’s a command called table that’s gonna let us find out how many of each entry there are and give me back a table with all that data.
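As a rough illustration of what table does, here is a sketch on a handful of made-up flights. The data frame myDF and the column name Month are assumptions; the counts below belong to this toy data only.

```r
# Hypothetical toy data: seven flights, six of them leaving Indianapolis.
myDF <- data.frame(
  Month  = c(1, 1, 1, 2, 2, 12, 1),
  Origin = c("IND", "IND", "IND", "IND", "IND", "IND", "ORD")
)
MyIndyOrigins <- subset(myDF, Origin == "IND")

# Peek at the first and last entries of the month column.
head(MyIndyOrigins$Month)
tail(MyIndyOrigins$Month)

# Count how many flights fall in each month.
monthCounts <- table(MyIndyOrigins$Month)
monthCounts
```

In this toy example the table has one entry per month that actually occurs (1, 2 and 12), each with its number of flights.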

Skip to 2 minutes and 59 seconds For the flights that originated in Indianapolis, that had Indy as the origin, there were 3,580 flights in January, 3,414 flights in February, 3,764 in March, and so on and so forth. It is kind of hard, just looking with your eyeballs at those 12 numbers, to tell: Which is the largest? Which is the smallest? Is there a trend? That kind of thing. I think it’s kind of neat, right off the bat, to go ahead and start plotting some of your data. Because it just gives you another angle from which to consider the stuff that you’re looking at. So you could take that previous command and wrap it in a plot, and there you go.
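Wrapping the table command in a plot might look like this. It is a sketch on made-up data (myDF and the column names are assumptions), so the picture will not match the one in the video.

```r
# Hypothetical toy data; the values are illustrative only.
myDF <- data.frame(
  Month  = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 1),
  Origin = c("IND", "IND", "IND", "IND", "IND",
             "IND", "IND", "IND", "IND", "ORD")
)
MyIndyOrigins <- subset(myDF, Origin == "IND")

# The table of flights per month, as before.
monthCounts <- table(MyIndyOrigins$Month)

# The same command wrapped in plot(): one vertical line per month,
# with height equal to the number of flights in that month.
plot(table(MyIndyOrigins$Month))

# Extra (not in the video): which month has the most flights?
busiestMonth <- names(monthCounts)[which.max(monthCounts)]
busiestMonth
```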

Skip to 3 minutes and 35 seconds R will show you that in the early months of the year there’s a little bit of a drop, except for maybe March. And then lots of flights in the summertime, decreasing somewhat in the fall, again except for a little increase in October. So there’s just a little bit of a cyclic pattern here amongst the flights, month by month, that are originating in Indianapolis. Again, a neat thing is that we can just copy and paste and go look at destinations as well. If I change this from the data frame MyIndyOrigins to the data frame IndianapolisDestinations, the same code is going to work.

Skip to 4 minutes and 8 seconds I can get a table that corresponds to the destinations, and similarly I can get a plot that corresponds to the destinations. It’s gonna be almost exactly the same. You probably didn’t even see things change when I ran it, because there were 3,580 times Indy was the origin in January and 3,582 times Indy was the destination in January. So the plots look almost exactly the same, and that’s kinda what we expect. That’s kind of a way of checking to make sure that we’re doing something sensible with our study of these Indianapolis flights.
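The origin-versus-destination comparison can be sketched by reusing the same code on both data frames. Everything here is an assumption for illustration: the toy data frame myDF, the column names, the code "IND", and the use of factor() with levels 1 to 12 so that both tables list the same months (the video simply calls table on the month column).

```r
# Hypothetical toy data; the values are illustrative only.
myDF <- data.frame(
  Month  = c(1, 1, 1, 2, 2, 3),
  Origin = c("IND", "IND", "BWI", "IND", "JAX", "LAS"),
  Dest   = c("BWI", "JAX", "IND", "LAS", "IND", "IND")
)
MyIndyOrigins <- subset(myDF, Origin == "IND")
IndianapolisDestinations <- subset(myDF, Dest == "IND")

# Same command, two data frames. Fixing the levels to 1..12 keeps
# the two tables aligned even if a month has no flights.
originCounts <- table(factor(MyIndyOrigins$Month, levels = 1:12))
destCounts   <- table(factor(IndianapolisDestinations$Month, levels = 1:12))

# Stack the two sets of counts for a side-by-side look.
comparison <- rbind(origin = originCounts, destination = destCounts)
comparison

# The same plot as before, now for the destinations.
plot(destCounts)
```

With the real data the two rows would be expected to be close to each other, as the January counts of 3,580 and 3,582 suggest.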

Extracting Flight Data with a Common City of Origin

Note: Click on the icon on the lower-right corner of the video to enlarge the video. You will need to do this for many of the videos in order to view what the educator is typing.

Quiz 3 feedback:

Be careful when copying and pasting quotation marks from this site. Sometimes the quotation marks on futurelearn.com are opening and closing quotes. (This occurs when you submit the correct answer to Question 1 of Quiz 3. The educator’s remark includes “ORD”.)

However, in R, only straight quotes are allowed.

Check this page to see the difference between opening/closing quotes and straight quotes.

– Teaching assistant
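The quotation-mark issue can be reproduced with a small sketch. The variable names are made up for illustration, and the curly quotes are written as the Unicode escapes \u201c and \u201d so that the example itself contains only straight quotes. parse() is used here only to ask R whether a piece of text is valid code.

```r
# Code typed with straight double quotes parses without complaint.
straightCode <- 'airport <- "ORD"'
parsedStraight <- parse(text = straightCode)

# The same code with opening/closing ("curly") quotes pasted in,
# written here with Unicode escapes.
curlyCode <- "airport <- \u201cORD\u201d"

# tryCatch() returns the error object instead of stopping the script.
curlyResult <- tryCatch(parse(text = curlyCode), error = function(e) e)

# TRUE if R refused to parse the curly-quote version.
curlyFailed <- inherits(curlyResult, "error")
curlyFailed
```

If pasted code fails with an error about unexpected input, retyping the quotation marks by hand in the R console or script is usually the quickest fix.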
