This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science.

Skip to 0 minutes and 12 seconds Here’s another thing we might wanna consider. Let’s start with all of the departure delays.

Skip to 0 minutes and 20 seconds Let’s break them in two different ways. For instance, say we wanted to break them according to the month of the delay, and let’s just find out how many flights there were. Remember, when I do length here, it doesn’t actually matter what I put in the first position. Because I’m just gonna be breaking the data in the first position up according to whatever is in the second position, and taking a function of each piece. So I’m just gonna take the length of whatever I put here, once I’ve broken it up according to month.

Skip to 0 minutes and 48 seconds So this is essentially gonna tell me how many pieces of data there are in each month, how many flights there are in each month. I could also have done that according to the city of origin.

Skip to 1 minute and 5 seconds We can break the data in the departure delays according to which month or which city of origin.
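As a minimal sketch of the counting step just described, here is how it might look with tapply. The data frame flights and its columns Origin, Month, and DepDelay are assumptions made up for illustration; the transcript does not show the actual variable names used in the video.

```r
# Toy stand-in for the flight data. The data frame name and the column
# names (Origin, Month, DepDelay) are assumptions for illustration.
flights <- data.frame(
  Origin   = c("IND", "IND", "ATL", "ATL", "ATL", "AUS"),
  Month    = c(6, 6, 3, 3, 6, 3),
  DepDelay = c(5, -2, 10, 0, 7, 3)
)

# Break the departure delays up by month, then take the length of each piece.
by_month <- tapply(flights$DepDelay, flights$Month, length)

# The first argument only supplies something to count, so another column
# of the same length gives the same counts.
by_month_alt <- tapply(flights$Origin, flights$Month, length)

# Same idea, broken up by city of origin instead of month.
by_origin <- tapply(flights$DepDelay, flights$Origin, length)

print(by_month)
print(by_origin)
```

Because length only counts the elements in each piece, the choice of first argument does not change the result, which is the point made above.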

Skip to 1 minute and 19 seconds Now, there’s a way that we could break according to both of these. And the way that you do that is, in the middle part, you put a list. So I’m gonna put both the origin city and the month. We haven’t done any lists yet, so this is the first time we see lists. I want to make a list of all the origins and all the months, and I break the departure delays, or anything I want to put here really, according to both the origin and the month, and find out how many of them there are. Let’s see what we get as a result.

Skip to 1 minute and 54 seconds If my window were wider, in fact, let’s make the window a little bit wider, we would get this all as one nice looking matrix here. Where along the rows we’ve got the airport codes, and along the columns we’ve got the months. So, we now know how many flights occur from each airport in each month.
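A rough sketch of breaking the data up by both origin and month, again on made-up data. The names flights, Origin, Month, and DepDelay are assumptions, not the names used in the video.

```r
# Toy stand-in for the flight data (names are assumptions for illustration).
flights <- data.frame(
  Origin   = c("IND", "IND", "ATL", "ATL", "ATL", "AUS"),
  Month    = c(6, 6, 3, 3, 6, 3),
  DepDelay = c(5, -2, 10, 0, 7, 3)
)

# Put a list of two grouping vectors in the middle position:
# rows of the result are origins, columns are months.
counts <- tapply(flights$DepDelay,
                 list(flights$Origin, flights$Month),
                 length)

print(counts)
```

In this toy data some airport and month combinations have no flights at all, and tapply reports those cells as NA rather than 0. With a full year of real flight data, most cells would be filled in.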

Skip to 2 minutes and 17 seconds For instance, if I wanna know how many of them occur from Indy in June, I can ask for just that entry. It should be 3,862 such flights.

Skip to 2 minutes and 32 seconds Let’s try and pick out a certain entry here. Let’s go look at Atlanta in March. There should be 36,098 flights.

Skip to 2 minutes and 46 seconds 36,098 flights.
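Picking out a single entry, such as Indy in June or Atlanta in March, might look like the following sketch. It uses a tiny made-up data set (the names flights, Origin, Month, and DepDelay are assumptions), so the counts here are not the real figures quoted in the video.

```r
# Toy stand-in for the flight data (names are assumptions for illustration).
flights <- data.frame(
  Origin   = c("IND", "IND", "ATL", "ATL", "ATL", "AUS"),
  Month    = c(6, 6, 3, 3, 6, 3),
  DepDelay = c(5, -2, 10, 0, 7, 3)
)

counts <- tapply(flights$DepDelay,
                 list(flights$Origin, flights$Month),
                 length)

# Ask for one entry by row name (airport) and column name (month).
# The column names are character strings, so the month is quoted here.
ind_june  <- counts["IND", "6"]
atl_march <- counts["ATL", "3"]

print(ind_june)
print(atl_march)
```

The month is indexed by name here because the toy data does not contain all twelve months, so column position and month number do not line up.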

Skip to 2 minutes and 52 seconds So we can extract the data from a particular row, in other words, the origin airport, and from a particular column, in this case, the month. Here we need to give two dimensions when we extract data from a matrix. We need to specify both the row and the column, right? You gotta say which row or rows you wanna extract information from and which column or columns. You can do several at a time. For instance, let’s suppose that you want to know about not just Atlanta, but a few more airports near the top of the matrix here. Suppose you wanna know about ATL and AUS.

Skip to 3 minutes and 43 seconds And maybe one more, BDL.

Skip to 3 minutes and 49 seconds Okay, I’m gonna break this up, put it on two lines so you can see what I’m typing. So I’m gonna get all the data from these three airports and then which months do I want? Let’s say we get months 7, 8, 9, and 10.

Skip to 4 minutes and 7 seconds There you go. We’ve just gone and looked at these three airports in these four months.

Skip to 4 minutes and 15 seconds Here is the number of flights from these three particular airports during months 7, 8, 9, and 10, in other words July through October. You could even use a range of months.

Skip to 4 minutes and 30 seconds For instance, you could just say 7:10.

Skip to 4 minutes and 36 seconds Same effect: just write 7:10 to get the vector containing the numbers 7 through 10.
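Here is a small sketch of extracting several rows and several columns at once, with the months written out and then as a range. The data is made up so that every airport has at least one flight in each of the twelve months. The names flights, Origin, Month, and DepDelay are assumptions for illustration.

```r
# Made-up data: one flight per airport per month, plus one extra ATL flight
# in July. All names here are assumptions for illustration.
flights <- expand.grid(Origin = c("ATL", "AUS", "BDL", "IND"),
                       Month  = 1:12,
                       stringsAsFactors = FALSE)
flights$DepDelay <- 0  # placeholder delays; only the counts matter here
flights <- rbind(flights,
                 data.frame(Origin = "ATL", Month = 7, DepDelay = 0))

counts <- tapply(flights$DepDelay,
                 list(flights$Origin, flights$Month),
                 length)

# Three airports (rows) and four months (columns), months written out.
sub_listed <- counts[c("ATL", "AUS", "BDL"),
                     c(7, 8, 9, 10)]

# Same selection with a range: 7:10 is the vector 7, 8, 9, 10.
sub_range <- counts[c("ATL", "AUS", "BDL"), 7:10]

print(sub_listed)
print(identical(sub_listed, sub_range))
```

One caveat: numeric column indices select by position. Position 7 is month 7 here only because all twelve months appear in the data; if some months were missing, indexing by name, for example as.character(7:10), would be the safer choice.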

Analyzing Flights by Origin Airport and Month of Departure
