This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science. Join the course to learn more.

Skip to 0 minutes and 12 seconds Okay, let’s answer another question about the airline flights. This one takes into account not just the origins, not just the destinations, but both. We go look at the head of the origin column. Again, you’re gonna see that the first two flights originated at Dulles and the next four originated at Indianapolis. You could go look at the head of the destination column and again see the six airports where those first six flights arrived. One thing we could do is put those together. And one way that we could put those together is with the paste command. So the paste command is also vectorized.

Skip to 0 minutes and 46 seconds It will go piece by piece and paste the first element of one thing together with the first element of another thing, and so on. So in this case, the things that we use are the origin and the destination vectors, okay? And rather than look at the output of the entire paste, we’re gonna go look at the head of the result. So the first flight went from Dulles to TPA, and the same with the next flight, and the third one went from IND to BWI and so on. And you can even put strings into paste commands. Like if you wanna put the word to in there, you can do that.
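As a minimal sketch of the paste step described above: the data frame name `flights`, the column names `Origin` and `Dest`, and the six rows are illustrative assumptions standing in for the real flight data, which is not reproduced here.

```r
# Toy stand-in for the flight data (hypothetical name and rows, for illustration only).
flights <- data.frame(
  Origin = c("SFO", "SFO", "SFO", "LAX", "LAX", "IND"),
  Dest   = c("LAX", "LAX", "LAX", "SFO", "SFO", "BWI"),
  stringsAsFactors = FALSE
)

# paste() is vectorized: element i of Origin is joined to element i of Dest,
# separated by a single space by default.
pairs <- paste(flights$Origin, flights$Dest)
head(pairs)

# A plain string such as "to" is recycled, so it appears in every pasted pair.
pairs_to <- paste(flights$Origin, "to", flights$Dest)
head(pairs_to)
```

Looking only at the head keeps the output short; on the full data set the pasted vector has one entry per flight.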

Skip to 1 minute and 20 seconds Now that we’ve pasted together all of the origins and all of the destinations, one thing we could do is sort the result. But before we sort, we want to put the pairs into a table, so we find out how many flights there are of each type. We’ve done that before with the table command. So let’s summarize here. We’ll make a table of all the origin to destination pairs. If we do that, we can sort the table and see, for instance, what’s at the head of that table. So the head shows the pairs where there is only one flight from the origin to the destination.
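The table-then-sort step might look like the sketch below. Again, `flights` and its rows are a hypothetical toy data set, not the course data, so the counts are illustrative only.

```r
# Toy stand-in for the flight data (hypothetical name and rows, for illustration only).
flights <- data.frame(
  Origin = c("SFO", "SFO", "SFO", "LAX", "LAX", "IND"),
  Dest   = c("LAX", "LAX", "LAX", "SFO", "SFO", "BWI"),
  stringsAsFactors = FALSE
)

# Count how many flights there are for each origin-to-destination pair.
pair_counts <- table(paste(flights$Origin, "to", flights$Dest))

# sort() orders the counts from smallest to largest by default,
# so the head of the sorted table holds the least popular pairs.
sorted_counts <- sort(pair_counts)
head(sorted_counts)
```

Because the sort is ascending, the rarely flown pairs come first, which is why the head of the result is full of pairs with a single flight.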

Skip to 2 minutes and 3 seconds So in this case, what we actually wanna do is take the tail of the result, and we see, for instance, from San Francisco to Los Angeles, 13,788 flights in 2008. And similarly, from Los Angeles back to San Francisco, there were 13,390 flights.

Skip to 2 minutes and 24 seconds I think OGG and HNL are both airports in Hawaii, but I’m not positive about that. Okay, so what did we do? We made a table of all of the origin-to-destination pairs, then sorted the table and found the most popular such pairs by examining only the tail. Okay, now, one thing that I wanna point out with the tail command is that you can say how many items you wanna see. The default is just six, but for instance, suppose that you wanna see the most popular 20 such pairs. Okay, I can put a 20 on there and then I’ll get 20 such pairs, right? I’ll dive a little bit deeper into that list, okay?
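Here is a small sketch of passing a count to tail. The toy `flights` data frame is a hypothetical stand-in with only three distinct pairs, so the example asks for the top 2 rather than the top 20; on the full data set the same call with 20 would show the 20 most popular pairs.

```r
# Toy stand-in for the flight data (hypothetical name and rows, for illustration only).
flights <- data.frame(
  Origin = c("SFO", "SFO", "SFO", "LAX", "LAX", "IND"),
  Dest   = c("LAX", "LAX", "LAX", "SFO", "SFO", "BWI"),
  stringsAsFactors = FALSE
)

# Table of origin-to-destination pairs, sorted from least to most popular.
sorted_counts <- sort(table(paste(flights$Origin, "to", flights$Dest)))

# With no second argument, tail() shows at most the last six entries.
tail(sorted_counts)

# A second argument sets how many entries to show; here, the two most popular pairs.
top_pairs <- tail(sorted_counts, 2)
top_pairs
```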

Skip to 3 minutes and 4 seconds So here are the most popular 20 such pairs.

Identifying the Most Popular Flight Paths
