This content is taken from Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science.

Skip to 0 minutes and 12 seconds Let’s stop for just a minute and try to plot some of this data that we just tabulated. If I just naively go and make a plot of my new table, wow, I see that I get on the y-axis the number of flights, which sometimes ranges into the 30,000, 40,000, 50,000 range, maybe around 40,000 here. And the entries on the x-axis are flight paths. For instance, here’s a flight path from ABE to ATL, another flight path from COT to FLL. There are too many of them to put on the x-axis. That was not the best thing I could have done, to just naively plot.

Skip to 0 minutes and 47 seconds I think I mentioned to you that plotting and visualizing should be an iterative kind of process. It’s worth tinkering with the way that you’re making your plot until you find a good method. A dot chart is often helpful when the things on your x-axis are words or phrases, or something a little longer left to right. You might want to put those on the y-axis instead and put your numbers on the x-axis. Let’s see what happens now. It’s still not gonna be very desirable, I think. Yeah, you can sort of see that there are flight paths there. Those are six-letter phrases right there: the three letters for the origin and three letters for the destination.
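The contrast between the naive plot and the dot chart can be sketched roughly as follows. This is only an illustration: the `flights` data frame and its rows are invented here as a tiny stand-in for the course data, and the column names `Origin` and `Dest` and the object name `paths` are assumptions rather than names confirmed by the video.

```r
# Toy stand-in for the course's flight data; these rows are invented for illustration.
flights <- data.frame(
  Origin = c("IND", "IND", "IND", "ABE", "ABE", "ORD"),
  Dest   = c("ORD", "ORD", "ATL", "ATL", "ATL", "IND")
)

# Build one label per flight path (e.g. "IND ORD") and count how often each occurs.
paths <- table(paste(flights$Origin, flights$Dest))

# Naive plot: path labels go on the x-axis, counts on the y-axis.
plot(paths)

# Dot chart: path labels go on the y-axis, counts on the x-axis.
dotchart(as.numeric(paths), labels = names(paths))
```

With only four distinct paths the labels are readable either way; with thousands of paths, as in the video, both plots become crowded.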

Skip to 1 minute and 25 seconds And they’re just printed right on top of each other. Pretty ugly. Again, you can still see the counts stretched out here, from 0 up to just over 40,000. It’s improving, but is this really what we want to see? I think it’s nice to kind of hone in on what we’d like to see. Let’s suppose you took that dot chart, for instance, and tried to put things in order. Okay, let’s suppose you sorted before you plotted. Make a note here: neither of these plots is very helpful.

Skip to 1 minute and 54 seconds We could try to sort the data first.

Skip to 1 minute and 59 seconds Still not super helpful. I now see that the data is in numerical order. There are not very many flight paths in this 20,000, 30,000, 40,000 flight range here. There are not many flight paths that have, say, more than 20,000 flights on that flight path. We could probably pick those out and look at them, but there are still just too many flight paths listed here, okay? It’s nice to kind of focus in on things. So why don’t we go look at the flights originating out of Indianapolis, since that’s the closest airport to where I am. You might want to go try it for an airport near you.
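Sorting before plotting might look something like this minimal sketch. The `flights` rows are invented toy data, and `Origin`, `Dest`, `paths`, and `sorted_paths` are assumed names, not ones taken from the video.

```r
# Toy stand-in for the course's flight data; these rows are invented for illustration.
flights <- data.frame(
  Origin = c("IND", "IND", "IND", "ABE", "ABE", "ORD"),
  Dest   = c("ORD", "ORD", "ATL", "ATL", "ATL", "IND")
)

# Count flights per path, then sort the counts into increasing order.
paths <- table(paste(flights$Origin, flights$Dest))
sorted_paths <- sort(paths)

# The dot chart now shows the paths in order of their counts.
dotchart(as.numeric(sorted_paths), labels = names(sorted_paths))
```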

Skip to 2 minutes and 31 seconds Okay, so I’m gonna go back to that two-dimensional table that I made earlier and look at the row that has IND as the row label, with all columns allowed. So I’m not gonna put anything after the comma. For IND, I just want one row, and then I want all the possible columns. Okay, maybe it’s helpful before I plot to go look at the head of what this is. Okay, I can even go look at all of these, because there are not that many possible flights out of Indy. Okay, this is showing you, from Indy to all of these possible places, how many flights there are.
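Picking out a single row of a two-dimensional table can be sketched as below. The data are invented, and the names `flights`, `Origin`, `Dest`, `mytable`, and `ind_row` are assumptions made for this illustration; the video only refers to the table in speech.

```r
# Toy stand-in for the course's flight data; these rows are invented for illustration.
flights <- data.frame(
  Origin = c("IND", "IND", "IND", "ABE", "ABE", "ORD"),
  Dest   = c("ORD", "ORD", "ATL", "ATL", "ATL", "IND")
)

# Two-dimensional table: origins label the rows, destinations label the columns.
mytable <- table(flights$Origin, flights$Dest)

# Row "IND", all columns: nothing is written after the comma.
ind_row <- mytable["IND", ]

# Look at the first few entries before plotting.
head(ind_row)
```

Leaving the position after the comma empty is what asks R for every column of that row.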

Skip to 3 minutes and 2 seconds And for instance, I thought there would be a lot to ORD, because we’re close to Chicago. And indeed there are 11,000 flights to O’Hare. Okay, so if I go plot this now, I’m getting closer to something desirable.

Skip to 3 minutes and 16 seconds Perfect. Okay, this is a lot more desirable. I still don’t know which of these flights these are. Again, this is better to put into a dot chart. Now I’ve got something that I can almost make sense of. These are the possible destinations where you can fly from Indy, based on the number of flights that have gone that way. Okay, so let’s still try and do something better. Let’s take the IND row of my table and save it somewhere. Let’s save it in a vector called v, for instance, so that we can work with it a little more easily. Okay, so we’re focusing on the flights with IND as the origin. Okay, save that flight data into a vector.

Skip to 4 minutes and 1 second So now, what could I do? I could go take the things in there that are not zero, right? Because for most of the places you’d want to go from Indy, the count is zero: you can’t get there. You can only get to a few different airports. So let’s look at the elements of v such that the data is not zero. These are all the places I can fly from Indy. Like I said, there are a lot of flights to Chicago, lots of flights to Atlanta, quite a few flights to other places. But again, not the overwhelming number of 300-some airports. There are only a few places you can fly from Indy. Now, let’s sort that data with O’Hare and Atlanta at the top.
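Dropping the zero counts and sorting what remains might be sketched like this. The counts are invented toy values, and the names `flights`, `Origin`, `Dest`, `mytable`, `nonzero`, and `nonzero_sorted` are assumptions for the sake of the example; only the vector name `v` is suggested by the video.

```r
# Toy stand-in for the course's flight data; these rows are invented for illustration.
flights <- data.frame(
  Origin = c("IND", "IND", "IND", "IND", "IND", "IND", "ABE"),
  Dest   = c("ORD", "ORD", "ORD", "ATL", "ATL", "DFW", "JFK")
)
mytable <- table(flights$Origin, flights$Dest)

# Save the IND row as a named vector of counts, one entry per destination.
v <- mytable["IND", ]

# Keep only destinations with at least one flight (here JFK drops out).
nonzero <- v[v != 0]

# Sort in increasing order; dotchart() draws the first element at the
# bottom, so the busiest destination ends up at the top of the chart.
nonzero_sorted <- sort(nonzero)
dotchart(nonzero_sorted)
```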

Skip to 4 minutes and 37 seconds And now we’re prepared to make a dot chart, and this is gonna be a little more helpful. Aha, this is something good to look at. Okay, now we only plot the flights from IND to airports that have at least one flight. In other words, we removed the zero data. There are still sort of too many destinations, because they’re on top of each other. I can make my screen a little larger; that would be fine. But maybe there’s some kind of natural cutoff that you care about. For instance, maybe I only care about cities to which Indy has at least 4,000 flights. And there, we can go see such flights.
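Applying a cutoff can be sketched in the same way. The toy counts below are tiny, so a cutoff of 2 stands in for the 4,000 used in the video; `flights`, `Origin`, `Dest`, `mytable`, `cutoff`, and `busy` are all assumed names in invented data.

```r
# Toy stand-in for the course's flight data; these rows are invented for illustration.
flights <- data.frame(
  Origin = c("IND", "IND", "IND", "IND", "IND", "IND", "ABE"),
  Dest   = c("ORD", "ORD", "ORD", "ATL", "ATL", "DFW", "JFK")
)
mytable <- table(flights$Origin, flights$Dest)
v <- mytable["IND", ]

# Hypothetical cutoff; with the full data set this would be 4000.
cutoff <- 2

# Keep only the destinations with at least `cutoff` flights, then sort and plot.
busy <- sort(v[v >= cutoff])
dotchart(busy)
```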

Skip to 5 minutes and 16 seconds Depending on how much you travel, you may or may not recognize these airport codes. We’re gonna go get some more data in the next video, to learn how we might extract more information about these places where you can fly very often from Indianapolis. And we’ll see if we can use data from multiple CSV files, from multiple sources of data taken together, to give us a little more information, a little bit more insight into our data analysis.

Visualizing Flight Paths
