This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science. Join the course to learn more.

Skip to 0 minutes and 11 seconds One thing I realized I haven’t been doing consistently, as we’ve been discussing these Indy flights, is documenting everything we did in the file as we went. So I want to take just a minute to go back through my file and really put some good documentation practices in place. I’ve put in a little bit of documentation, but just not enough. This is kind of typical of what I would do after I’ve been coding for an hour or so: I just want to make sure I documented everything I did. I’ll often put my name at the top. And the overall objective of what I’m working on would be some data analysis about Indianapolis flights.

Skip to 0 minutes and 45 seconds We imported all of the data from the 2008 data set of the ASA Data Expo 2009. Here are the first 6 flights and the last 6 flights.
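As a rough sketch of what the top of such a documented file might look like: the data frame name `myDF`, the file name `2008.csv`, and the column names are assumptions rather than details taken from the video, and the rows are made-up stand-ins so the snippet runs on its own.

```r
# Author: (your name here)
# Objective: data analysis of Indianapolis (IND) flights,
# using the 2008 data set from the ASA Data Expo 2009.

# The real import might look like this (file name is an assumption):
# myDF <- read.csv("2008.csv")

# Hypothetical stand-in data, NOT the real flight records
myDF <- data.frame(
  Origin  = c("IND", "ATL", "IND", "ORD", "LAX", "IND", "ATL", "IND"),
  Dest    = c("ORD", "IND", "ATL", "IND", "IND", "LAX", "LAX", "ORD"),
  Month   = c(1, 1, 2, 2, 3, 3, 3, 12),
  DepTime = c(545, 1130, NA, 1745, 2210, 610, 1305, 555)
)

# Here are the first 6 flights and the last 6 flights
firstFlights <- head(myDF)
lastFlights  <- tail(myDF)
print(firstFlights)
print(lastFlights)
```

Both `head` and `tail` return 6 rows by default, which is why the comments say "first 6" and "last 6".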

Skip to 1 minute and 1 second Here are the airports that were the origins of the first 6 flights and the airports that were the origins of the last 6 flights. Here are the destination airports of the first and last 6 flights, respectively. Here is the information about the first 6 flights that had IND as the origin. Okay, I’ve already documented this one here. I’ve gone through all of the flights that have Indianapolis as the origin, and summed up the number of such flights. I’m gonna get TRUEs for the ones that have Indy as an origin and FALSEs for the ones that don’t have Indy as an origin.
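A minimal sketch of these commented steps, assuming a data frame called `myDF` with `Origin` and `Dest` columns (hypothetical names, filled here with made-up rows in place of the real data):

```r
# Hypothetical stand-in data, NOT the real flight records
myDF <- data.frame(
  Origin = c("IND", "ATL", "IND", "ORD", "LAX", "IND", "ATL", "IND"),
  Dest   = c("ORD", "IND", "ATL", "IND", "IND", "LAX", "LAX", "ORD")
)

# Origin airports of the first 6 and the last 6 flights
head(myDF$Origin)
tail(myDF$Origin)

# Destination airports of the first 6 and the last 6 flights
head(myDF$Dest)
tail(myDF$Dest)

# The first 6 flights that had IND as the origin
head(myDF[myDF$Origin == "IND", ])

# TRUE for each flight with IND as the origin, FALSE for every other flight
isIndyOrigin <- myDF$Origin == "IND"
print(isIndyOrigin)
```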

Skip to 1 minute and 43 seconds And then when I apply the sum, all the TRUEs get converted to ones, and the FALSEs get converted to zeros. So the sum just gives you how many ones occurred; it just adds up all the ones, right? And as we saw, there were 42,750 such flights. How many had Indy as the destination? Similarly, we had 42,732 flights with IND as the destination. Okay, so here what did we do? We created a smaller data frame with only the flights in which IND is the origin city. Similarly, we made a data frame with the flights for which Indy is the destination city. Here are the first 6 flights of each of those new data frames.
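The counting and subsetting described here might be sketched as follows. The names `myDF`, `myIndyOrigins`, and `myIndyDests` are hypothetical, and the rows are made up, so the counts are tiny compared with the 42,750 and 42,732 from the real data.

```r
# Hypothetical stand-in data, NOT the real flight records
myDF <- data.frame(
  Origin = c("IND", "ATL", "IND", "ORD", "LAX", "IND", "ATL", "IND"),
  Dest   = c("ORD", "IND", "ATL", "IND", "IND", "LAX", "LAX", "ORD")
)

# sum() treats TRUE as 1 and FALSE as 0, so this counts
# the flights that have IND as the origin
originCount <- sum(myDF$Origin == "IND")

# Similarly, the number of flights that have IND as the destination
destCount <- sum(myDF$Dest == "IND")

# Smaller data frame with only the flights that have IND as the origin
myIndyOrigins <- subset(myDF, Origin == "IND")

# Smaller data frame with only the flights that have IND as the destination
myIndyDests <- subset(myDF, Dest == "IND")

# The first 6 flights of each of the new data frames
head(myIndyOrigins)
head(myIndyDests)
```

A useful sanity check is that the number of rows in each smaller data frame matches the corresponding count from `sum`.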

Skip to 2 minutes and 32 seconds All right, so then we started doing some interesting things with these data frames. We went and looked at all the months for the flights that had Indy as the origin, and found how many there were in each of those months. That is, how many flights departed from Indy during each month of the year. (I’m trying not to say “departed from”, but rather “had Indy as the origin city”. That way, we don’t get confused.) Now we can plot that data easily. Similarly, for the flights for which Indy was the destination, we did just the same. Then we went back and looked at the first 6 flights that had Indy as the origin.
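One way to sketch the per-month counts is with `table`, assuming a `Month` column numbered 1 to 12 (an assumption, as are the data frame names and the made-up rows below):

```r
# Hypothetical stand-in data, NOT the real flight records
myDF <- data.frame(
  Origin = c("IND", "IND", "ATL", "IND", "ORD", "IND", "IND", "LAX"),
  Dest   = c("ORD", "ATL", "IND", "LAX", "IND", "ORD", "ATL", "IND"),
  Month  = c(1, 1, 1, 2, 2, 3, 3, 3)
)
myIndyOrigins <- subset(myDF, Origin == "IND")
myIndyDests   <- subset(myDF, Dest == "IND")

# Number of flights with IND as the origin, during each month
originsByMonth <- table(myIndyOrigins$Month)
print(originsByMonth)

# Plot of the monthly counts for flights with IND as the origin
plot(originsByMonth)

# Same thing for the flights with IND as the destination
destsByMonth <- table(myIndyDests$Month)
print(destsByMonth)
plot(destsByMonth)
```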

Skip to 3 minutes and 11 seconds Now here I’ve gone and documented how many departure times were less than 600. Okay, I said that I’m gonna sum up TRUEs and FALSEs as ones and zeros. But I didn’t actually write down: # This is the number of flights that departed before 6 AM. # Similarly, flights that departed before 12 PM, 6 PM, and 12 midnight. # Note to ourselves: the na.rm means to remove any values that were not known, i.e., that were appearing as “NA”. So, all together, we had 42,011 flights departing by midnight, as we saw, and 739 flights that had NA for the departure time. All right, thanks for walking back through those examples with me.
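A minimal sketch of these departure-time counts, assuming a `DepTime` column that stores times as numbers such as 545 for 5:45 AM (the data frame name, column name, and values below are hypothetical):

```r
# Hypothetical stand-in data for flights with IND as the origin
myIndyOrigins <- data.frame(
  DepTime = c(545, 1130, NA, 1745, 2210, 610, 555)
)

# This is the number of flights that departed before 6 AM.
# Note to ourselves: na.rm = TRUE removes any values that were
# not known, i.e., that were appearing as NA.
before6am <- sum(myIndyOrigins$DepTime < 600, na.rm = TRUE)

# Similarly, flights that departed before 12 PM, 6 PM, and 12 midnight
beforeNoon     <- sum(myIndyOrigins$DepTime < 1200, na.rm = TRUE)
before6pm      <- sum(myIndyOrigins$DepTime < 1800, na.rm = TRUE)
beforeMidnight <- sum(myIndyOrigins$DepTime < 2400, na.rm = TRUE)

# Number of flights with an unknown (NA) departure time
missingDepTime <- sum(is.na(myIndyOrigins$DepTime))

# The known and unknown departure times account for every flight
beforeMidnight + missingDepTime == nrow(myIndyOrigins)
```

Without `na.rm = TRUE`, a single NA in the column makes the whole sum NA. In the real data the same check corresponds to 42,011 + 739 = 42,750.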

Skip to 4 minutes and 3 seconds I think it’s a really good idea to comment your code. Sometimes I’m doing that as we’re discussing, but I wanna emphasize how important that is. Otherwise, you pull this file open in a couple of weeks, or in a month, or in 6 months, and you have no idea what you were doing with this code. It just saves you a lot of time in the end if you take a few minutes to document what you’re doing, so I strongly encourage that.

Annotating R Code with Comments

This is the last video for week 1. We have covered a lot of concepts this week! You’re getting the hang of things. This last video for the week was important because it is helpful to document what you are doing as you code in R.

Make a plot of departure times, grouped by hour, for the flights from ATL to LAX. When you are finished with your plot, you may mark this step complete!

Perhaps this is the first time you have ever used R. After completing this step, jump onto the discussion board and share your experience with the week 1 content with other learners!

You may tweet about this course using the hashtag #FLDataScience. Tell the world what you are learning to do in R!
