This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science.

Skip to 0 minutes and 12 seconds I’m in favor of doing things in R in multiple ways, so you can verify what you’re doing, find which way is the most effective, and reinforce your learning. I think you’ll remember that earlier in our learning, we looked at the departure times and cut them according to a vector of breaks. So what were the breaks that we used? We used a sequence between 0 and 2,400, and we broke that data up every 100 numbers. As a result, this is what we got: a table of how many flights departed within each interval, between midnight and 1 AM, 1 AM to 2 AM, 2 AM to 3 AM, and so on.
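As a reminder of what that earlier step looks like, here is a minimal sketch. The vector `dep_time` is a small hypothetical sample, not the course data, and the name `v` for the stored table is an assumption made for illustration.

```r
# Hypothetical sample of departure times in hhmm form (not the course data)
dep_time <- c(2003, 754, 628, 926, 1829, 1940, 30, 2400, NA)

# Breaks every 100 from 0 to 2400 define 24 intervals:
# (0,100], (100,200], ..., (2300,2400]
breaks <- seq(0, 2400, by = 100)

# cut() assigns each time to an interval; table() counts times per interval.
# Missing values are dropped by table() by default.
v <- table(cut(dep_time, breaks = breaks))
v
```

Because `cut()` closes each interval on the right by default, a time such as 2003 falls in the 21st interval, the one that ends at 2100.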

Skip to 0 minutes and 53 seconds Let’s store this as a vector, and come back and see if we can’t do this another way, just for comparison. So let’s go look at the departure times. If you look at the head of those, they’re all four-digit times, right?

Skip to 1 minute and 8 seconds This one means 8:03 in the evening, 7:54 in the morning, 6:28 in the morning, 9:26 in the morning, 6:29 in the evening and so on. I could take those departure times and divide by 100. Let’s again take the head of this before we do the whole vector. What I’d really like to extract now is a 20, a 7, a 6, and so on, just the hour part. And the cut, by default, will round things up rather than down; that’s what actually happens with the cutting and breaking we did earlier. So I could take the ceiling of those numbers.
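The divide-and-round-up step can be sketched as follows. The first five values are the times read out above; the sixth (1940) is an assumed value chosen only so that it rounds up to 20, and the names `dep_time` and `hour_bin` are hypothetical.

```r
# Departure times in hhmm form; the last value is an assumption
dep_time <- c(2003, 754, 628, 926, 1829, 1940)

# Dividing by 100 moves the hour to the left of the decimal point
dep_time / 100
# 20.03  7.54  6.28  9.26 18.29 19.40

# ceiling() rounds each value up to the next whole number
hour_bin <- ceiling(dep_time / 100)
hour_bin
# 21  8  7 10 19 20
```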

Skip to 1 minute and 48 seconds And indeed, I round up to 21, up to 8, up to 7, up to 10, 19, 20 and so on. And if, instead of taking the head, I table all 7 million such numbers, you see I get exactly the same counts I had before. I’ve got 45,000 flights occurring in the last hour of the day, 117,000 flights occurring the hour before that, and so on. So let’s call this vector W. Let’s see if these two turn out to be equal. Yeah, all of them are equal. Are there any that are not equal? You write an exclamation point and an equals sign for not equal, and there are none that aren’t equal. If I sum those up, zero of them were not equal.
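Here is a minimal sketch of that comparison on a small hypothetical sample. Two things are assumptions made for illustration: the name `v` for the earlier table, and the use of `factor(..., levels = 1:24)`, which is only needed because a tiny sample does not contain every hour (the full data set does). The sketch also compares the counts through `as.vector()` so that only the numbers are compared, not the differing labels.

```r
# Hypothetical sample of departure times in hhmm form (not the course data)
dep_time <- c(2003, 754, 628, 926, 1829, 1940, 30, 2400, NA)

# Method 1: cut into 24 intervals of width 100, then count
v <- table(cut(dep_time, breaks = seq(0, 2400, by = 100)))

# Method 2: divide by 100, round up, then count.
# factor() with levels 1:24 keeps hours that have zero flights in this sample.
w <- table(factor(ceiling(dep_time / 100), levels = 1:24))

# Compare the two sets of counts position by position
same <- as.vector(v) == as.vector(w)
different <- as.vector(v) != as.vector(w)

sum(same)       # number of hours where the counts agree
sum(different)  # number of hours where the counts disagree
```

On this sample all 24 positions agree. One caveat worth knowing: a time of exactly 0 would be treated differently by the two methods, since `cut()` with these breaks leaves 0 outside the first interval while `ceiling(0 / 100)` gives 0.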

Skip to 2 minutes and 34 seconds 24 of them were equal, okay. So, let’s put some documentation in here. We had already analyzed in an earlier session how many flights occur altogether during each hour of the day.

Skip to 2 minutes and 51 seconds Here’s another way to do that, by dividing each four-digit time by 100 and then rounding the result up to the next integer.

Skip to 3 minutes and 6 seconds All of the results from the two methods agree; none disagree. And one nice thing is, if I go plot this now, instead of having the funny regions for the breaks, I now just have these integers, which are a lot better looking along the x axis, right? Again, 20 corresponds to flights between 7 PM and 8 PM, 14 corresponds to flights between 1 PM and 2 PM, 7 corresponds to flights between 6 AM and 7 AM, and so on. So here’s the analogous plot, and the x axis looks better than it did the first time that we visited this question. I think, in general, it’s really nice to try to compute things a couple of different ways, just as a reality check, a sanity check.
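The plot described above can be sketched like this, again on a small hypothetical sample rather than the course data. The axis labels are illustrative choices, not taken from the video.

```r
# Hypothetical sample of departure times in hhmm form (not the course data)
dep_time <- c(2003, 754, 628, 926, 1829, 1940, 30, 2400, NA)

# Count flights per hour bin; the table's names are the plain integers 1..24
w <- table(factor(ceiling(dep_time / 100), levels = 1:24))

# Plotting a one-dimensional table draws one vertical line per hour bin,
# labelled with the integer names instead of interval labels
plot(w, xlab = "Hour bin (20 means 7 PM to 8 PM)", ylab = "Number of flights")
```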

Skip to 3 minutes and 59 seconds It’s just double-checking and making sure you’re doing things the right way. There are often lots of ways to answer the same question.

Revisiting a Plot in R
