This content is taken from the Purdue University & The Center for Science of Information's online course, Introduction to R for Data Science.

## Purdue University

Skip to 0 minutes and 13 seconds One thing we haven’t done yet is break the day into early morning, late morning, early evening, and late evening. I’ve done something close to that by cutting the day into different parts, but I’d kinda like to return to that idea, maybe to build ourselves another piece of information that might be useful for us. So let’s suppose that we look at the departure time. We go look at the head of the departure time, and we see again that it’s these four-digit numbers. One way to do what I’d suggested is to divide each of the entries by 600 and then take the ceiling.
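As a minimal sketch of the divide-and-take-the-ceiling step, assuming departure times are stored as hhmm numbers: the vector name `dep_time` and its six values are illustrative stand-ins, not the course's data frame.

```r
# Illustrative departure times in hhmm form (for example, 1940 means 7:40 PM)
dep_time <- c(2003, 754, 628, 926, 1829, 1940)

# Divide each entry by 600 and round up to get a part-of-day code from 1 to 4
v <- ceiling(dep_time / 600)
v
# [1] 4 2 2 2 4 4
```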

Skip to 0 minutes and 57 seconds This is gonna break the day into four parts: one being the early part of the morning, two being the later part of the morning, three being the early part of the evening, and four being the late part of the evening. Early morning, which is one, corresponds to times up to 6 AM. Late morning, which is two, corresponds to times up to 12 noon. Similarly, early evening, which would be three, corresponds to times up to 6 PM, and late evening, which is four, corresponds to times up to 12 midnight.

Skip to 1 minute and 45 seconds Okay, maybe I should make this even more precise: one is for times from midnight to 6 AM, two is for times from 6 AM to 12 noon, three is for times from 12 noon to 6 PM, and four is for times from 6 PM to 12 midnight.
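To see where the boundaries fall, here is a small sketch that applies the same formula to times on either side of each cutoff. The vector `boundary_times` is a hypothetical set of values picked for illustration.

```r
# Times just before and exactly at each boundary, in hhmm form (illustrative)
boundary_times <- c(1, 600, 601, 1200, 1201, 1800, 1801, 2400)

# Same formula as in the video: divide by 600, then take the ceiling
codes <- ceiling(boundary_times / 600)
names(codes) <- boundary_times
codes
#    1  600  601 1200 1201 1800 1801 2400
#    1    1    2    2    3    3    4    4
```

Under this formula a time of exactly 600 lands in group 1, so each group includes its upper endpoint. A time recorded as 0, if one occurred, would get code 0 and fall outside the four groups.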

Skip to 2 minutes and 1 second So I could make, for instance, a vector v of all of these codes, okay? Then I could build another vector that shows the parts of the day. Let’s call it the part of the day that we want, and say by default at the beginning that I put NA maybe. Okay, let’s just go and fill the vector up with NAs. Because there might be some parts of the day that are unknown, so by default, we’ll say that they’re all unknown. Okay, let’s see how long this vector ought to be. This vector ought to be 7 million units long. Okay, and in the interest of not typing things, I’m gonna say that’s how many NAs I wanna put in.

Skip to 2 minutes and 39 seconds I’m gonna put 7 million NAs into a vector called parts of the day. So right now, the head of parts of the day, well, they’re all NAs. And I put a repeat here; I’m sorry, it should be a times. So I’ve built a vector called parts of the day, and it’s got 7 million NAs in it. Now let’s look at the parts of the day in which the corresponding entry in this vector v is equal to one. Any of the parts of the day in which the analogous element in v is a one, I want to call those early morning.

Skip to 3 minutes and 27 seconds Let’s make sure that v has the same length as partsofday before I do anything. And of course, it does.
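The steps so far could be sketched roughly as follows. The short vector `v` holds hypothetical part-of-day codes in place of the 7 million real ones; `rep()` with its `times` argument is the base R call that repeats a value a given number of times.

```r
v <- c(4, 2, 2, 1, 3, 4)   # hypothetical part-of-day codes

# Start with one NA per element of v; `times` sets how many copies to make
partsofday <- rep(NA, times = length(v))
head(partsofday)

# Both vectors should have the same length before anything is filled in
length(partsofday) == length(v)

# Label the positions where the corresponding entry of v is 1
partsofday[v == 1] <- "early morning"
partsofday
```

Only the fourth element is filled in at this point; the others stay NA until their codes are handled.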

Skip to 3 minutes and 34 seconds So similarly, I’m gonna go through the parts of the day in which the analogous position in v is a 2. I’m gonna make those elements for the parts of the day be late morning.

Skip to 3 minutes and 48 seconds And then the parts of the day in which the corresponding element in v was a 3, I’m gonna call those early evening, and similarly, those with a 4, late evening. And now let’s take a table and see what happened for our parts of the day. Wow, in fact there are none remaining, because there actually weren’t any times that couldn’t be decided in this kind of scheme. Looks like there were roughly 2.5 million flights in the early evening, only 196,000 flights in the early morning, about 1.4 million flights in the late evening, and about 2.6 million flights in the late morning.
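Here is a small sketch of all four assignments followed by the table, using a hypothetical `v` that includes one missing value. One thing worth knowing: `table()` leaves NA entries out of its counts by default, so passing `useNA = "ifany"` is one way to confirm whether any unlabeled entries remain.

```r
v <- c(4, 2, 2, 1, NA, 3, 4, 2)   # hypothetical codes; NA stands for a missing time

partsofday <- rep(NA, times = length(v))
partsofday[v == 1] <- "early morning"
partsofday[v == 2] <- "late morning"
partsofday[v == 3] <- "early evening"
partsofday[v == 4] <- "late evening"

table(partsofday)                    # NA entries are not counted here
table(partsofday, useNA = "ifany")   # adds a count for NA if any remain

counts <- table(partsofday, useNA = "ifany")
```

In this toy example the first table sums to 7 and the second to 8, because one entry was never labeled.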

Skip to 4 minutes and 31 seconds So this is a way that we could go build a vector, and we can double check the length of it.

Skip to 4 minutes and 39 seconds That’s just as long as the data frame itself, so what we could do is go add a new column to our data frame, okay? Why don’t we call it timeofday, for instance? We could call it anything we want. But I’m gonna store in there, in the timeofday, my temporary vector, partsofday, that I just built. So now if I go look at the dimension of myDF, that should no longer have 29 columns. It should have 30 columns now, and still the same number of rows. Okay, so we can double check that the length of the partsofday vector is the same as the number of rows in the data frame myDF.

Skip to 5 minutes and 24 seconds And then we can create a new column in the myDF data frame called timeofday. We can store this information we just found into this column. Now our data frame myDF has 30 columns instead of 29 columns.
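A minimal sketch of adding the column, using a tiny stand-in for myDF: the two columns and their values below are hypothetical, whereas the real data frame has 29 columns and about 7 million rows.

```r
# Hypothetical stand-in for myDF with two columns instead of 29
myDF <- data.frame(
  DepTime = c(2003, 754, 628, 926, 1829, 1940),
  Origin  = c("IAD", "IAD", "IND", "IND", "IND", "IND")
)
partsofday <- c("late evening", "late morning", "late morning",
                "late morning", "late evening", "late evening")

dim(myDF)                          # rows and columns before adding anything
length(partsofday) == nrow(myDF)   # the vector must match the number of rows

# Store the temporary vector as a new column called timeofday
myDF$timeofday <- partsofday
dim(myDF)                          # one more column, same number of rows
```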

Skip to 5 minutes and 47 seconds For instance, you can go look at head(myDF$timeofday) now, and see that the first flight was in the late evening, the next in the late morning, then late morning, late morning, late evening, late evening. Let’s go check and make sure that this looks about right. Sure enough, late evening, late morning, late morning, late morning, late evening, late evening. I think we did it right. Just check to make sure that the first six flights were done properly. You can’t check all 7 million of them. I mean, you could, but it would take you a huge amount of time. And it’s much easier to just check a few and get a sense of things.

Skip to 6 minutes and 25 seconds You could take a look at the tail also.

Skip to 6 minutes and 32 seconds Let’s see, for the last six flights, you had one in the late morning, late morning, late morning, late morning, early evening, and late morning again. So it looks like it was done properly.
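One way to make this kind of spot check easier is to print the departure time next to its label. The sketch below uses a hypothetical eight-row myDF, and it fills in the labels by indexing a lookup vector, which is a shortcut and not the step-by-step assignment used in the video.

```r
# Hypothetical stand-in for myDF with illustrative departure times
myDF <- data.frame(DepTime = c(2003, 754, 628, 926, 1829, 1940, 1002, 1234))

# Shortcut (not the video's method): use the code 1..4 as an index into labels
labels <- c("early morning", "late morning", "early evening", "late evening")
myDF$timeofday <- labels[ceiling(myDF$DepTime / 600)]

# Compare the first and last few rows by eye
head(myDF[, c("DepTime", "timeofday")])
tail(myDF[, c("DepTime", "timeofday")])
```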