One function we really benefit from learning about is the tapply function. R has a whole suite of apply functions: there's a function just called apply, and there's sapply, lapply, mapply; there are many apply functions. tapply is one of the first ones I usually discuss with my students. It's like a "table apply": it yields a table of results, and I think it's one of the easiest ones to understand. So how does the tapply function work in general? We need to give it three pieces of information. First, we give it the vector of data we want to apply a function to.

Second, we give it the way to break up the data into pieces, that is, into categories. And third, we give it the function we want to apply to the data. Here's an example: suppose we want to find the average departure delay at each airport. Let's look and see which column has the departure delays. Here they're called D-E-P delay; that's the departure delay in minutes. So I could take the departure delays, break them up according to the origin airport, and the function I might want to apply would be the average, the mean.
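The three pieces of information map onto the three arguments of `tapply(X, INDEX, FUN)`. Below is a minimal sketch, assuming a small made-up data frame; the column names `DepDelay` and `Origin` are assumptions modeled on the spoken description, and the numbers are invented for illustration only.

```r
# Hypothetical toy data: column names and values are made up for illustration
flights <- data.frame(
  Origin   = c("IND", "IND", "ORD", "ORD", "ORD", "LAX", "LAX"),
  DepDelay = c(4, 8, 20, -2, 12, 3, -1)
)

# 1. the data (X), 2. how to break it up (INDEX), 3. the function (FUN)
avg_dep <- tapply(flights$DepDelay, flights$Origin, mean)
print(avg_dep)  # IND 6, LAX 1, ORD 10 (groups appear in alphabetical order)
```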

So what's going to happen here with tapply is that we're going to take the mean of the departure delays. I always tell students: here's the function, here's the data you apply it to, and you leap over the thing in the middle. You take a mean of this data, but you don't take a mean of all of the departure delays; you take a mean of the departure delays broken up according to the origin airport. You first split the data according to whatever you've put in the second argument.
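The "split first, then apply" idea can be made explicit with base R's `split()` followed by `sapply()`. This is a sketch on hypothetical toy data (made-up column names and values) showing that the two-step version and the one-step `tapply` call produce the same group means.

```r
# Hypothetical toy data: column names and values are made up for illustration
flights <- data.frame(
  Origin   = c("IND", "IND", "ORD", "ORD", "ORD", "LAX", "LAX"),
  DepDelay = c(4, 8, 20, -2, 12, 3, -1)
)

# Step 1: split the delays into one piece per origin airport (a named list)
pieces <- split(flights$DepDelay, flights$Origin)

# Step 2: apply mean() to each piece; sapply simplifies to a named vector
by_split <- sapply(pieces, mean)

# The one-step tapply call computes the same group means
by_tapply <- tapply(flights$DepDelay, flights$Origin, mean)

print(by_split)
print(by_tapply)
```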

In this case, the data are split according to the origin airport, and then the function is applied to each piece. If I do this here, I'm going to get a lot of NAs, because sometimes we don't know the departure delay. So I'll mention a fourth thing: as the fourth element we can put extra information, that is, extra parameters. For instance, very commonly we use na.rm = TRUE, which removes the values that are NA. So let's do that. Now if I run the tapply, I get the result for every single airport.
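Arguments given after `FUN` are passed through to `FUN` itself, which is how `na.rm = TRUE` reaches `mean()`. A short sketch on hypothetical toy data (made-up column names and values) showing one missing delay turning a group's mean into `NA`, and the extra argument fixing it:

```r
# Hypothetical toy data: NA marks a departure delay that was not recorded
flights <- data.frame(
  Origin   = c("IND", "IND", "ORD", "ORD", "ORD"),
  DepDelay = c(4, NA, 20, -2, 12)
)

# Without extra arguments, one NA makes that whole group's mean NA
with_na <- tapply(flights$DepDelay, flights$Origin, mean)
print(with_na)     # IND is NA, ORD is 10

# The fourth argument is forwarded to mean(), which then drops the NAs
without_na <- tapply(flights$DepDelay, flights$Origin, mean, na.rm = TRUE)
print(without_na)  # IND is 4, ORD is 10
```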

For instance, again thinking about Indianapolis flights, the average departure delay from Indianapolis is about seven and a half minutes per flight. R found that by taking the mean of all of the departure delays for which the origin was IND. R also took the mean of all the departure delays for which the origin was O'Hare, or LAX, and so on. And you can do things with the result of the tapply. For instance, I can sort it, and now they're all in order. I see ACK has the longest average delay, and WYS has the shortest; in fact, flights there leave early on average. You can also do things like take a head or take a tail.
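Because the result of `tapply` here is a named numeric table, `sort()`, `head()`, and `tail()` work on it directly. A sketch on hypothetical toy data (airport codes and delays made up), sorting in decreasing order so that the head holds the largest averages and the tail the smallest:

```r
# Hypothetical toy data: column names and values are made up for illustration
flights <- data.frame(
  Origin   = c("IND", "IND", "ORD", "ORD", "LAX", "LAX", "BOS", "SEA"),
  DepDelay = c(4, 8, 20, 10, 3, -1, 25, -5)
)

avg_dep <- tapply(flights$DepDelay, flights$Origin, mean, na.rm = TRUE)

# Largest average delays first
sorted <- sort(avg_dep, decreasing = TRUE)

largest  <- head(sorted, 2)  # BOS 25, ORD 15
smallest <- tail(sorted, 2)  # LAX 1, SEA -5 (a negative mean: leaves early)
print(largest)
print(smallest)
```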

So I'll label these: here are the airports with the largest average departure delays, and here are the airports with the smallest average departure delays. Similarly, you can also look at the arrival delays. If you want to study arrival delays, it might make more sense to break things up by destination airport. Those are the longest arrival delays on average, and we could do the same for the smallest average arrival delays. There's a lot we can do with R. I've called this file tapplyfunction.R because there are so many things we can do with tapply, and I wanted to give you several different kinds of examples. But that's a first set of examples of the tapply function.
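Switching to arrival delays only changes the first two arguments: the data vector and the grouping. A sketch on hypothetical toy data, where the column names `ArrDelay` and `Dest` are assumptions and the values are made up:

```r
# Hypothetical toy data: column names and values are made up for illustration
flights <- data.frame(
  Dest     = c("IND", "ORD", "ORD", "LAX", "LAX", "LAX"),
  ArrDelay = c(12, 30, NA, -4, 6, 1)
)

# Group arrival delays by destination airport, dropping missing values
avg_arr <- tapply(flights$ArrDelay, flights$Dest, mean, na.rm = TRUE)

longest  <- head(sort(avg_arr, decreasing = TRUE), 2)  # ORD 30, IND 12
shortest <- head(sort(avg_arr), 2)                     # LAX 1, IND 12
print(longest)
print(shortest)
```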

Introduction to the Tapply Function


This video is from the free online course:

Introduction to R for Data Science

Purdue University