Now let’s go import the data that we’ve just downloaded from that ASA Data Expo 2009 data set into our RStudio environment. I first have to open RStudio. I’ve gone ahead and closed it so that we’ll have to walk through the step of opening it again. So I’m gonna click on my Finder there in the lower left-hand corner of my Mac, click on Applications, scroll down to the bottom or near the bottom until I find RStudio, and double-click on RStudio. I’ll come back to the same environment that I had before. Inside RStudio I’ve got a place where I can quickly type code directly.
I prefer to go ahead and make a new file so that once I’ve typed some code I can have it saved as well. So, inside RStudio I usually go to New File and then R Script, and I go ahead and save that. I’ll click on the little disk button and I’ll save it as Day1 on the MAC.R. I’ll save it in my home directory. So, I can type things in there, I can move around inside there. Every time I type a new line, the file name changes to red, and I see a little star to remind me that it hasn’t been saved. So I can just click on that disk if I wanna save my file.
And you notice the console is pushed down here, and anything I type in my file and run will appear in the console. We’ll do that in just a moment. Up here on the upper right, I’m gonna see the names of any variables that I’ve saved, and that window will continue to evolve as I work inside R. And in the lower right, this window is kind of versatile. Right now it’s showing the different files and folders that I’ve got in my home directory. It’ll instead show plots if I go plot some data. It’ll show packages that I have installed in R. It also has a help viewer built in. This lower window does many kinds of things.
What I’d like to do is import the data that I’ve downloaded into R. So I use the read.csv command to do that. Now if I wanna get the help to open on the read.csv command, I’d put a question mark out front. When I wanna go run a line of R code, I hit the Run button, or I just put my cursor on the line I wanna run and, on the Macintosh, hit Command + Return, and I’ll get exactly the same effect. A help window will open and I’ll see the help, not only for read.csv but for other related commands like read.table, read.delim, and so on. My file has a header.
It’s got a line that shows me what’s at the top of every column. So I’m gonna use read.csv, which expects me to tell it where the file is located and expects the file to have a header on it. If my file didn’t have a header on it, I might use read.table, which again expects to know where the file is located, but by default does not expect a header. So I wanna say one more thing about running commands before I go and run the command to load the data. If I do something like asking R for ten uniform random variables there, I can just click the Run button and then the cursor moves on down to the next line.
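To make the header difference between read.csv and read.table described above concrete, here is a minimal sketch. The tiny file and its columns `a` and `b` are made up purely for illustration and are not part of the airline data.

```r
# Write a tiny, made-up CSV file with a header line to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4"), tmp)

# read.csv assumes header = TRUE and sep = ",", so the first line
# becomes the column names.
with_header <- read.csv(tmp)

# read.table assumes header = FALSE, so the first line is treated as data
# and the columns get default names V1, V2.
no_header <- read.table(tmp, sep = ",")

print(with_header)
print(no_header)

# To open the help page for these commands in an interactive session:
# ?read.csv
```

Passing `header = TRUE` to read.table would make it treat the first line as column names as well.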
I can click on the line again and again hit the Run button, and the cursor moves down to the next line. But it may make more sense to just put the cursor on the line you’re on and again hit Command + Return. Command + Return will run the line your cursor’s on. And you don’t have to be at the end of the line. I see people sometimes put their cursor somewhere else and then scroll to the end of the line, but you can even be in the middle of the line. If you hit Command + Return, it’s gonna run all the code on that line.
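The line being run here asks R for ten uniform random values. A small sketch of what that looks like, with variable names chosen only for illustration:

```r
# Each call to runif(10) draws ten new values, uniformly distributed
# between 0 and 1 by default.
first_draw <- runif(10)
second_draw <- runif(10)

# Asking for 20 values instead of 10 only changes the first argument.
more_values <- runif(20)

print(first_draw)
print(second_draw)
print(length(more_values))
```

Because the values are random, running the same line twice gives different numbers each time.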
So there you see, every time I ran this command asking for ten uniformly distributed values, R gave me ten new values. If I want more values I can ask it for 20. You can ask it for as many of those as you like. And when I hit Return some more times, it doesn’t do anything. In fact, even if I come up here and I hit Return just by itself, it’s not gonna do anything except move my commands around. If I hit Return and I hit Delete, what’s in the console doesn’t change, because I haven’t hit Command + Return. Command + Return is what’s needed if you wanna actually execute the line. Okay, let’s go import the data into R.
So to do that, I’m gonna save the result of my import into a variable called myDF. There’s nothing sacred about myDF. I could have called it pizza if I wanted to. Any variable name will do. But I am gonna get a data frame, so I tend to call my data frames myDF. So I’m gonna take the results of a read.csv command and store them in the myDF variable. So my file is stored inside Users/something. I could type the whole location of my file, but I think it’s better to hit the Tab key, and R will try to complete what you want.
So instead I could just click on Users and, instead of typing any more, hit Tab again. And I wanna go into the Mark folder, cuz Mark is the name of my home directory, and I’ll hit Tab again. And that will take me into my Downloads, and I could hit Tab again and I’ll find this 2008.csv file that’s exactly the one I want. So I’m gonna go and run that line by hitting Command + Return, and R is gonna run for a minute. Just to verify for you that this is where the file is located, while R is running, let’s go ahead and go look on our Mac.
Inside my Finder, if I click on my Downloads, I’ll see 2008.csv, and if I wanna see the full path of that file, I can hold down the Command key and click on the name of the folder at the top. Again, just clicking on that name won’t work on its own. I’ve gotta hold down Command and click on the name at the top of the folder. And I’ll see that this is indeed inside the Users folder, which contains a folder called Mark, which contains another folder called Downloads, and then the file I want is inside there. This is the actual data, and this is the data in a compressed format.
It’s gonna take it a minute to import it into R because it’s 600-something megabytes of data. Moreover, once it gets imported into R, underneath the hood R stores the data in an intelligent way, so that R is prepared to make queries about the data and to manipulate the data in whatever way we ask R to do.
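The import step walked through above can be sketched as follows. The path is an assumption pieced together from the folders named in the walkthrough (Users, Mark, Downloads), so it will need adjusting on any other machine; the file.exists check is an addition so the sketch does not fail when the file is missing.

```r
# Assumed location of the downloaded file; change this to match your own
# home directory.
csv_path <- file.path("/Users", "Mark", "Downloads", "2008.csv")

if (file.exists(csv_path)) {
  # The file is large, so this line can take a while to finish.
  myDF <- read.csv(csv_path)
} else {
  message("File not found at ", csv_path,
          "; edit csv_path to match your machine.")
}
```

Typing the path inside the quotes in RStudio and pressing Tab offers completions for each folder, which avoids typos in long paths.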
Okay, you might not have noticed, but the stop sign went away here. R is done executing and I got a prompt back, which means R has already run my command. So if I wanna look at what actually happened, I don’t wanna see all of that data, because as I see over here on the right-hand side, there’s more than 7 million rows in that file, with 29 columns. So I’m just gonna look at the first six rows of my data. I’m gonna type head and then in parentheses put myDF, for the head of the data frame. And then I’ll press Command + Return to run.
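As a small illustration of head, here is a sketch that uses R’s built-in mtcars data frame as a stand-in, since the airline file may not be on every machine. With the imported data, the same calls would be made on myDF instead.

```r
# head() returns the first six rows of a data frame by default.
first_rows <- head(mtcars)
print(first_rows)

# dim() reports the number of rows and columns, the same counts RStudio
# shows in the upper right window.
print(dim(mtcars))

# A different number of rows can be requested with the n argument.
print(head(mtcars, n = 3))
```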
Looks like I got more than six lines, but if your screen were much wider you’d see that you just have the first six rows of the data set there. Every row has 2008 as the first entry, for the year. It’s got a 1 signifying January for the month and a 3 signifying the third day of January for the day of the month. That’s the fourth day of the week in the way that the days of the week are enumerated, as we’ll see more later. It’s got information about the departure time, the arrival time, the carrier, the flight number, the tail number, many different kinds of things, 29 variables altogether.
And again, when you’re looking at this in R, just imagine your screen were much wider, so that you’d always see a header and then six rows of data. Instead you see more header, and then the same six rows of data, because your screen is wrapping it so you can see it all at once. All right, when you’re all finished and ready to exit, don’t forget to save your file. Either you can click the little disk button there, or you can hit Command + S to save your file. You’ll notice the name of the file turns to black once I hit Command + S, to show me it’s saved.
In the console, you can type q and then a left and a right parenthesis. And R will ask you if you wanna save your image. I don’t save my image. Since I don’t, I’m gonna have to import the data again next time I open it. If you do save your image, the benefit is you won’t have to import your data again, but it will take R longer to open next time. So I’m going to choose to not save the image, although some of my students do like saving the image before R closes, and you’re welcome to do that if you prefer.
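To give a rough idea of what saving an image means, here is a small sketch that saves one object to a file and loads it back. The object `toy` and the temporary file are made up for illustration; saving the image when quitting does the same kind of thing for everything in the workspace.

```r
# A made-up object standing in for an imported data frame.
toy <- data.frame(x = 1:3)

# Save it to a temporary .RData file, remove it, then load it back.
image_file <- tempfile(fileext = ".RData")
save(toy, file = image_file)
rm(toy)
load(image_file)
restored_ok <- exists("toy")
print(restored_ok)

# To quit without being asked about the image, the save argument can be
# given directly. It is left commented out so this sketch does not end
# the session.
# q(save = "no")
```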
So I’ve now downloaded and installed R, downloaded and installed RStudio, downloaded some data from the ASA Data Expo 2009, imported it into R and verified that it got imported, and been able to save some R code as well. We’ve really learned a lot already about how R works, and we’re gonna learn a lot more during our time together. I hope you’re enjoying it so far.