Skip to 0 minutes and 13 seconds Next thing we’d like to do is go open that data that we’ve downloaded in RStudio. So I’m going to go back into RStudio. RStudio loads the last R file that was open. So my Day1.R file is still open here. I want to use the read.csv command to read in that data that we’ve just downloaded. I’m not just gonna read it in. I’m gonna read it in and store it in a variable. You can call your variable anything. I’m gonna call my variable myDF. DF is just shorthand for dataframe. I tell my students sort of jokingly, you can call your variable anything.
Skip to 0 minutes and 54 seconds I sometimes call my first variable pizza just for effect so the students can see that they really have a lot of freedom in what they call their variables. But let’s suppose we call our variable myDF. This arrow symbol means, take whatever’s on the right-hand side, and output the results of it, storing it into the variable that’s on the left-hand side. Some people prefer to write an equals mark. But I kind of like using the arrow notation because it shows whatever I do on the right-hand side is gonna get stored in the variable on the left-hand side.
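As a minimal sketch of the two assignment styles described here (the variable names `pizza` and `myValue` are only illustrations, not part of the lecture's script):

```r
# The arrow stores the result of the right-hand side
# in the variable on the left-hand side.
pizza <- 2 + 3

# At the top level of a script, the equals sign does the same thing.
myValue = 2 + 3

# Both variables now hold the same value.
identical(pizza, myValue)
```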
Skip to 1 minute and 25 seconds So then inside this read.csv, I’ve got parentheses to keep track of the parameters, and then I have a double quote. I’m gonna type the location of the file and then end with another double quote. So right there, I’ve got to tell RStudio where my files are located.
Skip to 1 minute and 42 seconds So, I am gonna put a capital C://, and I try to get in the habit of not typing more than I need to. So, one idea that’s used in a lot of operating systems and a lot of environments, and that in particular you can use in RStudio on Windows, is to hit the Tab key when you think there’s something that R can autocomplete. So if I hit Tab, I get a little menu of possible places that I might want to go, possible things I might want to fill in, and in this case, I wanna fill in Users.
Skip to 2 minutes and 13 seconds And instead of putting my name, I’m gonna actually hit tab again, it’s gonna bring up the users on my machine, and I’ll click on Mark Daniel Ward. Yours, of course, won’t say Mark Daniel Ward, but your file might be in a similar location. Then I hit tab again and Downloads, and then I hit tab again there and nothing new is coming up. So, I know that I’ve gone as far as I need to go. There’s maybe too many things in that folder to show. But if I type a 2, indeed, there’s some files in that folder that start with a 2. And I want the 2008.csv. So I’m gonna run that line.
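The line being run here can be sketched as follows. This is only an illustration under stated assumptions: instead of the real 2008.csv in a Downloads folder, it writes a tiny stand-in file with made-up columns to a temporary folder so that the example runs on any machine.

```r
# Stand-in for the downloaded file: a tiny CSV in a temporary folder.
# (Assumption: on the lecturer's machine the path would instead point
# into the Downloads folder, and the file would be far larger.)
csvPath <- file.path(tempdir(), "2008.csv")
writeLines(c("Year,Month,DayofMonth",
             "2008,1,3",
             "2008,1,3"), csvPath)

# Read the file and store the result in a variable called myDF.
myDF <- read.csv(csvPath)

# myDF is a data frame with one column per name in the header line.
class(myDF)
names(myDF)
```

With the real file, only the string passed to `read.csv` would change.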
Skip to 2 minutes and 52 seconds Again, I’m not gonna move my cursor up here and hit the Run button. What I’m gonna do is use Ctrl+R on Windows to run that line. That can take a couple of minutes to run. So while it’s running, let’s go find our Downloads folder in the Windows operating system, and see that our 2008.csv file is located where we claimed it was. You notice that R is running right now. We’re not at the prompt any more. Our cursor is down on the next line. We’ve got the little stop octagon here, which means that R is doing its thing, R is running, R is importing that data into its memory space.
Skip to 3 minutes and 29 seconds So, like I said, let’s go check that this 2008.csv file is really where we thought it was while R is loading the data into its memory. So I’m gonna click on my folder here, and into my Downloads. There’s my 2008 file. If I right-click on that file and go to Properties, this is where I’m gonna find the pertinent information that I wanted.
Skip to 3 minutes and 54 seconds The location of that file is C:\Users\Mark Daniel Ward\Downloads. Now, Windows users are often used to putting backslashes in between the folders when they’re navigating through the file structure. But it’s okay in R to put forward slashes.
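A small sketch of the slash conventions mentioned here. The path is the lecturer's, used purely as a string; the variable names are illustrative, and the file will not exist on other machines.

```r
# Forward slashes work in R path strings on Windows.
forwardPath <- "C:/Users/Mark Daniel Ward/Downloads/2008.csv"

# Backslashes must be doubled inside an R string, because a single
# backslash starts an escape sequence.
backPath <- "C:\\Users\\Mark Daniel Ward\\Downloads\\2008.csv"

# Replacing each backslash with a forward slash shows that both
# strings describe the same location.
gsub("\\", "/", backPath, fixed = TRUE) == forwardPath

# file.exists() reports whether a file is really at that location.
file.exists(forwardPath)
```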
Skip to 4 minutes and 12 seconds So you notice that I’ve put C://Users/Mark Daniel Ward/Downloads/2008.csv. Yours should be something comparable to that. So now there’s nothing to do but just sit and wait for R to import the data. Remember that there were about 650 megabytes of data in this 2008.csv file. All of that is going to become even bigger as R imports the data because underneath the hood, internally, R is going to store this data in a savvy way. R is gonna store this data in such a way that it can quickly make queries about the data, answer questions, and respond to commands that you might type inside R as you manipulate the data. Now I see the data’s been imported.
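One way to see how much memory an imported data frame occupies is `object.size`. The sketch below uses a small stand-in data frame with made-up values, not the real 2008 data, so the number it prints says nothing about the real file.

```r
# Stand-in data frame (made-up values); the real myDF would be far larger.
myDF <- data.frame(Year = rep(2008L, 1000),
                   Month = rep(1L, 1000))

# object.size() estimates how much memory an R object occupies.
dfSize <- object.size(myDF)
print(dfSize, units = "Kb")
```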
Skip to 5 minutes and 4 seconds How do I know? The little stop sign went away and I got my prompt back. I got another prompt here. So I can go look at this data. Before I type anything else, I’m gonna go ahead and hit the save button. Or as I mentioned, you can hit Ctrl+S as well; that’s another way you can save without having to move the mouse to the icon there. For instance, if I wanna see the first few lines of this dataframe, I can type head(myDF) and hit Ctrl+R to run. Notice that I’m getting to see the first six lines of my file. It might look like I’m seeing more than that, but it’s really just six lines.
Skip to 5 minutes and 41 seconds And my screen isn’t wide enough to view all the way across each line, all at once. So you notice the first six lines all start with the year entry of 2008, and a month entry of 1, and a day of month entry of 3. The first six lines are all about flights that happened on January 3rd, which was the 4th day of the week; we’ll say more about that later. And you’ve got departure time, arrival time, the carrier for the flight, many things like that. And just think to yourself, these six rows could potentially stretch all the way across your screen. There’s really just six rows of data, and then there’s the header there as well, okay.
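The behaviour of `head` described above can be sketched on a small stand-in data frame (made-up values with only three columns, not the real 29-column data):

```r
# Stand-in data frame with 8 rows (made-up values).
myDF <- data.frame(Year = rep(2008, 8),
                   Month = rep(1, 8),
                   DayofMonth = rep(3, 8))

# head() returns the first six rows by default, plus the header.
firstRows <- head(myDF)
nrow(firstRows)

# dim() gives the number of rows and columns of the whole data frame.
dim(myDF)
```

When a data frame has more columns than fit across the console, R wraps each row over several blocks of output, which is why six rows can look like many more lines.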
Skip to 6 minutes and 20 seconds And you’ll see all 29 columns of your data. Scroll down and you’ll see another prompt where you can interact more with R. If you want to know a little about this command that I used, this read.csv, what you can do is type ?read.csv and hit Ctrl+R. And you’ll notice that this window changes and Help comes up. And inside there, you’ll see a help file for read.table, and you say, well, read.table isn’t what I asked for. But often there are quite a few related commands in the same Help file. And in particular here, there’s read.table, read.csv, read.delim and some other similar commands all in the same help file.
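A short sketch of the ways to reach that help page. The variable name `helpPage` is just an illustration; storing the result is optional and is done here only so that the script does not open a help window by itself.

```r
# At the console, either of these opens the shared help page:
#   ?read.csv
#   help("read.csv")

# help() can also be stored in a variable; the page is displayed
# when the variable is printed.
helpPage <- help("read.csv")

# args() prints the parameters of read.csv and their default values.
args(read.csv)
```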
Skip to 7 minutes and 1 second read.csv expects you to input the name of a file and expects that the file will have a header, that is, that the very first line of the file will tell you the names that you want the columns to have. If, in contrast, I was going to import a file that didn’t have names at the tops of the columns, I might use read.table, which expects you to give a file name but expects the first line of the file not to contain the names of the columns. Okay, well now you’ve imported some data into R, and you’ve seen that you got it imported correctly.
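The difference between the two commands can be sketched with two tiny stand-in files (made-up values and hypothetical file names, written to a temporary folder so the example is self-contained):

```r
# Two small stand-in files: one with a header line, one without.
withHeader <- file.path(tempdir(), "with_header.csv")
noHeader <- file.path(tempdir(), "no_header.csv")
writeLines(c("Year,Month", "2008,1", "2008,2"), withHeader)
writeLines(c("2008,1", "2008,2"), noHeader)

# read.csv assumes the first line holds the column names.
dfA <- read.csv(withHeader)
names(dfA)

# read.table assumes there is no header line and, by default, splits
# fields on whitespace, so sep = "," is needed for comma-separated data.
dfB <- read.table(noHeader, sep = ",")
names(dfB)
```

Without a header, R makes up column names of its own (`V1`, `V2`, and so on).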
Skip to 7 minutes and 36 seconds If you want to, you can leave R open, so that you’re ready for the next thing that we do. But I’ve realized I haven’t shown you how to quit yet. I always hit Ctrl+S before I exit the RStudio system so that my file is saved. And then when I’m ready to quit, I usually don’t put the quit command inside the file itself. I’ll usually just come down here and put a q, which stands for quit, and no parameters are needed, so just the left and the right parenthesis: q(). Then I hit return. And it asks if I wanna save the workspace image. I usually don’t do that because then R has to reopen the workspace next time that R loads.
Skip to 8 minutes and 14 seconds Now, even though I’m not gonna save the workspace image, that just means next time I come into R, if I wanna work with the 2008 data again, I’m gonna have to import it into R again, and that’s fine. So you might choose to save the workspace image, that’s up to you. But I personally choose not to do that. So I hit no, don’t save the workspace image. And now I’ve exited R, and we’re done interacting with our session for now. We’re gonna open R back up, and learn some more things about R. I think that’s a full introduction to how you get R installed, get some data downloaded, import that data into R, and check to make sure the data’s actually there.
Skip to 8 minutes and 50 seconds Save your file of commands, and then exit the R system. We’ve learned quite a bit, considering we did that with more than 600 megabytes of data.
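The quitting step described above can be sketched as follows. The `save = "no"` argument is an addition not used in the video: it answers the workspace question in advance instead of waiting for the prompt. The `interactive()` guard is also an assumption of this sketch, there only so that the line does nothing when the code is run as a script.

```r
# Typing q() at the console quits R and asks whether to save the
# workspace image. Passing save = "no" skips that question.
# This is normally typed at the console, not stored in the script file.
if (interactive()) {
  q(save = "no")
}
```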
Import Data into R on Windows
Re-emphasize - Recommended System Requirements:
(a). 16 GB RAM or above. A few learners from past semesters reported that they encountered errors when handling multiple gigabytes of data. It turned out that 16 GB of RAM or more is ideal for avoiding such errors.
(b). 64-bit operating system, R, and RStudio. Most PC and Mac operating systems today should be 64-bit; just make sure that you have installed the 64-bit versions of R and RStudio.
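A quick way to check requirement (b) from inside R is sketched below. The variable name `is64bit` is only an illustration.

```r
# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build.
is64bit <- .Machine$sizeof.pointer == 8
is64bit

# The architecture that this copy of R was built for.
R.version$arch
```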