Skip to 0 minutes and 11 seconds I’m going to open a dataset. The dataset I’m going to open is called the “weather data”; it’s a little toy dataset that we’ll be seeing a lot of in this course. It’s got 14 instances, 14 days, and for each of these days, we have recorded the values of five attributes.
Skip to 0 minutes and 29 seconds Four are to do with the weather: Outlook, Temperature, Humidity, and Windy. The fifth, Play, is whether or not we’re going to play a particular, unspecified game. Actually, what we’re going to be doing is predicting the Play attribute from the other attributes. But let’s not worry about that at the moment. Let’s open the dataset and take a look at it in Weka. Here’s “My Documents”. Here are the Weka datasets; this is what I copied. I’m going to open weather.nominal.arff. All Weka data files are called ARFF files; we’ll talk about that later on. This is the “weather” data. Just ignore these colorful bars at the moment.
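To make the idea of an ARFF file a little more concrete, here is a minimal sketch in Python of reading one by hand. It is not Weka's own parser: it handles only nominal attributes, and the function name `parse_arff` is made up for illustration. The file text embedded below is assumed to match the weather.nominal.arff file that ships with Weka.

```python
# Minimal reader for nominal-only ARFF text (an illustrative sketch, not Weka's parser).
# ARFF_TEXT is assumed to match the weather.nominal.arff file distributed with Weka.
ARFF_TEXT = """\
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_arff(text):
    """Return (attribute_names, allowed_values, rows) for nominal-only ARFF text."""
    names, allowed, rows = [], {}, []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            # After @data, each line is one instance: comma-separated values.
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@attribute"):
            # e.g. "@attribute outlook {sunny, overcast, rainy}"
            _, name, spec = line.split(None, 2)
            names.append(name)
            allowed[name] = [v.strip() for v in spec.strip("{} ").split(",")]
        elif lower.startswith("@data"):
            in_data = True
    return names, allowed, rows


names, allowed, rows = parse_arff(ARFF_TEXT)
print(len(rows), "instances,", len(names), "attributes:", names)
print("outlook values:", allowed["outlook"])
```

The header lines declare each attribute and its permitted values, and everything after `@data` is one instance per line, which is why the Explorer can list the attributes and their values as soon as the file is opened.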
Skip to 1 minute and 17 seconds There are 14 instances; these correspond to the 14 days that we saw in the dataset on the slide.
Skip to 1 minute and 24 seconds For each day we have five attributes: outlook, temperature, humidity, windy, and play. If you select one of these attributes – outlook is selected at the moment – we can see the values. The values for the outlook attribute are “sunny”, “overcast”, and “rainy”.
Skip to 1 minute and 42 seconds These are the number of times they appear in the dataset: 5 sunny days, 4 overcast days, and 5 rainy days, for a total of 14 days, 14 instances. If we look at the “temperature” attribute, “hot”, “mild”, and “cool” are the possible values, and these are the number of times they appear in the dataset. Let’s go to the “play” attribute. There are two values for play, yes and no. Now let’s look at these two bars here. Blue corresponds to yes, and red corresponds to no. If you look at one of the other attributes, like “outlook”, you can see that when the outlook is sunny – this is like a histogram – there are three “no” instances and two “yes” instances.
Skip to 2 minutes and 34 seconds When the outlook is overcast, there are four “yes” instances and zero “no” instances. These are like a histogram of the attribute values in terms of the attribute we’re trying to predict. That’s why it’s useful to click around and visualize your data. We’ve opened the weather data, weather.nominal.arff.
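The counts and colored bars described above can be reproduced with a few lines of Python. This is only a sketch of the same tallies, not how Weka computes them; the helper names `value_counts` and `class_breakdown` are made up for illustration, and the rows are assumed to match the data section of the weather.nominal.arff file that ships with Weka.

```python
from collections import Counter

# Rows assumed to match the @data section of Weka's weather.nominal.arff.
ROWS = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy", "play"]

# One dict per day, keyed by attribute name.
data = [dict(zip(ATTRIBUTES, line.split(","))) for line in ROWS.splitlines()]


def value_counts(attribute):
    """How many instances take each value of the given attribute."""
    return Counter(row[attribute] for row in data)


def class_breakdown(attribute, target="play"):
    """Count instances per (attribute value, class value) pair: the colored bars."""
    return Counter((row[attribute], row[target]) for row in data)


print("outlook:", dict(value_counts("outlook")))
print("play:", dict(value_counts("play")))

breakdown = class_breakdown("outlook")
for value in value_counts("outlook"):
    # A Counter returns 0 for pairs that never occur, e.g. ("overcast", "no").
    print(value, "-> yes:", breakdown[(value, "yes")], "no:", breakdown[(value, "no")])
```

Each bar in the Explorer corresponds to one entry of `value_counts`, and the blue and red portions of a bar correspond to the "yes" and "no" entries of `class_breakdown` for that value.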
Skip to 2 minutes and 59 seconds We’ve looked at the attribute values and the attributes in Weka. There’s one more thing I want to do before we summarize here. If I go to the Edit panel, I see the data in the form that it was on the slide, with the 14 days down here and the 5 attributes across here. This is another view of the data. I can actually change this dataset. If I click here, I can change this “no” to “yes”. Or, if I click here, I can change on this day the outlook from “rainy” to “sunny”.
Skip to 3 minutes and 41 seconds (If only it were so easy in real life to change a day from rainy to sunny!) Then I can click OK, and we’ve got this edited dataset, which we could save if we’d like. We haven’t saved any of this; the dataset on the disk is still the same as it was. I’m not going to save it, and I don’t think you should save it, because we’re going to be using this dataset quite a bit in this course. Bye for now!
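The point that edits live only in memory until you save can be sketched in Python. This is an analogy, not Weka's implementation: the three rows are an excerpt of the weather data, and the file name `weather_excerpt.csv` is hypothetical.

```python
import os
import tempfile

# Three rows standing in for the dataset on disk (an illustrative excerpt).
original_text = (
    "sunny,hot,high,FALSE,no\n"
    "overcast,hot,high,FALSE,yes\n"
    "rainy,mild,high,TRUE,no\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "weather_excerpt.csv")  # hypothetical file name
    with open(path, "w") as f:
        f.write(original_text)

    # Load the file into memory, as opening a dataset does.
    with open(path) as f:
        table = [line.strip().split(",") for line in f]

    # Edit the in-memory copy: change a "no" to "yes", and a rainy day to sunny.
    table[0][4] = "yes"
    table[2][0] = "sunny"

    # The file itself is untouched because nothing has been written back.
    with open(path) as f:
        unchanged = f.read() == original_text

print("edited copy:", table)
print("file on disk unchanged:", unchanged)
```

Only an explicit write of `table` back to `path` would alter the file, which mirrors choosing whether or not to save after clicking OK in the editor.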
The weather data
Here’s how to load a dataset into the Weka Explorer interface, and look around it to see what’s there. It’s a tiny “toy” dataset, but all these operations work equally well on large, real-life ones. You can also view the dataset in the editor and, if you like, change it.
© University of Waikato, New Zealand. Creative Commons Attribution 4.0 International License.