
The weather data

Ian Witten shows how to load a dataset into the Weka Explorer interface and look around to see what it contains.
I’m going to open a dataset. The dataset I’m going to open is called the “weather data”; it’s a little toy dataset that we’ll be seeing a lot of in this course. It’s got 14 instances, 14 days, and for each of these days, we have recorded the values of five attributes.
Four are to do with the weather: Outlook, Temperature, Humidity, and Windy. The fifth, Play, is whether or not we’re going to play a particular, unspecified game. Actually, what we’re going to be doing is predicting the Play attribute from the other attributes. But let’s not worry about that at the moment. Let’s open the dataset and take a look at it in Weka. Here’s “My Documents”. Here are the Weka datasets; this is what I copied. I’m going to open weather.nominal.arff. All Weka data files are called ARFF files; we’ll talk about that later on. This is the “weather” data. Just ignore these colorful bars at the moment.
There are 14 instances; these correspond to the 14 days that we saw in the dataset on the slide.
For each day we have five attributes: outlook, temperature, humidity, windy, and play. If you select one of these attributes – outlook is selected at the moment – we can see the values. The values for the outlook attribute are “sunny”, “overcast”, and “rainy”.
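Outside the Explorer, the same first look at the data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ARFF text below is typed in from the standard weather.nominal.arff distributed with Weka, so check it against your own copy, and `parse_arff` is a hypothetical helper that handles only simple nominal files like this one, not the full ARFF format.

```python
# Minimal reader for simple, nominal-only ARFF text (not a full ARFF parser).
# Assumption: this text matches the weather.nominal.arff shipped with Weka.
WEATHER_ARFF = """\
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_arff(text):
    """Return (attributes, rows); attributes maps name -> list of allowed values."""
    attributes = {}
    rows = []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            rows.append(dict(zip(attributes, values)))
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            attributes[name] = [v.strip() for v in spec.strip("{}").split(",")]
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(WEATHER_ARFF)
print(len(rows), "instances,", len(attributes), "attributes")
for name, values in attributes.items():
    print(name, values)
```

Each row comes back as a dictionary keyed by attribute name, which roughly mirrors what you see when you select an attribute in the Explorer.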
These are the number of times they appear in the dataset: 5 sunny days, 4 overcast days, and 5 rainy days, for a total of 14 days, 14 instances. If we look at the “temperature” attribute, “hot”, “mild”, and “cool” are the possible values, and these are the number of times they appear in the dataset. Let’s go to the “play” attribute. There are two values for play, yes and no. Now let’s look at these two bars here. Blue corresponds to yes, and red corresponds to no. If you look at one of the other attributes, like “outlook”, you can see that when the outlook is sunny – this is like a histogram – there are three “no” instances and two “yes” instances.
When the outlook is overcast, there are four “yes” instances and zero “no” instances. These bars are like a histogram of the attribute values, broken down by the attribute we’re trying to predict. It’s useful to click around and visualize your data. We’ve opened the weather data, weather.nominal.arff.
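The numbers behind the colored bars are just counts of each outlook value, split by the value of play. A minimal sketch in Python, assuming the (outlook, play) pairs below match the standard weather data (they are typed in here for illustration, not read from Weka):

```python
from collections import Counter

# (outlook, play) for the 14 days; assumed to match weather.nominal.arff.
days = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]

# How often each outlook value appears (the height of each bar).
outlook_counts = Counter(outlook for outlook, _ in days)

# How often each (outlook, play) pair appears (the blue/red split of each bar).
by_class = Counter(days)

for outlook in ("sunny", "overcast", "rainy"):
    yes = by_class[(outlook, "yes")]
    no = by_class[(outlook, "no")]
    print(f"{outlook:9} total={outlook_counts[outlook]}  yes={yes}  no={no}")
```

A `Counter` returns zero for pairs that never occur, which is how the missing “no” bar for overcast days shows up here.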
We’ve looked at the attribute values and the attributes in Weka. There’s one more thing I want to do before we summarize here. If I go to the Edit panel, I see the data in the form that it was on the slide, with the 14 days down here and the 5 attributes across here. This is another view of the data. I can actually change this dataset. If I click here, I can change this “no” to “yes”. Or, if I click here, I can change on this day the outlook from “rainy” to “sunny”.
(If only it were so easy in real life to change a day from rainy to sunny!) Then I can click OK, and we’ve got this edited dataset, which we could save if we’d like. We haven’t saved any of this; the dataset on the disk is still the same as it was. I’m not going to save it, and I don’t think you should save it, because we’re going to be using this dataset quite a bit in this course. Bye for now!
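The difference between editing the loaded data and saving it can be sketched in Python. This is an illustration only, not how Weka works internally: the three rows and the file name `toy.csv` are hypothetical stand-ins for the real dataset.

```python
import os
import tempfile

# Three illustrative rows (outlook, play); a stand-in for the real file.
original_text = "rainy,no\nsunny,yes\novercast,yes\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")  # hypothetical file name
    with open(path, "w") as f:
        f.write(original_text)

    # Load the file into memory, as opening a dataset does.
    with open(path) as f:
        rows = [line.strip().split(",") for line in f]

    # Edit the in-memory copy: rainy becomes sunny, and no becomes yes.
    rows[0] = ["sunny", "yes"]

    # The file on disk stays as it was until the rows are written back.
    with open(path) as f:
        disk_unchanged = f.read() == original_text

print(rows[0], disk_unchanged)
```

Only an explicit write would carry the edit back to the file, which is why leaving the dataset unsaved keeps the original intact for later use.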

Here’s how to load a dataset into the Weka Explorer interface and look around to see what’s there. It’s a tiny “toy” dataset, but all these operations work equally well on large, real-life ones. You can also view the dataset in the editor and, if you like, change it.

This article is from the free online course Data Mining with Weka

Created by
FutureLearn - Learning For Life