There’s no magic in data mining! – in fact, perhaps Weka makes things too easy, says Ian Witten. You’ve learned lots, but we’ve missed out plenty.
One of the main points I’ve been trying to convey is that there’s no magic in data mining. There’s a huge array of alternative techniques, and they’re all fairly straightforward algorithms. We’ve seen the principles of many of them. Perhaps we don’t understand the details, but we’ve got the basic idea of the main methods of machine learning used in data mining. And there is no single, universal best method. Data mining is an experimental science. You need to find out what works best on your problem. Weka makes it easy for you. Using Weka you can try out different methods, you can try out different filters, different learning methods. You can play around with different datasets. It’s very easy to do experiments in Weka.
Perhaps you might say it’s too easy, because it’s important to understand what you’re doing, not just blindly click around and look at the results. That’s what I’ve tried to emphasize in this course – understanding and evaluating what you’re doing. There are many pitfalls you can fall into if you don’t really understand what’s going on behind the scenes. It’s not a matter of just blindly applying the tools in the workbench. We’ve stressed in the course the focus on evaluation, evaluating what you’re doing, and the significance of the results of the evaluation. Different algorithms differ in performance, as we’ve seen. In many problems, it’s not a big deal.
The differences between the algorithms are really not very important in many situations, and you should perhaps be spending more time on looking at the features and how the problem is described and the operational context that you’re working in, rather than stressing about getting the absolute best algorithm. It might not make all that much difference in practice. Use your time wisely. There’s a lot of stuff that we’ve missed out. I’m really sorry I haven’t been able to cover more of this stuff. There’s a whole technology of filtered classifiers, where you want to filter the training data, but not the test data.
That’s especially true when you’ve got a supervised filter, where the results of the filter depend on the class values of the training instances. You want to filter the training data, but not the test data, or maybe take a filter designed for the training data and apply the same filter to the test data without re-optimizing it for the test data, which would be cheating. You often want to do this during cross-validation. The trouble in Weka is that you can’t get hold of those cross-validation folds; it’s all done internally. Filtered classifiers are a simple way of dealing with this problem. We haven’t talked about costs of different decisions and different kinds of errors, but in real life different errors have different costs.
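The idea of fitting a filter on the training data and then applying it, unchanged, to the test data inside each cross-validation fold can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Weka's implementation: the toy dataset, the standardization filter (an unsupervised stand-in for whatever filter you actually use), the nearest-centroid learner, and every function name are invented for the example. The same discipline matters even more for supervised filters, which look at the class values.

```python
import random
import statistics

def fit_standardizer(rows):
    """Learn per-attribute mean and standard deviation from the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant attributes to avoid dividing by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stdevs

def apply_standardizer(params, rows):
    """Apply previously learned parameters; nothing is re-estimated here."""
    means, stdevs = params
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]

def fit_centroids(rows, labels):
    """Toy learner (an assumption for this sketch): one mean vector per class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return {label: [statistics.fmean(c) for c in zip(*members)]
            for label, members in by_class.items()}

def predict(centroids, row):
    """Return the label of the closest class centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def cross_validate(rows, labels, folds=5, seed=1):
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    correct = 0
    for k in range(folds):
        test_idx = indices[k::folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_rows = [rows[i] for i in train_idx]
        # The filter is fitted on the training fold only ...
        params = fit_standardizer(train_rows)
        model = fit_centroids(apply_standardizer(params, train_rows),
                              [labels[i] for i in train_idx])
        # ... and the same fitted filter is applied to the test fold.
        test_rows = apply_standardizer(params, [rows[i] for i in test_idx])
        for row, i in zip(test_rows, test_idx):
            correct += predict(model, row) == labels[i]
    return correct / len(rows)

# Toy data: two classes separated on the first attribute, noise on the second.
rng = random.Random(0)
rows = ([[rng.gauss(0, 1), rng.gauss(0, 50)] for _ in range(20)]
        + [[rng.gauss(10, 1), rng.gauss(0, 50)] for _ in range(20)])
labels = ["a"] * 20 + ["b"] * 20
accuracy = cross_validate(rows, labels)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

The point to notice is that `fit_standardizer` never sees a test instance; calling it on the full dataset before splitting would leak information from the test folds into training.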
We’ve talked about optimizing the error rate, or the classification accuracy, but really, in most situations, we should be talking about costs, not raw accuracy figures, and these are different things. There’s a whole panel in the Weka Explorer for attribute selection, which helps you select a subset of attributes to use when learning, and in many situations it’s really valuable, before you do any learning, to select an appropriate small subset of attributes to use. There are a lot of clustering techniques in Weka.
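To make the point about costs versus raw accuracy concrete, here is a minimal sketch in plain Python. The class names, the two sets of predictions, and above all the cost values are hypothetical, chosen only so that the model with the higher accuracy turns out to be the more expensive one.

```python
def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    counts = {}
    for a, p in zip(actual, predicted):
        counts[(a, p)] = counts.get((a, p), 0) + 1
    return counts

def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def total_cost(actual, predicted, cost):
    """Sum the cost of every decision, looked up by (actual, predicted)."""
    return sum(cost[pair] * n
               for pair, n in confusion_counts(actual, predicted).items())

# Hypothetical cost matrix: missing a sick patient is ten times worse
# than raising a false alarm on a healthy one.
COST = {
    ("sick", "sick"): 0,
    ("healthy", "healthy"): 0,
    ("sick", "healthy"): 10,
    ("healthy", "sick"): 1,
}

actual = ["sick"] * 2 + ["healthy"] * 8
# Model A always predicts "healthy": it misses both sick patients.
model_a = ["healthy"] * 10
# Model B catches both sick patients but raises three false alarms.
model_b = ["sick"] * 5 + ["healthy"] * 5

for name, predicted in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(actual, predicted), total_cost(actual, predicted, COST))
```

Under these assumed costs, model A is more accurate but far more costly than model B, which is why the two measures should not be treated as interchangeable.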
Clustering is where you want to learn something even when there is no class value: you want to cluster the instances according to their attribute values. Association rules are another kind of learning technique where we’re looking for associations between attributes. There’s no particular class, but we’re looking for any strong associations between any of the attributes. Again, that’s another panel in the Explorer. Text classification. There are some fantastic text filters in Weka which allow you to handle textual data as words, or as characters, or n-grams (sequences of three, four, or five consecutive characters). You can do text mining using Weka. Finally, we’ve focused exclusively on the Weka Explorer, but the Weka Experimenter is also worth getting to know.
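As a small illustration of what treating text as words or as character n-grams means, the following plain-Python sketch turns a string into either representation. It is not how Weka's text filters are implemented; the function names and the default range of three to five characters are assumptions made for the example.

```python
from collections import Counter

def word_tokens(text):
    """Split text into lower-cased words on whitespace."""
    return text.lower().split()

def char_ngrams(text, n_min=3, n_max=5):
    """Return every run of n consecutive characters, for n_min <= n <= n_max."""
    text = text.lower()
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams

document = "Data mining with Weka"
word_counts = Counter(word_tokens(document))
ngram_counts = Counter(char_ngrams(document))
print(word_counts)
print(ngram_counts.most_common(5))
```

Either set of counts can then serve as the attribute values of an instance, which is what lets an ordinary learning method work on text.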
We’ve done a fair amount of rather boring, tedious calculation of means and standard deviations manually, by changing the random-number seed and running things again. That’s very tedious to do by hand. The Experimenter makes it very easy to do this automatically. So, there’s a lot more to learn. Let me just finish off here with a final thought. We’ve been talking about data, data mining. Data is recorded facts, a change of state in the world, perhaps.
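The repeated-runs procedure described above, changing the random-number seed, re-running, and then computing the mean and standard deviation, is easy to automate. The sketch below does it in plain Python on an invented one-attribute dataset with a nearest-class-mean learner; the data, the 66% training split, the ten seeds, and the function names are all assumptions for illustration and say nothing about how the Experimenter works internally.

```python
import random
import statistics

def make_data():
    """Toy dataset: one numeric attribute, two overlapping classes."""
    rng = random.Random(0)
    return ([(rng.gauss(0.0, 1.0), "a") for _ in range(50)]
            + [(rng.gauss(2.0, 1.0), "b") for _ in range(50)])

def holdout_accuracy(data, seed, train_fraction=0.66):
    """Shuffle with the given seed, train on one part, test on the rest."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    # Toy learner: remember the mean attribute value of each class.
    mean_a = statistics.fmean([x for x, label in train if label == "a"])
    mean_b = statistics.fmean([x for x, label in train if label == "b"])
    correct = 0
    for x, label in test:
        predicted = "a" if abs(x - mean_a) <= abs(x - mean_b) else "b"
        correct += predicted == label
    return correct / len(test)

data = make_data()
# One run per seed, exactly what would otherwise be done by hand.
accuracies = [holdout_accuracy(data, seed) for seed in range(1, 11)]
mean_accuracy = statistics.mean(accuracies)
stdev_accuracy = statistics.stdev(accuracies)
print(f"mean = {mean_accuracy:.3f}, standard deviation = {stdev_accuracy:.3f}")
```

The standard deviation is what tells you whether a difference between two methods is larger than the variation you get from the seed alone.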
That’s the input to our data mining process, and the output is information, the patterns – the expectations – that underlie that data: patterns that can be used for prediction in useful applications in the real world. We’re going from data to information. Moving up in the world of people, not computers, “knowledge” is the accumulation of your entire set of expectations, all the information that you have and how it works together – a large store of expectations and the different situations where they apply. Finally, I’d like to define “wisdom” as the value attached to knowledge. I’d like to encourage you to be wise when using data mining technology. You’ve learned a lot in this course.
You’ve got a lot of power now that you can use to analyze your own datasets. Use this technology wisely for the good of the world. That’s my final thought for you.

There’s no magic in data mining! In fact, perhaps Weka makes things too easy. It is important to understand, and evaluate, what you’re doing, not just click around looking for good results. You’ve learned lots, but we’ve missed out plenty. Finally, I’d like to encourage you to be wise when using data mining technology. You’ve gained the power to analyze your own datasets. Use this technology wisely, for the good of the world.

This article is from the free online

Data Mining with Weka

Created by
FutureLearn - Learning For Life