What’s data mining? What’s Weka?

Ian Witten explains what data mining is and what Weka is. After completing this course you'll be able to mine your own data!
Hi! Welcome to the course Data Mining with Weka. I’m Ian Witten from the University of Waikato in New Zealand and I’m presenting the videos for this course which is being prepared by the Department of Computer Science at the University of Waikato. Data mining is a mature technology that a lot of people are beginning to take very seriously, and a lot of other people find it mysterious. The real aim of this course is to take the mystery out of data mining. This is a practical course on how to use the Weka workbench, which you will download as part of the course, for data mining.
We explain the basic principles of several popular data mining algorithms and how to use them in practical applications. In the world today, we’re overwhelmed with data. Every time you swipe your credit card, every item you check out at the supermarket, every time you send a text, make a phone call, or send an email, or type a key on a computer, even every time you walk past a security camera – it all generates a little bit of data in a database. Data mining is about going from the raw data to information, information that can be used to make predictions, predictions that are useful in the real world. Let me give you an example. You’re at the supermarket checkout.
The till records every item you bought. At the end, you hand over your loyalty card, and they give you a couple of percent off, and you give them your name and address, and, indirectly, access to all sorts of demographic information about you and people like you. Everybody likes a good bargain. It’s been a good day today, because, thanks to those coupons they sent you in the mail last week, you’ve been able to stock up on some things you wouldn’t normally have bought, but you bought today because they’re such a good deal. Next week they’ll send you some more coupons, and you’ll go shopping again and buy some more stuff.
They do little experiments on you, you know, they try to figure out how much more you would buy if the price was just that little bit less. These coupons are a mechanism for personalized pricing. They’ve got access to all sorts of data from you, and people like you, in order to do these experiments and figure these things out.
Everybody wins: you get your bargains; they sell more stuff. It sounds like a good deal to me. Here’s another application. Suppose you and your partner want a child, but you can’t have one. It’s fun trying, but it can get a little bit frustrating, and, ultimately, very frustrating, perhaps even tragic. In artificial insemination, they take some eggs from the woman’s ovaries, and they fertilize them with partner or donor sperm, and then they select from amongst the embryos that are produced some to implant back into the womb. You want to select the ones with the best chance of success of producing a live birth, but you don’t want too many live births.
The embryologist has access to all sorts of data on these embryos. I think there are 50–100 pieces of information that they record about individual embryos, and they have historical data on which ones produced a live birth – a success. So here’s an ideal situation for data mining. We have lots of historical data; we have data on the present situation; and we want to select those embryos that have the best chance of success. Now, that’s a good application for data mining, bringing a live child to a couple who wants one. I talk about “data mining” and “machine learning”. Data mining is the application, and machine learning is the algorithms we use.
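As a rough illustration of the idea of ranking new cases by what happened to similar cases in the past, here is a minimal Python sketch. It is not the embryologists' actual method, nor a description of any particular Weka algorithm: the two numeric features, the toy records, the helper names `success_score` and `rank_candidates`, and the choice of nearest-neighbour scoring are all assumptions made for the sake of a short example.

```python
from math import dist

# Hypothetical historical records: (feature vector, outcome).
# Real records would have many more attributes; two made-up numeric
# features are used here purely for illustration.
HISTORY = [
    ((0.9, 0.8), True),
    ((0.8, 0.9), True),
    ((0.7, 0.7), True),
    ((0.3, 0.2), False),
    ((0.2, 0.4), False),
    ((0.4, 0.1), False),
]


def success_score(candidate, history, k=3):
    """Fraction of the k nearest historical records that were successes."""
    nearest = sorted(history, key=lambda rec: dist(rec[0], candidate))[:k]
    return sum(1 for _, outcome in nearest if outcome) / k


def rank_candidates(candidates, history, k=3):
    """Order candidates from highest to lowest estimated success score."""
    return sorted(
        candidates,
        key=lambda c: success_score(c, history, k),
        reverse=True,
    )


# Two hypothetical new cases to compare against the historical data.
candidates = [(0.25, 0.3), (0.85, 0.75)]
ranked = rank_candidates(candidates, HISTORY)
```

Here the historical data plays the role of training data, and the scoring function is the machine learning part; choosing which candidates to act on is the data mining application.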
We’re talking about using machine learning algorithms for the purposes of data mining. The next question – this is Data Mining with Weka – “What’s Weka?” This is a weka here, this little bird. It’s a flightless bird, like its better known cousin the kiwi, found only in the islands of New Zealand. This is what it sounds like, coming to you from New Zealand. However, in our context, Weka is a data mining workbench. It’s an acronym for the Waikato Environment for Knowledge Analysis. We just call it Weka. It contains a large number of algorithms for classification, and a lot of algorithms for data preprocessing, feature selection, clustering, finding association rules, things like that.
It’s a very comprehensive workbench, and it’s free open source software that you will download as part of this course in the next lesson. It runs on any computer. It’s written in Java, and runs on Linux, Windows, Mac. You’ll be able to download it and run it on your workstation and use it during the course. You’re going to learn how to load data into Weka and look at it. You’re going to learn about preprocessing, cleaning up data using filters, exploring it using visualizations, applying classification algorithms, interpreting the output, understanding evaluation methods – evaluation is very important in this area – understanding various representations for models and how popular machine learning algorithms work, and being aware of common pitfalls with data mining.
The ultimate goal really is to empower you to use Weka on your own data, and, most importantly, to understand what it is you are doing. That’s it. I just thought I’d show you where I am. I’m in New Zealand, that’s where Weka is from. That’s where I’m sitting right now. This is the world as we see it in New Zealand. We’re at the top, you’re probably down at the bottom somewhere. We’re at the top, in the center, and that arrow to the North Island of New Zealand is where the University of Waikato is. I’ll see you again in the next lesson. I’m looking forward to that. Goodbye for now.

Everybody talks about data mining and “big data” nowadays. Example applications range from analyzing the contents of your supermarket basket in order to entice you to spend more in your next shopping expedition, to helping a couple conceive a baby by enhancing the chance of successful artificial insemination. Weka is a powerful yet easy-to-use tool for machine learning and data mining that you will soon download and experiment with. During this course you will learn how to load data, filter it to clean it up, explore it using visualizations, apply classification algorithms, interpret the output, and evaluate the result. You will also learn that New Zealand is at the top of the world, and you may be at the bottom!
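The load, clean, classify and evaluate steps can be pictured with a small Python sketch. This is not Weka code and does not use Weka's API; it only mimics the shape of the workflow. The toy dataset, the use of `?` as a missing-value marker, the function names, and the fixed train/test split are all assumptions made for illustration. The "classifier" is a majority-class baseline, which simply predicts the most common class seen in training.

```python
import csv
import io
from collections import Counter

# Made-up toy data; "?" marks a missing value (an assumption of this sketch).
RAW = """outlook,windy,play
sunny,false,no
sunny,true,no
overcast,false,yes
rainy,false,yes
rainy,?,yes
rainy,true,no
overcast,true,yes
sunny,true,no
"""


def load(text):
    """Load: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_missing(rows):
    """Filter: discard any row that contains a missing value."""
    return [r for r in rows if "?" not in r.values()]


def train_majority(rows, target):
    """Classify: 'learn' the most frequent class in the training rows."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]


def accuracy(prediction, rows, target):
    """Evaluate: fraction of rows whose class matches the prediction."""
    return sum(r[target] == prediction for r in rows) / len(rows)


rows = drop_missing(load(RAW))
train, test = rows[:5], rows[5:]      # simple fixed holdout split
model = train_majority(train, "play")
acc = accuracy(model, test, "play")
print(f"predicts '{model}' for everything; holdout accuracy = {acc:.2f}")
```

Measuring accuracy on rows that were held back from training, rather than on the training rows themselves, is the core of the evaluation step. A baseline this crude is useful mainly as a yardstick against which real classification algorithms can be compared.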

This article is from the free online course Data Mining with Weka.