The CPD Certification Service was established in 1996 and is the leading independent CPD accreditation institution operating across industry sectors to complement the CPD policies of professional and academic bodies.

Learn how to mine data using Weka, with the University of Waikato

On this five-week course, you’ll discover how to mine data using the Weka workbench, a powerful tool for machine learning and data mining.

Guided by experts at the University of Waikato, the original developers of Weka, you’ll learn the basics of data visualisation, classification algorithms, and data interpretation and evaluation.

Explore the basics of data interpretation and evaluation

Beginning with an introduction to data mining concepts, you’ll discover the various applications of data mining in personal and professional contexts.

You’ll examine how to evaluate a classifier’s performance, using training sets, test sets, and cross-validation to estimate how accurately it will classify new, unseen data.

With these skills, you’ll be able to improve the quality of your data and develop meaningful answers to the questions you are asking of it.

Organise your data using classifiers

Exploring both simple and more complex classifiers, you’ll learn how different classification methods can be used to interpret datasets.

You’ll investigate the applications of concepts including decision trees, linear regression, and support vector machines, learning how to apply the correct classification method to your problem.

Examine the full data mining process

In the final week of this course, you’ll put your learning into context by exploring the full data mining process.

You’ll address common pitfalls and challenges to accessing data, as well as assessing the ethics of data mining, giving you a broader understanding of how and when data mining should be used in different contexts.

You’ll finish this course understanding what Weka is and how to gather and interpret big data. You’ll be aware of the full data mining process and be able to explain and apply Weka within your own data mining work.

Hello! My name’s Ian Witten, I’m from the University of Waikato here in New Zealand, and I want to tell you about our new, free, online course – Data Mining with Weka. We’re overwhelmed by data in the world today. Every time we check out an item at the supermarket, every time we swipe our credit card, every time we send an email, every time we type a keystroke on our computer, every time we make a phone call, send a text, walk past a security camera – we all generate a little bit of data.

Data mining is about taking this raw data, and transforming it into something more useful: information, perhaps; or predictions, predictions about what might happen next, predictions that can be used in the real world. The real aim of this course is to take the mystery out of data mining, to give you some practical experience actually using the Weka toolkit to do some mining on the data sets that we provide, to set you up so that, later on, you can use Weka to work on your own data sets and do your own data mining. It doesn’t involve any programming or anything like that. You’re going to be using the tools that we provide, the Weka tools.

It might help to know a little bit of elementary statistics, like means, variances, standard deviations, and so on. You might see a couple of mathematical formulae, but I’ll explain those, so don’t worry about that. You don’t really need any specific mathematical background. So that’s it – Data Mining with Weka, coming soon to a computer near you. I’m looking forward to it, and I hope to see you there. Bye for now!

Syllabus

Week 1
A little bit of everything
- What's data mining? What's Weka? What's the course about?
  Everybody talks about data mining and "big data" nowadays. This course introduces you to practical data mining. Weka is a powerful yet easy-to-use tool for machine learning and data mining that can also tackle large problems.
- What's it like to do data mining?
  Each week we’ll focus on a “Big Question” relating to data mining. This is the question for this week.
- Exploring the Explorer
  In this activity you will install Weka on your computer, start the Weka Explorer, and load, view, and edit datasets. (Note: You may need admin access to install Weka.)
- Exploring datasets
  Classification (also called "supervised learning") is a common kind of data mining problem. Datasets contain "instances," which are described in terms of a fixed set of features, or "attributes".
- Building a classifier
  Now you will learn how to use Weka's popular J48 classifier, which builds decision trees. J48 is a reimplementation of a classic classifier algorithm called C4.5.
- Using a filter
  Weka contains "filters" that help with cleaning and preparing your data. Some filters operate on attributes; others operate on instances.
- Visualizing your data
  With Weka's visualization tool you can clean the data and remove anomalous instances (outliers). You can also visualize the errors that classifiers make.
Week 2
Evaluation
- How do I evaluate a classifier’s performance?
  This week's Big Question!
- Be a classifier!
  Weka incorporates many different classification algorithms. One, called the “UserClassifier,” enables you to build your own decision tree for classification. How well can you do? It’s a challenge!
- Training and testing
  Evaluating what has been learned is an essential part of data mining. You should never evaluate on the training set! – the results will be overly optimistic. If you have a single dataset, hold some data back for testing.
- Repeated training and testing
  Ideally, training and test sets are sampled independently from a large population. Different samples give slightly different performance estimates. More reliable results are obtained by averaging over several experimental runs.
- Baseline accuracy
  How do you know how well your machine learning method is doing? You should always compare it with the “baseline accuracy” obtained by simple methods. ZeroR is an extremely simple method that serves as a useful baseline.
- Cross-validation
  Cross-validation, a standard evaluation technique, is a systematic way of running repeated percentage splits. In “stratified” cross-validation, training and test sets have the same class distribution as the full dataset.
- Cross-validation results
  Cross-validation is better than randomly repeating percentage split evaluations. It gives a more reliable performance estimate – that is, one with lower variance. Ten-fold cross-validation is a standard evaluation method.
- How are you getting on?
  We're well into the course now. Let's just take stock.
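
The baseline and cross-validation steps above can be sketched outside Weka in a few lines of plain Python. This is only an illustration under stated assumptions, not Weka's implementation: the function names (`zero_r`, `stratified_folds`, `cross_validate_zero_r`) and the toy labels are hypothetical, and the fold assignment is one simple way to keep class proportions similar in every fold.

```python
from collections import Counter, defaultdict
import random


def zero_r(train_labels):
    """ZeroR-style baseline: ignore all attributes, predict the majority class."""
    return Counter(train_labels).most_common(1)[0][0]


def stratified_folds(labels, k, seed=1):
    """Assign instance indices to k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def cross_validate_zero_r(labels, k=10, seed=1):
    """Average baseline accuracy over k folds; each fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    accuracies = []
    for test_fold in folds:
        held_out = set(test_fold)
        # Train only on instances outside the test fold.
        train = [lab for i, lab in enumerate(labels) if i not in held_out]
        prediction = zero_r(train)
        correct = sum(1 for i in test_fold if labels[i] == prediction)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy labels: 70 "yes" and 30 "no" instances.
labels = ["yes"] * 70 + ["no"] * 30
baseline = cross_validate_zero_r(labels, k=10)
print(f"Baseline accuracy: {baseline:.2f}")
```

Any real classifier evaluated on the same folds should beat this figure before its accuracy is taken seriously.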
Week 3
Simple classifiers
- How do simple classification methods work?
  This week's Big Question!
- Simplicity first
  Always try simple methods before complex ones! (A good maxim for life in general, not just data mining.) Sometimes, simple algorithms perform really well. We learn about OneR, a simple method that is sometimes quite effective.
- Overfitting
  “Overfitting” is a general problem that plagues all machine learning methods. It’s when a classifier fits the training data too tightly. The classifier works well on the training data but not on independent test data.
- Using probabilities
  Why not use all attributes, equally weighted, instead of a single one as OneR does? Bayes' Theorem provides a sound probabilistic foundation for this. "Naive" Bayes assumes that attributes are equally important, and independent.
- Decision trees
  Decision trees are another simple classification method, based on a top-down, recursive, divide-and-conquer strategy. J48 (aka C4.5) finds a good attribute to split on at each stage using a measure called "information gain."
- Pruning decision trees
  Decision trees can easily overfit the training data, and pruning techniques are needed to guard against overfitting. There are various different methods. Unfortunately, this is where elegant algorithms get messy!
- Nearest neighbor
  How about storing the training instances and giving new instances the same classification as their nearest neighbor? A similarity function is needed to select the closest instance. Using several neighbors can improve performance.
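
The "information gain" measure mentioned under decision trees can be worked through by hand. The sketch below is a plain Python illustration of the measure itself, not of J48's code: the helper names are hypothetical, and the 14 rows are toy data modelled on the familiar weather example, reduced to two attributes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, class_name):
    """Entropy of the class minus its weighted entropy after the split."""
    labels = [row[class_name] for row in rows]
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row[class_name])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Toy data (an assumption for illustration): (outlook, windy, play).
raw = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("rainy", "false", "yes"),
    ("rainy", "false", "yes"), ("rainy", "true", "no"),
    ("overcast", "true", "yes"), ("sunny", "false", "no"),
    ("sunny", "false", "yes"), ("rainy", "false", "yes"),
    ("sunny", "true", "yes"), ("overcast", "true", "yes"),
    ("overcast", "false", "yes"), ("rainy", "true", "no"),
]
rows = [{"outlook": o, "windy": w, "play": p} for o, w, p in raw]

gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)
for attribute, gain in gains.items():
    print(f"{attribute}: {gain:.3f} bits")
print("Split on:", best)
```

On this toy data the gain for `outlook` is about 0.247 bits and for `windy` about 0.048 bits, so a top-down tree builder using this measure would split on `outlook` first and then repeat the calculation within each branch.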
Week 4
More classifiers
- What about real-life classification methods?
  This week's Big Question!
- Classification boundaries
  Different classifiers are biased towards different kinds of decision, which you can explore by visualizing the classification boundaries. We look at classification boundaries for OneR, IBk, NaiveBayes, and J48.
- Linear regression
  "Regression" problems are where the class is numeric, and "linear regression" is a standard mathematical technique for predicting numeric classes. In addition, there are non-linear methods that build trees of linear models.
- Classification by regression
  Linear regression can be used for classification as well. For two-valued nominal classes, just convert them to 0 and 1. For more class labels, either "multi-response linear regression" or "pairwise linear regression" can be used.
- Logistic regression
  Sometimes it’s best to predict class probabilities instead of predicting the classes themselves. Linear regression can be made to work with probabilities, resulting in logistic regression, a popular classification technique.
- Support vector machines
  Support vector machines separate the classes using the "maximum margin hyperplane." This is defined by a few instances, called "support vectors," from each class. The boundary depends on a few points, which reduces overfitting.
- Ensemble learning
  Many of us dislike committees, but nevertheless they often make good, unbiased, decisions. Several machine learning methods use "committees" of different classifier algorithms: Bagging, Random forests, Boosting, and Stacking.
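
The "classification by regression" idea above, converting a two-valued class to 0 and 1 and fitting a line, can be sketched as follows. This is a minimal single-attribute illustration, not Weka's implementation: the data, the function names, and the 0.5 decision threshold are all assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify(x, slope, intercept, threshold=0.5):
    """Predict "yes" when the regression output reaches the threshold."""
    return "yes" if slope * x + intercept >= threshold else "no"


# Hypothetical toy data: one numeric attribute and a two-valued class.
xs = [1, 2, 3, 4, 5, 6]
classes = ["no", "no", "no", "yes", "yes", "yes"]

# Convert the nominal class to 0 and 1, then fit a regression line.
ys = [1 if c == "yes" else 0 for c in classes]
slope, intercept = fit_line(xs, ys)

print(classify(2, slope, intercept))
print(classify(5, slope, intercept))
```

The fitted line is not a probability: its output can fall below 0 or above 1 for extreme attribute values, which is one motivation for logistic regression.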
Week 5
Putting it all together
- What else is there to know?
  This week's Big Question!
- The data mining process
  Producing classifiers is just a small part of the overall data mining process – perhaps the easiest part! Other parts involve formulating the question, gathering data, cleaning it, defining new features, and deploying the result.
- Pitfalls and pratfalls
  Be skeptical, and particularly wary of overfitting. Missing values can signify various things; classifiers treat them differently. There’s no single "best learner"; all methods have biases. Data mining is an experimental science!
- Data mining and ethics
  It’s far harder to anonymize data than you think! The purpose of data mining is to discriminate, but some kinds of discrimination are unethical, and illegal. Data mining discovers correlations, but these do not imply causation.
- There's no magic in data mining
  There’s no magic in data mining! – in fact, perhaps Weka makes things too easy. You’ve learned lots, but don’t be smug: this course has missed out plenty. And you've learned a powerful technology: please use it wisely.
- Farewell
  It's time to say goodbye.

Who is this accredited by?

The CPD Certification Service

Learning on this course

On every step of the course you can meet other learners, share your ideas and join in with active discussions in the comments.

What will you achieve?

By the end of the course, you’ll be able to...

- Demonstrate use of Weka for key data mining tasks
- Evaluate the performance of a classifier on new, unseen, instances
- Explain how data miners can unwittingly overestimate the performance of their system
- Identify learning methods that are based on different flavors of simplicity
- Apply many different learning methods to a dataset of your choice
- Interpret the output produced by classification methods
- Describe the principles behind many modern machine learning methods
- Compare the decision boundaries produced by different classification algorithms
- Debate ethical issues raised by mining personal data

Who is the course for?

This course is designed for anyone considering a career in data science or those currently working in the data sector wanting to further their knowledge of data mining software.

What software or tools do you need?

You will download the free Weka software during Week 1. It runs on any computer, under Windows, Linux, or Mac. It has been downloaded millions of times and is being used all around the world.

(Note: Depending on your computer and system version, you may need admin access to install Weka.)

Who will you learn with?

Ian Witten

I grew up in Ireland, studied at Cambridge, and taught computer science at the Universities of Essex in England and Calgary in Canada before moving to paradise (aka New Zealand) 25 years ago.

Who developed the course?

The University of Waikato

Sitting among the top 3% of universities world-wide, The University of Waikato prepares students to think critically and to show initiative in their learning.

Established: 1964
Location: Waikato, New Zealand
World ranking: Top 380 (Source: QS World University Rankings 2021)

Learning on FutureLearn

Your learning, your rules

Courses are split into weeks, activities, and steps to help you keep track of your learning
Learn through a mix of bite-sized videos, long- and short-form articles, audio, and practical activities
Stay motivated by using the Progress page to keep track of your step completion and assessment scores

Join a global classroom

Experience the power of social learning, and get inspired by an international network of learners
Share ideas with your peers and course educators on every step of the course
Join the conversation by reading, @ing, liking, bookmarking, and replying to comments from others

Map your progress

As you work through the course, use notifications and the Progress page to guide your learning
Whenever you’re ready, mark each step as complete; you’re in control
Complete 90% of course steps and all of the assessments to earn your certificate

You can use the hashtag #FLdatamining to talk about this course on social media.
