Skip main navigation

The data mining challenge: An expert speaks

Peter Reutemann explains how he came up with the solution in the final question of the previous Quiz.

How might one come up with a solution like the one in the final question of the preceding Quiz? Here are some comments from Peter, our expert data miner, who wrote challenge.py.

I work on spectral data of soil samples (remember the last lesson of Week 1?) for a living, which has given me extensive experience in this area – and, of course, I chose this challenge! I looked at the rules on the IDRC 2014 Shootout home page and discovered that the dataset has been collected from round the globe, which suggests that you want to build local models from closely related data.

Therefore I used a locally weighted classifier, LWL. Its default learning method is the decision stump, which is a very basic classifier – useless! In my experience Gaussian processes using the RBF kernel are usually quite good for spectral data. The only problem is that LWL is memory hungry, which is why I chose a smallish neighborhood of 150 instances. But – hey – you might be able to do better!

This article is from the free online

Advanced Data Mining with Weka

Created by
FutureLearn - Learning For Life

Our purpose is to transform access to education.

We offer a diverse selection of courses from leading universities and cultural institutions from around the world. These are delivered one step at a time, and are accessible on mobile, tablet and desktop, so you can fit learning around your life.

We believe learning should be an enjoyable, social experience, so our courses offer the opportunity to discuss what you’re learning with others as you go, helping you make fresh discoveries and form new ideas.
You can unlock new opportunities with unlimited access to hundreds of online short courses for a year by subscribing to our Unlimited package. Build your knowledge with top universities and organisations.

Learn more about how FutureLearn is transforming access to education