The data mining challenge: An expert speaks

Peter Reutemann explains how he came up with the solution in the final question of the previous Quiz.

How might one come up with a solution like the one in the final question of the preceding Quiz? Here are some comments from Peter, our expert data miner, who wrote challenge.py.

I work on spectral data of soil samples (remember the last lesson of Week 1?) for a living, which has given me extensive experience in this area – and, of course, I chose this challenge! I looked at the rules on the IDRC 2014 Shootout home page and discovered that the dataset was collected from around the globe, which suggests that you want to build local models from closely related data.

Therefore I used a locally weighted learner, LWL, which builds a separate model from the training instances nearest to each test instance. Its default base learner is the decision stump, which is a very basic classifier – useless here! In my experience Gaussian processes using the RBF kernel are usually quite good for spectral data, so that is what I used instead. The only problem is that LWL is memory hungry, which is why I chose a smallish neighborhood of 150 instances. But – hey – you might be able to do better!
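
To make the idea concrete, here is a minimal sketch in plain Python of a local model: for each query instance it selects the k nearest training instances and fits a Gaussian process with an RBF kernel to those alone. It is not challenge.py and not Weka's LWL implementation. The function names, the `gamma` and `noise` parameters, and the choice to treat all neighbours equally (rather than weighting them by distance) are assumptions made for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rbf(a, b, gamma):
    """RBF kernel: exp(-gamma * squared distance)."""
    return math.exp(-gamma * sq_dist(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def local_gp_predict(train_X, train_y, query, k=150, gamma=1.0, noise=1e-2):
    """Predict the target for `query` from a GP fitted to its k nearest neighbours.

    Hypothetical helper: neighbours get equal weight here, which is a
    simplification of locally weighted learning.
    """
    # Keep only the k training instances closest to the query.
    order = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i], query))
    idx = order[:k]
    X = [train_X[i] for i in idx]
    y = [train_y[i] for i in idx]
    # Centre the targets so the zero-mean GP prior is reasonable locally.
    mean = sum(y) / len(y)
    # Kernel matrix of the neighbourhood, with noise added on the diagonal.
    K = [[rbf(a, b, gamma) + (noise if i == j else 0.0)
          for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = solve(K, [v - mean for v in y])
    # GP posterior mean at the query point.
    return mean + sum(w * rbf(x, query, gamma) for w, x in zip(alpha, X))
```

The kernel matrix has k × k entries and is rebuilt for every query, so memory and time per prediction grow quickly with the neighbourhood size. This is one way to see why a smallish neighbourhood such as 150 instances is a practical choice.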

This article is from the free online course Advanced Data Mining with Weka

Created by
FutureLearn - Learning For Life