
The data mining challenge: An expert speaks

How might one come up with a solution like the one in the final question of the preceding Quiz? Here are some comments from Peter, our expert data miner, who wrote challenge.py.

I work with spectral data from soil samples for a living (remember the last lesson of Week 1?), so I have extensive experience in this area. That, of course, is why I chose this challenge! I looked at the rules on the IDRC 2014 Shootout home page and discovered that the dataset was collected from around the globe. Samples from different regions are likely to behave differently, which suggests building local models from closely related data rather than a single global model.

Therefore I used a locally weighted classifier, LWL. Its default base learner is the decision stump, a very basic classifier that is useless here! In my experience, Gaussian processes with the RBF kernel are usually quite good for spectral data, so that is what I used as the base learner instead. The only problem is that LWL is memory hungry, which is why I chose a smallish neighborhood of 150 instances. But hey, you might be able to do better!
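To make the idea of a local model concrete, here is a minimal pure-Python sketch. It is not Peter's challenge.py and it is not Weka's LWL implementation: it simply fits a Gaussian process with an RBF kernel to the nearest neighbors of each query and returns the posterior mean. The function name `local_gp_predict` and the parameters `gamma` and `noise` are hypothetical, chosen for illustration. As a simplification, every selected neighbor counts equally, with no distance-based weighting inside the neighborhood.

```python
import math


def rbf(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def local_gp_predict(X, y, query, k=150, gamma=1.0, noise=1e-3):
    """Predict the target at `query` from a GP fitted to its k nearest neighbors.

    Assumption: neighbors are chosen by Euclidean distance and are all
    weighted equally (a simplification of locally weighted learning).
    """
    # 1. Keep only the k training instances closest to the query.
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    Xn = [X[i] for i in nearest]
    yn = [y[i] for i in nearest]

    # 2. Center the targets so that a zero-mean GP prior is reasonable.
    mean = sum(yn) / len(yn)

    # 3. Solve (K + noise * I) alpha = y - mean on the neighborhood only.
    K = [[rbf(a, b, gamma) + (noise if i == j else 0.0)
          for j, b in enumerate(Xn)]
         for i, a in enumerate(Xn)]
    alpha = solve(K, [t - mean for t in yn])

    # 4. GP posterior mean at the query point.
    return mean + sum(a * rbf(x, query, gamma) for a, x in zip(alpha, Xn))
```

Because the kernel matrix is built only over the neighborhood, memory grows with the square of `k` rather than with the full training set. That is one reason a smallish neighborhood such as 150 instances keeps a local model manageable.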

This article is from the free online course:

Advanced Data Mining with Weka

The University of Waikato