This content is taken from The University of Waikato's online course, Advanced Data Mining with Weka.

The data mining challenge: An expert speaks

How might one come up with a solution like the one in the final question of the preceding Quiz? Here are some comments from Peter, our expert data miner, who wrote challenge.py.

I work on spectral data of soil samples (remember the last lesson of Week 1?) for a living, which has given me extensive experience in this area – and, of course, I chose this challenge! I looked at the rules on the IDRC 2014 Shootout home page and discovered that the dataset was collected from around the globe. That suggests building local models, each trained only on closely related data, rather than one global model.

Therefore I used a locally weighted classifier, LWL. Its default base learner is the decision stump, which is a very basic classifier – useless here! So I swapped it for Gaussian processes with the RBF kernel, which in my experience are usually quite good for spectral data. The only problem is that LWL is memory hungry, which is why I chose a smallish neighborhood of 150 instances. But – hey – you might be able to do better!
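
To make the idea concrete, here is a minimal NumPy sketch of local learning with a Gaussian process. It is not the code from challenge.py and it does not call Weka: the names `rbf_kernel` and `local_gp_predict`, and the default values for `gamma` and `noise`, are illustrative assumptions. It also simplifies things by treating all neighbours equally, whereas Weka's LWL additionally weights neighbours by their distance to the query.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def local_gp_predict(X_train, y_train, x_query, k=150, gamma=0.01, noise=1e-3):
    """Predict one target value from a GP fitted only on the k nearest
    training instances (hypothetical helper; defaults are illustrative)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float).reshape(1, -1)

    # 1. Find the k training instances closest to the query (Euclidean).
    dists = np.linalg.norm(X_train - x_query, axis=1)
    k = min(k, len(X_train))
    idx = np.argsort(dists)[:k]
    X_loc, y_loc = X_train[idx], y_train[idx]

    # 2. Fit a GP on the neighbourhood only: solve (K + noise * I) alpha = y.
    #    The k-by-k matrix K is why a larger neighbourhood costs more memory.
    y_mean = y_loc.mean()
    K = rbf_kernel(X_loc, X_loc, gamma) + noise * np.eye(k)
    alpha = np.linalg.solve(K, y_loc - y_mean)

    # 3. GP posterior mean at the query point.
    k_star = rbf_kernel(x_query, X_loc, gamma)
    return float(y_mean + (k_star @ alpha)[0])
```

Calling `local_gp_predict(X, y, x, k=150)` once per test spectrum mirrors the neighborhood size mentioned above. Every prediction builds and solves its own small model, which is the price of fitting locally instead of globally.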
