The data mining challenge: An expert speaks
How might one come up with a solution like the one in the final question of the preceding Quiz? Here are some comments from Peter, our expert data miner, who wrote challenge.py.
I work on spectral data of soil samples (remember the last lesson of Week 1?) for a living, which has given me extensive experience in this area – and, of course, I chose this challenge! I looked at the rules on the IDRC 2014 Shootout home page and discovered that the dataset was collected from around the globe. Samples from different regions are likely to differ, which suggests that you want to build local models from closely related data.
Therefore I used a locally weighted learner, LWL, which builds a separate model for each test instance from its nearest training instances. Its default base learner is the decision stump, which is a very basic classifier – useless here! In my experience Gaussian processes using the RBF kernel are usually quite good for spectral data, so I used them as the base learner instead. The only problem is that LWL is memory-hungry, which is why I chose a smallish neighborhood of 150 instances. But – hey – you might be able to do better!
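To make the idea of a local model concrete, here is a minimal sketch in plain Python. It is not challenge.py and does not call Weka: for each query it picks the k nearest training instances and fits a small Gaussian process regressor with an RBF kernel on those alone. All function names, the toy sine data, and the gamma and noise values are assumptions made for illustration, and the sketch leaves out the distance-based instance weighting that LWL also applies.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * squared_distance(a, b))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(train_X, train_y, query, gamma=1.0, noise=1e-3):
    """Gaussian process posterior mean at `query` (targets are mean-centred)."""
    n = len(train_X)
    mean_y = sum(train_y) / n
    # Kernel matrix with a small noise term on the diagonal for stability.
    K = [[rbf_kernel(train_X[i], train_X[j], gamma) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_linear(K, [t - mean_y for t in train_y])
    k_star = [rbf_kernel(x, query, gamma) for x in train_X]
    return mean_y + sum(k * a for k, a in zip(k_star, alpha))


def local_gp_predict(X, y, query, k=150, gamma=1.0, noise=1e-3):
    """Fit the Gaussian process only on the k instances nearest to `query`."""
    order = sorted(range(len(X)), key=lambda i: squared_distance(X[i], query))
    nearest = order[:k]
    return gp_predict([X[i] for i in nearest], [y[i] for i in nearest],
                      query, gamma, noise)


# Toy data (hypothetical): one feature, target follows a sine curve.
X = [[i / 10.0] for i in range(40)]
y = [math.sin(row[0]) for row in X]

# Predict at a point between two training instances using 10 neighbours.
prediction = local_gp_predict(X, y, [1.05], k=10)
print(prediction, math.sin(1.05))
```

The neighborhood size k is the knob that trades memory and time against the amount of data each local model sees: the kernel matrix has k by k entries and is rebuilt for every query, which is one reason to keep the neighborhood smallish.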