What if there's no "class" attribute?
This course so far, like its predecessor, Data Mining with Weka, has focused solely on tasks whose aim is to predict the value of a particular attribute called the “class”.
When the class is nominal, this is called classification; when it’s numeric, it’s called regression (why? – don’t ask, it’s kinda crazy). Both tasks are sometimes called “supervised” learning, the idea being that there is a supervisor (or teacher) who dictates what the correct class should be.
But what can you do with a dataset that has no class attribute? One idea is to seek associations between individual attributes, or between sets of attributes. Associations are invariably expressed as rules, and this is called “association rule mining”. Another idea is to see if the instances fall into natural groups, a task known as “clustering”. In both cases it’s pretty hard to evaluate the result in objective ways, unlike classification, where the gold standard is to predict the class correctly on fresh data.
This week we’ll examine both of these tasks. By the end you will be able to apply association rule mining to a dataset and seek interesting associations. For any rule you’ll be able to calculate the key parameters of support and confidence. And you’ll have experienced some of the limitations of association rule mining and how difficult it can be to find interesting patterns in data.
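To make support and confidence concrete, here is a minimal Python sketch. It is not Weka's implementation. It assumes support is measured as the fraction of instances that contain every item in a rule, and confidence as that fraction divided by the fraction that match the rule's left-hand side. (Tools differ on this point; some report support as a raw count of instances rather than a fraction.) The function names and the tiny weather-style dataset are invented for illustration.

```python
def support(instances, items):
    """Fraction of instances that contain every item in `items`."""
    items = set(items)
    matching = sum(1 for inst in instances if items <= set(inst))
    return matching / len(instances)


def confidence(instances, antecedent, consequent):
    """Support of the whole rule divided by support of its left-hand side."""
    whole_rule = support(instances, set(antecedent) | set(consequent))
    left_side = support(instances, antecedent)
    # Assumption: a rule whose left-hand side never occurs gets confidence 0.
    return whole_rule / left_side if left_side > 0 else 0.0


# Hypothetical toy data: each instance is a set of attribute=value items.
instances = [
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=sunny", "humidity=high", "play=no"},
    {"outlook=sunny", "humidity=normal", "play=yes"},
    {"outlook=rainy", "humidity=high", "play=yes"},
]

# Rule under test: humidity=high ==> play=no
rule_support = support(instances, {"humidity=high", "play=no"})
rule_confidence = confidence(instances, {"humidity=high"}, {"play=no"})
print(rule_support, rule_confidence)
```

In this toy data the rule holds in 2 of the 4 instances, and its left-hand side matches 3 of them, so the support is 2/4 and the confidence is 2/3.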
You’ll also have experience of using different clustering methods, and will have learned to be sceptical if the results look too good! And you’ll be able to evaluate clusterings using the classification-by-clustering method.