Simple neural networks

Neural network learning methods invite an analogy to the brain that is seductive but entirely misleading! The simplest form of neural network, called a “Perceptron”, implements a linear decision boundary. …
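To make the linear decision boundary concrete, here is a minimal sketch of the perceptron update rule on a hypothetical two-attribute toy dataset (all data and names are illustrative, not from the course materials): the learned weights w and bias b define the boundary w·x + b = 0.

```python
# Minimal perceptron sketch on toy 2-D data (hypothetical values).
# The learned boundary is the line w[0]*x1 + w[1]*x2 + b = 0.
def train_perceptron(points, labels, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):       # y is +1 or -1
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified
                w[0] += lr * y * x1                   # nudge the boundary
                w[1] += lr * y * x2                   # toward this point
                b += lr * y
    return w, b

# Linearly separable toy data: class +1 sits above the line x1 + x2 = 1.
points = [(0, 0), (0.2, 0.1), (1, 1), (0.9, 0.8)]
labels = [-1, -1, 1, 1]
w, b = train_perceptron(points, labels)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
```

Because the update rule only ever adds multiples of the input to the weights, the boundary it can learn is always linear, which is exactly the perceptron's limitation.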

Cost-sensitive classification conclusions

Even if you don’t do the exercise, you should look at the numbers and note that they support the following general conclusions for the credit-g dataset with a particular cost …

Cost-sensitive classification

There are two different ways to make a classifier cost-sensitive. One is to create the classifier in the usual way, striving to minimize the number of errors rather than their …
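The first of the two approaches can be sketched as follows: take an ordinary classifier's class-probability estimates and, at prediction time, choose the class with minimum expected cost rather than the most probable class. The cost matrix below is illustrative (a 5:1 penalty for one kind of error), and the probabilities are hypothetical.

```python
# Cost-sensitive prediction sketch (hypothetical numbers).
# cost[i][j] = cost of predicting class j when the true class is i.
cost = [[0, 1],   # true class 0: predicting class 1 costs 1
        [5, 0]]   # true class 1: predicting class 0 costs 5

def min_expected_cost(probs):
    # probs[i] = classifier's estimated probability that the true class is i
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n))
                for j in range(n)]
    return expected.index(min(expected))

# With P(class 0) = 0.7 the most probable class is 0, but the expected
# costs are 0.3*5 = 1.5 for predicting 0 vs 0.7*1 = 0.7 for predicting 1.
print(min_expected_cost([0.7, 0.3]))  # → 1
```

Note that the classifier itself is unchanged; only the decision rule applied to its probability estimates differs, which is what distinguishes this approach from the second one (altering the training process itself).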

Counting the cost

So far we’ve taken the classification rate – computed on a test set, a holdout set, or by cross-validation – as the measure of a classifier’s success. We’re trying to maximize the …
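The classification rate weights all errors equally, but a cost matrix weights each cell of the confusion matrix differently, so two classifiers with the same accuracy can have very different total cost. A small sketch with hypothetical counts and an illustrative 5:1 cost matrix:

```python
# Accuracy vs total cost sketch (all numbers hypothetical).
# confusion[i][j] = number of instances of true class i predicted as j.
confusion = [[590, 110],
             [ 76, 224]]
# cost[i][j] = cost of predicting class j when the true class is i.
cost = [[0, 1],
        [5, 0]]

accuracy = (confusion[0][0] + confusion[1][1]) / sum(map(sum, confusion))
total_cost = sum(confusion[i][j] * cost[i][j]
                 for i in range(2) for j in range(2))
print(accuracy, total_cost)  # → 0.814 490
```

Here the 76 expensive errors contribute 380 of the 490 total cost, even though they are fewer than the 110 cheap ones – which is why maximizing accuracy and minimizing cost can pull in different directions.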

Attribute selection using ranking

The attribute selection methods we have examined so far strive to eliminate both irrelevant attributes and redundant ones. A simpler idea is to rank the effectiveness of each individual attribute, …
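Ranking each attribute individually can be sketched by scoring every attribute on its own against the class – for instance by information gain – and sorting. The toy nominal dataset below is hypothetical; in it, "outlook" determines the class perfectly while "windy" carries no information.

```python
# Single-attribute ranking by information gain (toy data, hypothetical).
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    # Entropy of the class minus the entropy remaining after splitting
    # on this one attribute.
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

attrs = {"outlook": ["sun", "sun", "rain", "rain"],
         "windy":   ["yes", "no",  "yes",  "no"]}
play = ["no", "no", "yes", "yes"]
ranking = sorted(attrs, key=lambda a: info_gain(attrs[a], play), reverse=True)
print(ranking)  # → ['outlook', 'windy']
```

The weakness of ranking is visible in the code: each attribute is scored in isolation, so two perfectly redundant copies of "outlook" would both rank at the top.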

Scheme-independent selection

Attribute selection methods that do not involve a classifier can be faster than the wrapper method. They can use the same kind of searching, but evaluate each subset using a …

The Attribute Selected Classifier

Experimenting with a dataset to select attributes and then applying a classifier to the result is cheating if performance is evaluated using cross-validation, because the entire dataset is used to determine …
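The honest protocol can be sketched as follows: selection happens inside each cross-validation fold, on the training part only, so the test fold never influences which attributes are chosen. The selector, learner, and scorer here are hypothetical stand-ins passed in as functions, not Weka APIs.

```python
# Sketch of attribute selection done *inside* cross-validation.
# select_attributes, train, and evaluate are hypothetical stand-ins.
def cross_validate(data, k, select_attributes, train, evaluate):
    scores = []
    for fold in range(k):
        train_part = [row for i, row in enumerate(data) if i % k != fold]
        test_part  = [row for i, row in enumerate(data) if i % k == fold]
        attrs = select_attributes(train_part)   # sees training data only
        model = train(train_part, attrs)
        scores.append(evaluate(model, test_part, attrs))
    return sum(scores) / k
```

Selecting attributes once on the full dataset and then cross-validating would move the `select_attributes` call outside the loop – at which point every test fold has already leaked into the selection, and the estimate is optimistically biased.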

“Wrapper” attribute selection

Fewer attributes often yield better performance! In a laborious manual process, you can start with the full attribute set and find the best attribute to remove by selectively trying all possibilities, and …
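That manual process is backward elimination, and it can be sketched as a loop: repeatedly try deleting each remaining attribute, keep the deletion that improves the score most, and stop when no deletion helps. The `evaluate` function is a hypothetical stand-in for cross-validating a classifier on the chosen subset (which is what makes this a "wrapper" method).

```python
# Backward-elimination sketch; evaluate() is a hypothetical stand-in
# for cross-validating a classifier on the given attribute subset.
def backward_elimination(attributes, evaluate):
    current = set(attributes)
    best = evaluate(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for a in sorted(current):
            score = evaluate(current - {a})
            if score > best:            # keep only deletions that improve
                best, current = score, current - {a}
                improved = True
                break
    return current

# Toy scorer: a hypothetical attribute "noise" always hurts accuracy.
scores = lambda s: 0.9 if "noise" not in s else 0.8
print(sorted(backward_elimination(["a", "b", "noise"], scores)))  # → ['a', 'b']
```

The expense is also visible here: each pass calls the evaluator once per remaining attribute, and each call is itself a full cross-validation, which is why the wrapper method is so laborious.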

Is it better to generate rules or trees?

We haven’t talked much about rules. We’ve spent a lot of time generating decision trees from datasets – the data mining method you’ve encountered most frequently so far is J48, …

Evaluating clusters

Different clustering algorithms use different metrics for optimization internally, which makes the results hard to evaluate and compare. Weka allows you to visualize clusters, so you can evaluate them by …

Here’s what I did

The top 10 rules involve total = high and predict bread-and-cake, supported by 723 transactions. They all have a consequent of “bread and cake”. They all indicate a high total …

Representing clusters

With clustering, there’s no “class” attribute: we’re just trying to divide the instances into natural groups or “clusters”. There are different ways of representing clusters. Are they disjoint, or can …