Pruning decision trees

To prevent decision trees from overfitting the data, you need to prune them. Ian Witten demonstrates the difference between pruned and unpruned trees.

Decision trees run the risk of overfitting the training data. One simple countermeasure is to stop splitting when the nodes get small. Another is to construct the full tree and then prune it back, starting at the leaves. For this, J48 uses a statistical test that is rather unprincipled but works well in practice. For example, on the breast-cancer dataset J48 generates a pruned tree with 4 leaves (6 nodes in total) that gets an accuracy of 75.5%. With pruning switched off, the tree has 152 leaves (179 nodes) and its accuracy drops to 69.6%.
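You can reproduce this comparison programmatically with Weka's Java API. The sketch below is minimal and assumes Weka is on the classpath and that the breast-cancer ARFF file is at data/breast-cancer.arff (the path is an assumption; adjust it to wherever your Weka installation keeps the sample datasets). It evaluates J48 with its default pruning and with pruning switched off via setUnpruned(true).

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PruningDemo {

    public static void main(String[] args) throws Exception {
        // Load the dataset; the file path is an assumption for this sketch.
        Instances data = DataSource.read("data/breast-cancer.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Pruned tree: J48's default behaviour.
        evaluate(new J48(), data, "pruned");

        // Unpruned tree: switch pruning off.
        J48 unpruned = new J48();
        unpruned.setUnpruned(true);
        evaluate(unpruned, data, "unpruned");
    }

    private static void evaluate(J48 tree, Instances data, String label)
            throws Exception {
        // Estimate accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        // Build on the full dataset so we can report the tree's size.
        tree.buildClassifier(data);
        System.out.printf("%s: %.0f leaves, %.0f nodes, %.1f%% accuracy%n",
                label, tree.measureNumLeaves(), tree.measureTreeSize(),
                eval.pctCorrect());
    }
}

Exact accuracy figures depend on the cross-validation seed and folds, so the numbers printed may differ slightly from those quoted above, but the pruned tree should be far smaller and more accurate than the unpruned one. The other countermeasure mentioned, stopping splits when nodes get small, corresponds to J48's minNumObj parameter (setMinNumObj in the API).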

This article is from the free online course Data Mining with Weka.
