Learn more about this course.

Pruning decision trees

To prevent decision trees from overfitting the data, you need to prune them. Ian Witten demonstrates the difference between pruned and unpruned trees.

Decision trees run the risk of overfitting the training data. One simple counter-measure is to stop splitting when the nodes get small. Another is to construct a tree and then prune it back, starting at the leaves. For this, J48 uses a statistical test which is rather unprincipled but works well. For example, on the breast-cancer dataset it generates a tree with 4 leaves (6 nodes in total) that gets an accuracy of 75.5%. With pruning switched off, the tree has 152 leaves (179 nodes) whose accuracy is only 69.6%.

Want to keep learning?

This content is taken from The University of Waikato online course

Data Mining with Weka

View Course

See other articles from this course

This article is from the free online

Data Mining with Weka

Created by

Join Now

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now

Learn more about this course.

Pruning decision trees

Share this post

Want to keep learning?

Data Mining with Weka

Share this post

Data Mining with Weka

Data Mining with Weka

Reach your personal and professional goals

Register to receive updates

Learn more about this course.

Learn more about this course.