Random Forest performance

Ian Witten discusses how the performance of the Random Forest algorithm changes with the number of trees.

The performance of Random Forest does tend to improve with more trees, but only up to a point.

We found in the preceding Quiz that performance increases from 100 to 200 trees, but stays the same for 300 and deteriorates for 400 and 500 trees. (However, the difference is probably not statistically significant in this small example.)
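
One rough way to judge whether such a gap matters is to compare it with the sampling error of the accuracy estimates. The sketch below is only an illustration: the function name `accuracy_z_score` and the figures passed to it are hypothetical (they are not the Quiz results), and it assumes the two accuracies are independent binomial estimates measured on the same number of test instances. Results obtained on the same test data are correlated, so a paired test would be more appropriate in practice.

```python
import math


def accuracy_z_score(acc_a, acc_b, n_test):
    """Approximate z-score for the gap between two accuracies.

    acc_a and acc_b are fractions in [0, 1], each measured on n_test
    instances and treated as independent binomial estimates (an assumption).
    """
    variance = (acc_a * (1 - acc_a) + acc_b * (1 - acc_b)) / n_test
    if variance == 0:
        # Both accuracies are exactly 0 or 1, so the estimate has no spread.
        return 0.0 if acc_a == acc_b else math.copysign(math.inf, acc_a - acc_b)
    return (acc_a - acc_b) / math.sqrt(variance)


# Hypothetical figures for illustration only; these are NOT the Quiz results.
z = accuracy_z_score(0.86, 0.85, n_test=1000)
looks_significant = abs(z) > 1.96  # rough two-sided 5% threshold
print(f"z = {z:.2f}, significant at 5%: {looks_significant}")
```

With a one-point gap on a thousand test instances, the z-score stays well inside the threshold, which is the kind of situation the caveat above describes.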

The amount of data in each partition tends to be the limiting factor, and, as we have seen, it can be increased by reducing the number of partitions. However, in a practical “big data” problem this is unlikely to be an issue, because each partition will still contain plenty of data.

For big data, set the number of partitions to match the available hardware – the number of nodes/cores in the cluster, along with the amount of memory available to each. Configure Weka so that each partition contains as much data as possible, consistent with it fitting into the available memory.
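
The sizing rule above can be sketched as a small calculation. This is a minimal illustration rather than anything built into Weka: the function `plan_partitions`, its parameters, and the `usable_fraction` safety margin (the share of each core's memory assumed to be available for holding data) are all hypothetical names and values chosen for the example.

```python
import math

GIB = 1024 ** 3


def plan_partitions(dataset_bytes, total_cores, memory_per_core_bytes,
                    usable_fraction=0.5):
    """Return (number of partitions, bytes per partition).

    Picks the smallest number of partitions that keeps every core busy
    and lets each partition fit in the memory budget of one core.
    usable_fraction is an assumed safety margin, not a Weka setting.
    """
    budget = memory_per_core_bytes * usable_fraction
    needed_for_memory = math.ceil(dataset_bytes / budget)
    partitions = max(total_cores, needed_for_memory)
    return partitions, dataset_bytes / partitions


# Hypothetical cluster: 16 cores with 2 GiB of memory each.
large = plan_partitions(64 * GIB, total_cores=16, memory_per_core_bytes=2 * GIB)
small = plan_partitions(4 * GIB, total_cores=16, memory_per_core_bytes=2 * GIB)
print(large[0], large[1] / GIB)  # memory is the constraint here
print(small[0], small[1] / GIB)  # core count is the constraint here
```

In the first call the memory budget forces more partitions than there are cores; in the second the data would fit in fewer partitions, so the core count decides instead.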

This article is from the free online course Advanced Data Mining with Weka.

Created by FutureLearn
