Working with big data

Ian Witten shows how some classifiers can handle arbitrarily large datasets when invoked from the command line, because they work incrementally.

Some classifiers work incrementally – that is, they update their model as the training dataset comes in, in a single pass through the dataset. When invoked from the command line, these classifiers can handle arbitrarily large datasets. In contrast, the Explorer loads in the entire dataset to begin with irrespective of which classifier is used, so it is limited by the amount of computer memory available. Note that cross-validation cannot work incrementally; you need to be careful about how you do the evaluation, maybe using an explicit test file.
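
To make the idea of a single-pass, incremental update concrete, here is a minimal Python sketch. It is not Weka's implementation: the class `IncrementalNaiveBayes`, the helpers `rows` and `train_and_test`, and the input layout (comma-separated nominal values with the class label last) are all assumptions made for illustration. The model keeps only running counts, so memory use depends on the number of distinct attribute values rather than on the number of training instances, and evaluation uses an explicit, separate test stream instead of cross-validation.

```python
import csv
import io
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Toy naive Bayes over nominal attributes, updated one instance at a time."""

    def __init__(self):
        self.total = 0
        self.class_counts = defaultdict(int)
        # (attribute index, value, class label) -> count
        self.value_counts = defaultdict(int)
        # attribute index -> set of distinct values seen so far
        self.values_seen = defaultdict(set)

    def update(self, attributes, label):
        """Fold one training instance into the counts; the instance is not stored."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(attributes):
            self.value_counts[(i, value, label)] += 1
            self.values_seen[i].add(value)

    def predict(self, attributes):
        """Return the class with the highest Laplace-smoothed log score."""
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)
            for i, value in enumerate(attributes):
                k = len(self.values_seen[i]) or 1
                count = self.value_counts.get((i, value, label), 0)
                score += math.log((count + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def rows(handle):
    """Yield (attributes, label) pairs lazily, one line at a time."""
    for row in csv.reader(handle):
        if row:
            yield row[:-1], row[-1]


def train_and_test(train_handle, test_handle):
    """Single pass over the training stream, then score an explicit test stream."""
    model = IncrementalNaiveBayes()
    for attributes, label in rows(train_handle):
        model.update(attributes, label)
    correct = seen = 0
    for attributes, label in rows(test_handle):
        seen += 1
        correct += model.predict(attributes) == label
    accuracy = correct / seen if seen else 0.0
    return model, accuracy


if __name__ == "__main__":
    # Tiny in-memory stand-ins for what would be large files on disk.
    train = io.StringIO(
        "sunny,hot,no\nsunny,mild,no\nrainy,mild,yes\nrainy,cool,yes\novercast,hot,yes\n"
    )
    test = io.StringIO("sunny,hot,no\nrainy,cool,yes\n")
    model, accuracy = train_and_test(train, test)
    print(model.total, accuracy)
```

Because `rows` reads lazily, the same function works if the two handles are opened files far larger than memory. Cross-validation, by contrast, would need to revisit the data in different splits, which is why a separate test file is the simpler choice here.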

This content is taken from The University of Waikato online course

More Data Mining with Weka