Map tasks and Reduce tasks

Mark Hall explains how Map tasks produce models and a Reduce task aggregates them. Reduce strategies differ for Naive Bayes and other model types.

We saw in the last lesson that Naive Bayes and JRip are treated differently. The reason is that Naive Bayes is easily parallelized: each Map task computes frequency counts for its own partition, and the Reduce task adds up the counts from the individual partitions, producing a single model. For JRip (and other classifiers whose models cannot be combined in this way), a separate classifier is learned for each partition (4 in this case), and the Reduce task produces a “vote” ensemble learner that combines them. Also, for some classifiers (like JRip) it is beneficial to randomize the dataset before splitting it into partitions. Finally, we look at the “Spark: cross-validate two classifiers” template and examine how Distributed Weka performs cross-validation.
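The two Reduce strategies described above can be illustrated with a minimal Python sketch. This is not Distributed Weka's code or API: the function names (`partition`, `map_count`, `reduce_counts`, `train_majority`, `reduce_vote`) and the toy dataset are all hypothetical, and `train_majority` is a trivial stand-in for a per-partition learner such as JRip. It only shows why summed frequency counts give a single model, while other classifiers are combined by voting.

```python
import random
from collections import Counter

def partition(rows, n, seed=1):
    """Randomize the rows, then deal them into n partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return [rows[i::n] for i in range(n)]

def map_count(part):
    """Map task for a count-based model: tally class and
    (class, attribute index, value) frequencies in one partition."""
    counts = Counter()
    for features, label in part:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[(label, i, value)] += 1
    return counts

def reduce_counts(partial_counts):
    """Reduce task: add up the per-partition counts into a single model."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts together
    return total

def train_majority(part):
    """Map task stand-in (NOT JRip): a model that always predicts
    the most frequent class in its partition."""
    label = Counter(lbl for _, lbl in part).most_common(1)[0][0]
    return lambda features: label

def reduce_vote(models):
    """Reduce task for models that cannot be merged: majority vote."""
    def predict(features):
        votes = Counter(model(features) for model in models)
        return votes.most_common(1)[0][0]
    return predict

# Toy dataset (assumed for illustration): (attribute values, class label)
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
    (("rainy", "hot"), "no"),
]

parts = partition(data, 4)  # 4 partitions, as in the lesson

# Strategy 1: summed counts are identical to counting the whole dataset.
merged = reduce_counts(map_count(p) for p in parts)

# Strategy 2: one model per partition, combined into a vote ensemble.
ensemble = reduce_vote([train_majority(p) for p in parts])
```

In this sketch `merged` equals `map_count(data)`, which is the sense in which a count-based model loses nothing by being built in parallel. The vote ensemble, by contrast, is a different model from one trained on all of the data at once.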

This content is taken from The University of Waikato online course Advanced Data Mining with Weka.
