Learn more about this course.

Installing with Apache Spark

Mark Hall explains how to interact with Distributed Weka through the KnowledgeFlow environment.

Having installed Distributed Weka, you can interact with it in the KnowledgeFlow environment. New components such as ArffHeaderSparkJob, WekaClassifierSparkJob, and WekaClassifierEvaluationSparkJob become available. In addition, example knowledge flows are provided as templates that operate “out of the box” using all the CPU’s cores as processing nodes – without having to install and configure a Spark cluster. Distributed Weka operates on header-less CSV files, because it splits data into blocks to enable distributed storage of large datasets and allow data-local processing, and it would be inconvenient to replicate the ARFF header in each block. Instead, the ArffHeaderSparkJob creates a separate header that contains a great deal of information that would otherwise have to be recomputed by each processing node.

Want to keep learning?

This content is taken from The University of Waikato online course

Advanced Data Mining with Weka

View Course

See other articles from this course

This article is from the free online

Advanced Data Mining with Weka

Created by

Join Now

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now

Learn more about this course.

Installing with Apache Spark

Share this post

Want to keep learning?

Advanced Data Mining with Weka

Share this post

Advanced Data Mining with Weka

Advanced Data Mining with Weka

Reach your personal and professional goals

Register to receive updates

Learn more about this course.

Learn more about this course.