Miscellaneous Distributed Weka capabilities

Mark Hall shows how to compute a correlation matrix in Distributed Weka for input to Principal Component Analysis, and how to run a parallel version of k-means.

There are other useful KnowledgeFlow templates for Distributed Weka. One computes a correlation matrix for input to Principal Component Analysis; another runs a parallel version of the k-means clustering algorithm. To process large datasets, you need to run Distributed Weka on a cluster. The Apache Spark website contains information on how to set up a cluster. This blog post explains how to run a Spark cluster on a single machine, using separate Java processes that communicate as though they were running on different machines. That differs from the “local mode” we’ve been using, where the entirety of Spark runs in a single Java process.
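
To give a feel for why a correlation matrix suits distributed computation, here is a minimal Python sketch. It is not Distributed Weka's implementation; it only illustrates the general idea that each data partition can be summarised independently (a "map" step) and the summaries merged (a "reduce" step). The function names `partial_stats`, `merge_stats` and `correlation_matrix` are hypothetical, and the sketch assumes numeric columns with no missing values and non-zero variance.

```python
from functools import reduce
from math import sqrt


def partial_stats(rows):
    """Map step: summarise one partition as (count, column sums, cross-product sums)."""
    d = len(rows[0])
    sums = [0.0] * d
    cross = [[0.0] * d for _ in range(d)]
    for row in rows:
        for i in range(d):
            sums[i] += row[i]
            for j in range(d):
                cross[i][j] += row[i] * row[j]
    return len(rows), sums, cross


def merge_stats(a, b):
    """Reduce step: combine two partition summaries by element-wise addition."""
    n = a[0] + b[0]
    sums = [x + y for x, y in zip(a[1], b[1])]
    cross = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, sums, cross


def correlation_matrix(stats):
    """Turn the merged summary into a Pearson correlation matrix."""
    n, sums, cross = stats
    d = len(sums)
    means = [s / n for s in sums]
    # Population covariance: E[xy] - E[x]E[y]
    cov = [[cross[i][j] / n - means[i] * means[j] for j in range(d)]
           for i in range(d)]
    return [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
            for i in range(d)]


# Two partitions standing in for data held on two cluster workers (made-up values).
partitions = [
    [(1.0, 2.0, 4.0), (2.0, 4.0, 3.0)],
    [(3.0, 6.0, 2.0), (4.0, 8.0, 1.0)],
]
corr = correlation_matrix(reduce(merge_stats, map(partial_stats, partitions)))
```

Because the per-partition summaries are just sums, they can be added together in any order, which is what makes the computation easy to spread across workers. The resulting matrix is the kind of input that Principal Component Analysis would then work from.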

This content is taken from The University of Waikato online course

Advanced Data Mining with Weka
