Installing with Apache Spark
Having installed Distributed Weka, you can interact with it in the KnowledgeFlow environment, where new components such as ArffHeaderSparkJob, WekaClassifierSparkJob, and WekaClassifierEvaluationSparkJob become available. Example knowledge flows are also provided as templates that work “out of the box”, using all of your CPU's cores as processing nodes, so you do not need to install and configure a Spark cluster.
Distributed Weka operates on header-less CSV files. It splits data into blocks so that large datasets can be stored in a distributed fashion and processed where the data resides (data-local processing), and replicating the ARFF header in every block would be inconvenient. Instead, the ArffHeaderSparkJob creates a separate header that records information each processing node would otherwise have to recompute, such as attribute types, the values of nominal attributes, and summary statistics.
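To make the idea of a separately computed header more concrete, here is a minimal Python sketch, assuming a map/reduce-style pass over the data. Each block of header-less CSV rows is summarised independently (the "map" step), and the partial summaries are merged (the "reduce" step) into one header-like description. It does not use Spark or Weka. The schema `ATTRS`, the `"?"` missing-value marker, and the functions `summarize_block`, `merge`, `mean_and_std` and `build_header` are hypothetical names chosen for illustration, not Distributed Weka's API or its actual implementation.

```python
import csv
import math
from functools import reduce
from typing import Dict, List, Sequence, Tuple

# Hypothetical schema: (name, type) per CSV column. Header-less CSV carries no
# names or types, so here they are assumed to be supplied separately.
ATTRS: List[Tuple[str, str]] = [("temperature", "numeric"), ("outlook", "nominal")]
MISSING = "?"  # assumed missing-value marker


def summarize_block(rows: Sequence[Sequence[str]]) -> List[dict]:
    """Map step: partial statistics for one block of header-less CSV rows."""
    partial = []
    for i, (_, kind) in enumerate(ATTRS):
        present = [r[i] for r in rows if r[i] != MISSING]
        missing = len(rows) - len(present)
        if kind == "numeric":
            values = [float(v) for v in present]
            partial.append({
                "count": len(values), "missing": missing,
                "sum": sum(values), "sumsq": sum(v * v for v in values),
                "min": min(values, default=math.inf),
                "max": max(values, default=-math.inf),
            })
        else:
            counts: Dict[str, int] = {}
            for v in present:
                counts[v] = counts.get(v, 0) + 1
            partial.append({"counts": counts, "missing": missing})
    return partial


def merge(a: List[dict], b: List[dict]) -> List[dict]:
    """Reduce step: combine two partial summaries attribute by attribute."""
    merged = []
    for (_, kind), x, y in zip(ATTRS, a, b):
        if kind == "numeric":
            merged.append({
                "count": x["count"] + y["count"],
                "missing": x["missing"] + y["missing"],
                "sum": x["sum"] + y["sum"],
                "sumsq": x["sumsq"] + y["sumsq"],
                "min": min(x["min"], y["min"]),
                "max": max(x["max"], y["max"]),
            })
        else:
            counts = dict(x["counts"])
            for k, v in y["counts"].items():
                counts[k] = counts.get(k, 0) + v
            merged.append({"counts": counts, "missing": x["missing"] + y["missing"]})
    return merged


def mean_and_std(s: dict) -> Tuple[float, float]:
    """Derive mean and sample standard deviation from the merged sums."""
    n = s["count"]
    mean = s["sum"] / n
    # Sum-of-squares formula: easy to merge across blocks, but less numerically
    # stable than Welford-style updates on very large or badly scaled data.
    var = (s["sumsq"] - s["sum"] ** 2 / n) / (n - 1) if n > 1 else 0.0
    return mean, math.sqrt(max(var, 0.0))


def build_header(relation: str, summary: List[dict]) -> str:
    """Emit an ARFF-style header; nominal values are sorted for determinism."""
    lines = [f"@relation {relation}", ""]
    for (name, kind), s in zip(ATTRS, summary):
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            lines.append(f"@attribute {name} {{{','.join(sorted(s['counts']))}}}")
    lines += ["", "@data"]
    return "\n".join(lines)


# Header-less CSV split into blocks of two rows, standing in for distributed blocks.
csv_lines = "21.0,sunny\n18.5,rainy\n?,overcast\n25.0,sunny\n".splitlines()
blocks = [list(csv.reader(csv_lines[i:i + 2])) for i in range(0, len(csv_lines), 2)]

summary = reduce(merge, (summarize_block(b) for b in blocks))
header = build_header("weather", summary)
print(header)
print("temperature mean/std:", mean_and_std(summary[0]))
```

Because every partial statistic (counts, sums, minima, maxima, value frequencies) is combined with a simple associative operation, the whole header can be built in one pass without any block ever seeing the others. On Spark, the same shape could plausibly be expressed with `RDD.mapPartitions` followed by `reduce`; each processing node can then read the finished header instead of rescanning the data.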