In this article we briefly demonstrate how to perform basic analysis of textual data. This includes: (i) reading the text files, (ii) cleaning the text, (iii) analyzing the text, …
We need to quickly check whether everything is working in our virtual machine and to become familiar with the environment we will be working with. Open the terminal by clicking …
We will be working with a custom-installed Hadoop in the Linux Mint operating system, which is based on the Ubuntu Linux platform, which is in turn based on the Debian Linux distribution. In …
Code related to examples in the video All code in the text below pertains to R unless stated otherwise. Synthetic data We create the synthetic data frame called Data with …
Introduction In this article we present the basic matrix operations using R with a particular focus on those operations that have the potential for parallelisation using map-reduce. Remark Note that …
This is a motivational video. All learners who demonstrate activity in weeks 1-3 will, in week 4, get the possibility to log in to the HPC provided by the University of Ljubljana. …
We will be working with a custom-installed Hadoop in the Linux Mint operating system, which is based on the Ubuntu Linux platform, which is further based on the Debian Linux …
Introduction By now, we assume that you have a user account to run this example in RStudio on the HPC cluster of the University of Ljubljana. Load the following libraries: …
In this video we explain what supervised and unsupervised learning are. We present a few demonstrative examples and list classical methods from both families: regression, classification and clustering.
Data We consider the customer data CEnetBig, which is already stored in the HDFS and contains data about the monthly bills of 1 million customers for 2017 and about …
Data We again consider the customer data CEnetBig, which is already stored in the HDFS. You can load it in R using: CEnetBig=from.dfs("/CEnetBig") Its data format is as follows: …
Computing Centroids of Big Data In this example we demonstrate how to compute group centroids using mapreduce from rmr2. We consider the data about customers of CEnet, stored in dfs …
So far you have managed to run Hadoop and R, connected R with Hadoop via RHadoop, and learned about the R libraries that are used for working with the map …