Text mining with R

In this article we briefly demonstrate how to perform a basic analysis of textual data. This includes: (i) reading the text files, (ii) cleaning the text, (iii) analyzing the text, …

Starting Hadoop, RStudio and RHadoop

We need to quickly check that everything is working in our virtual machine and become familiar with the environment we will be working in. Open the terminal by clicking …

Installation of Linux virtual machine

We will be working with a custom-installed Hadoop in the Linux Mint operating system, which is based on the Ubuntu Linux platform, which is in turn based on the Debian Linux distribution. In …

First Big Data example with RHadoop

Code related to examples in the video: All code in the text below pertains to R unless stated otherwise. Synthetic data: We create the synthetic data frame called Data with …

Basic matrix operations in R

Introduction: In this article we present the basic matrix operations in R, with a particular focus on those operations that have the potential for parallelisation using map-reduce. Remark: Note that …

RHadoop on real HPC

This is a motivational video. All learners who demonstrate activity in weeks 1-3 will, in week 4, get the possibility to log in to the HPC provided by the University of Ljubljana. …

Installation of a Linux Virtual Machine

We will be working with a custom-installed Hadoop in the Linux Mint operating system, which is based on the Ubuntu Linux platform, which is further based on the Debian Linux …

Big Electricity Energy data

Introduction: By now, we assume that you have a user account to run this example in RStudio on the HPC cluster of the University of Ljubljana. Load the following libraries: …

Supervised vs. unsupervised learning

In this video we explain what supervised and unsupervised learning are. We present a few illustrative examples and list classical methods from both families: regression, classification and clustering.

Finding the Extreme Data Points

Data: We consider the customer data CEnetBig, which is already stored in the HDFS and contains data about the monthly bills of 1 million customers for 2017 and about …

Computing Groups Centroids

Computing Centroids of Big Data: In this example we demonstrate how to compute group centroids using mapreduce from rmr2. We consider the data about customers of CEnet, stored in dfs …