Learn more about this course.

RHadoop libraries rhdfs, rmr2 and plyrmr

Here we explain the basic commands for working with RHadoop, which are provided in libraries rhdfs, rmr2 and plyrmr.

There are several RHadoop packages that we have to use in order to be able to connect R and Hadoop. Here we give a dense presentation of rhdfs, rmr2 and plyrmr together with available commands. Note that within this MOOC we demonstrate only few of those commands.

`rhdfs`

The library package rhdfs provides commands for file manipulation in terms of reading, writing and moving files. Namely, R is a programming language that offers data processing. The data themselves are stored in files in the Hadoop filesystem. The following commands are part of the rhdfs package:

File Manipulations

Want to keep
learning?

This content is taken from
Partnership for Advanced Computing in Europe (PRACE) online course,

Managing Big Data with R and Hadoop

View Course

hdfs.copy, hdfs.move, hdfs.rename, hdfs.delete, hdfs.rm, hdfs.del, hdfs.chown, hdfs.put, hdfs.get

File Read/Write

hdfs.file, hdfs.write, hdfs.close, hdfs.flush, hdfs.read, hdfs.seek, hdfs.tell, hdfs.line.reader, hdfs.read.text.file

`rmr2`

The rmr2 package allows the Hadoop MapReduce facility to be used inside the R environment. The package documentation lists the following commands:

The big-data object

big.data.object

Backend-independent file manipulation

dfs.empty

Equijoins using map-reduce

equijoin

Read or write `R` objects from or to the file system

from.dfs

Important Hadoop settings in relation to rmr2

hadoop.settings

Create, project or concatenate key-value pairs

keyval

Create combinations of settings for flexible IO

make.input.format

MapReduce using Hadoop Streaming

mapreduce

Function to set and get package options

rmr.options

Sample large data sets

rmr.sample

Print a variable’s content

rmr.str

Functions to split a file over several parts or to merge multiple parts into one

scatter

Set the status and define and increment counters for a Hadoop job

status

Create map-and-reduce functions from other functions

to.map

`plyrmr`

This package aims to provide a wide palette of predefined operations to cover the basic data-manipulation needs. The documentation of the plyrmr package provides the following list of available commands:

data manipulation

bind.cols (add new columns), where(select rows), select(select columns), rbind, transmute (all of the above plus summaries)

summaries

transmute, sample, count.cols, quantile.cols, top.k, bottom.k

set operations

union, intersect, unique, merge

transmute appears twice because it is a generalization over transform and summarize that allows us to increase or decrease the number of columns or rows, covering the need for multi-row summaries, flattening of data structures, etc.

Want to keep learning?

This content is taken from Partnership for Advanced Computing in Europe (PRACE) online course

Managing Big Data with R and Hadoop

View Course

See other articles from this course

This article is from the free online

Managing Big Data with R and Hadoop

Created by

Join Now

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now

Learn more about this course.

RHadoop libraries rhdfs, rmr2 and plyrmr

`rhdfs`

Want to keep
learning?

Managing Big Data with R and Hadoop

`rmr2`

`plyrmr`

Want to keep learning?

Managing Big Data with R and Hadoop

Managing Big Data with R and Hadoop

Managing Big Data with R and Hadoop

Reach your personal and professional goals

Register to receive updates

Learn more about this course.

Learn more about this course.

See all FutureLearn courses.

Learn more about this course.

RHadoop libraries rhdfs, rmr2 and plyrmr

Share this step

rhdfs

Want to keep learning?

Managing Big Data with R and Hadoop

rmr2

plyrmr

Want to keep learning?

Managing Big Data with R and Hadoop

Share this

Managing Big Data with R and Hadoop

Managing Big Data with R and Hadoop

Reach your personal and professional goals

Register to receive updates

Learn more about this course.

Learn more about this course.

See all FutureLearn courses.

`rhdfs`

Want to keep
learning?

`rmr2`

`plyrmr`