Thursday, September 10, 2015

Setting up SparkR on Windows

SparkR (R on Spark) lets you access the power of Apache Spark from R.

SparkR is useful in scenarios such as:

  • Dataset does not fit into memory.
  • Need to use distributed machine learning algorithms from R.

Prerequisites

  • Install R and RStudio.
  • Install Apache Spark: http://spark.apache.org/downloads.html
  • Set the SPARK_HOME environment variable and add Spark's bin directory to PATH.
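If you prefer not to set the variables system-wide, you can also set them from within an R session before loading SparkR. A minimal sketch, assuming Spark was unpacked to the same example path used later in this post (adjust it to your install):

```r
# Point R at the Spark installation (example path; adjust to yours)
Sys.setenv(SPARK_HOME = "C:\\spark-1.4.0-bin-hadoop2.6")

# Make the SparkR package bundled with Spark visible to library()
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
```

After this, library(SparkR) should find the package without the explicit lib.loc argument.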

Open RStudio and follow these steps:

Install the rJava package: install.packages("rJava")
library("rJava") # if this throws an error, just close RStudio and open it again; that happened in my case
library(SparkR, lib.loc="C:\\spark-1.4.0-bin-hadoop2.6\\R\\lib")

Simple Test

You can either take the code from http://spark.apache.org/docs/latest/sparkr.html or use the following:


sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, faithful)  # faithful is a dataset shipped with R
head(df)
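Once the DataFrame exists, you can also query it with SQL. A small sketch using the same Spark 1.4 SparkR API as above; the table name "faithful" is just an illustrative choice:

```r
# Register the DataFrame as a temporary table so it can be queried with SQL
registerTempTable(df, "faithful")

# Select eruptions of Old Faithful lasting longer than three minutes
long_eruptions <- sql(sqlContext, "SELECT * FROM faithful WHERE eruptions > 3.0")
head(long_eruptions)
```

Note that sql() returns another distributed DataFrame; use collect() if you want the result back as a local R data.frame.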

