SparkR (R on Spark) lets you access the power of Apache Spark from R.
One can use SparkR in scenarios such as:
- Dataset does not fit into memory.
- Need to use distributed machine learning algorithms from R.
Prerequisites
Install R and RStudio.
Install Apache Spark (http://spark.apache.org/downloads.html).
Set the SPARK_HOME and PATH environment variables.
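If you prefer to set these for the current session from within R instead of via the Windows system settings, here is a minimal sketch (assuming Spark was extracted to C:\spark-1.4.0-bin-hadoop2.6, the path used in the steps below):
# Point SPARK_HOME at the Spark install directory (adjust the path to your extract location)
Sys.setenv(SPARK_HOME = "C:\\spark-1.4.0-bin-hadoop2.6")
# Prepend Spark's bin directory to PATH for this session (Windows ";" separator)
Sys.setenv(PATH = paste(file.path(Sys.getenv("SPARK_HOME"), "bin"), Sys.getenv("PATH"), sep = ";"))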
Open RStudio and follow these steps:
Install rJava: install.packages("rJava")
library(rJava) - in case of any error, just close RStudio and open it again; that fixed it in my case.
library(SparkR, lib.loc="C:\\spark-1.4.0-bin-hadoop2.6\\R\\lib")  # load the SparkR package shipped inside the Spark distribution
Simple Test
You can either take the code from http://spark.apache.org/docs/latest/sparkr.html or use the following:
sc <- sparkR.init()                          # initialize the SparkContext
sqlContext <- sparkRSQL.init(sc)             # initialize the SQLContext
df <- createDataFrame(sqlContext, faithful)  # faithful is a dataset shipped with R
head(df)
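Once the test works, you can go a step further and query the same DataFrame with SQL. A minimal sketch using the SparkR 1.4 API (registerTempTable and sql; the table name "faithful" and the waiting > 80 filter are just illustrations):
# Register the DataFrame as a temporary table so it can be queried with SQL
registerTempTable(df, "faithful")
# Select the eruptions with a long preceding wait (columns come from the faithful dataset)
longWaits <- sql(sqlContext, "SELECT * FROM faithful WHERE waiting > 80")
head(longWaits)
# Stop the SparkContext when you are done
sparkR.stop()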