Wednesday, October 28, 2015

Apache Spark Hello World on Windows

This post explains how to code, build and run a simple Scala program for Apache Spark on Windows. It is intended for testing purposes only.

Prerequisites: Apache Spark 1.5.x, Scala 2.11 and sbt installed on your Windows machine.

Create a simple Scala file, SimpleApp.scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    // Required only on Windows: point hadoop.home.dir at the folder containing bin\winutils.exe
    System.setProperty("hadoop.home.dir", "C:\\winutil\\")
    val logFile = "C:/spark-1.5.1-bin-hadoop2.6/README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Read the file as an RDD with 2 partitions and cache it, since it is scanned twice below
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

You need to set the system property hadoop.home.dir to the path of a Hadoop home directory. This is required only on Windows. You can download the winutils.exe file from the following link:

http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe
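
Place the downloaded winutils.exe inside a bin folder under that directory, so that (for the C:\winutil path used in the code above) the full path looks like this:

C:\winutil\bin\winutils.exe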

Create an sbt build file, SimpleApp.sbt

name := "Simple App project"
version := "1.0"
scalaVersion := "2.11.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"
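
If you later submit this jar to a real Spark cluster (rather than just running it locally as in this post), you may want to mark the Spark dependency as "provided" so it is not bundled into your package. This is an optional variation, not required for the local test here:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1" % "provided"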

Your directory structure should be as follows:

SimpleApp.sbt
src\main\scala\SimpleApp.scala

Building and deploying the package

sbt package
spark-submit --class "SimpleApp" --master local[4] target\scala-2.11\simple-app-project_2.11-1.0.jar
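
As an alternative for quick local testing, you can hard-code the master URL in the SparkConf and then run the class directly with sbt run instead of spark-submit. This is only a sketch for local runs; setMaster should be removed (and the master left to spark-submit --master) when you submit to a cluster:

val conf = new SparkConf()
  .setAppName("Simple Application")
  .setMaster("local[4]")  // local-only setting; remove when submitting to a cluster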


