Apache Spark and Scala Training

Live Online & Classroom Enterprise Training

Master Apache Spark, a fast, in-memory distributed computing framework written in Scala. This Spark & Scala course will give candidates in-depth knowledge of Scala's programming model, along with exposure to near-real-time data analytics through hands-on examples in Spark and Scala.

Looking for a private batch?

Key Features
  • Lifetime Access

  • CloudLabs

  • 24x7 Support

  • Real-time code analysis and feedback

  • 100% Money Back Guarantee


What is Spark Scala Training about?

The Spark Scala training module will equip candidates with the necessary skills to create applications in Spark using the Scala programming language. It also draws a clear comparison between Spark and Hadoop, and covers techniques to improve candidates' application performance and enable high-speed processing.

With the use of advanced cloud labs, this training helps candidates gain seamless hands-on experience by enabling them to work on various use cases.

What are the objectives of Spark Scala Training?

After the completion of this Scala Spark course, candidates will be able to:

  • Describe Scala and its implementation
  • Apply lazy values, control structures, loops, collections, etc.
  • Apply the concepts of traits and OOP in Scala
  • Explain functional programming in Scala
  • Interpret Big Data challenges
  • Use Spark to provide solutions to these challenges
  • Install Spark and implement Spark operations in the Spark shell
  • Interpret what RDDs are in Spark
  • Implement a Spark application on YARN (Hadoop)
  • Analyze Hive and Spark SQL architecture
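
As a minimal taste of the Scala-side objectives, lazy values and functional programming can be sketched in a few lines of plain Scala (all names below are illustrative, not from the course materials):

```scala
// Lazy values: the body runs only on first access, then the result is cached.
lazy val expensive: Int = {
  println("computing...") // printed at most once
  21 * 2
}

// Functional programming: functions are values that can be passed around.
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

val doubled = applyTwice(_ * 2, 5) // 5 -> 10 -> 20
```
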

Available Training Modes

Live Online Training

18 Hours

Classroom Training

3 Days

Who is Spark Scala Training for?

  • Data Scientists
  • Analytics Professionals
  • Developers & Testers
  • Teams getting started on Apache Spark and Scala projects

What are the prerequisites for Spark Scala Training?

  • Prior programming experience in Java or another language is required
  • Basic familiarity with Linux or Unix is preferred
  • An intermediate-level understanding of Hadoop is good to have

Course Outline

  • Introduction to Spark and Analysis
    • Why second generation frameworks?
    • Introduction to Spark
    • Scala shell
    • Spark Architecture
    • Spark on Cluster
    • Spark Core
    • SparkSQL
    • Spark Streaming
    • Cluster Managers
    • Spark Users
    • What is Spark used for?
    • Spark Versions
    • Spark Storage Layers
    • Download Spark
  • Spark API on a Cluster
    • The Driver
    • Executors
    • Execution components: jobs, tasks, stages
    • Spark Web UI
  • Cluster Manager
    • Standalone Cluster Manager
    • Hadoop YARN
    • Apache Mesos
    • Amazon EC2
    • Which Cluster Manager?
    • spark-submit for deploying applications
    • Using Maven for a Java Spark application
    • Using sbt for a Scala application
  • Data Loading (HDFS, Amazon S3)
    • Different file formats:
    • Text files
    • JSON
    • Comma- and tab-separated values
    • Object files
    • Sequence files
    • Input/output formats
    • Spark SQL for structured data
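
For the delimited formats above, Spark's textFile hands you raw lines; splitting them into fields is ordinary Scala. A minimal plain-Scala sketch (no Spark dependency; the sample lines are made up):

```scala
// Sample delimited lines (made up; in Spark these would come from textFile).
val csvLine = "alice,30,bangalore"
val tsvLine = "bob\t25\tdelhi"

// Splitting on the separator is ordinary Scala string handling.
def fields(line: String, sep: Char): Array[String] = line.split(sep)

val csvFields = fields(csvLine, ',')
val tsvFields = fields(tsvLine, '\t')
```
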
  • RDDs
    • What is an RDD?
    • Why RDDs?
    • RDD operations
    • Transformations
    • Actions
    • Lazy Evaluation
    • Basic RDDs
    • Caching
    • Converting between RDD types
    • The Spark API supports Python, Java, and Scala
    • Working with key-value pairs
    • Creating key-value pair RDDs
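
Lazy evaluation, listed above, means RDD transformations only describe a computation; Spark runs it when an action is called. Scala's own LazyList behaves analogously, so the idea can be sketched without a Spark cluster (the counter below is purely illustrative):

```scala
// Count how many elements actually flow through the "transformation".
var evaluations = 0

val data = LazyList(1, 2, 3, 4, 5)

// Like RDD transformations, these build a pipeline but run nothing yet.
val pipeline = data.map { x => evaluations += 1; x * x }.filter(_ % 2 == 1)

val evalsBeforeAction = evaluations // still 0: nothing forced yet

// Like an RDD action, `sum` forces the whole pipeline.
val total = pipeline.sum
```
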
  • Transformations on pair RDDs
    • Aggregations
    • Grouping data
    • Joins
    • Sorting data
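
The pair operations above mirror Scala's collection API, so they can be sketched on a plain Seq of pairs; reduceByKey, groupByKey, join, and sortByKey are the corresponding Spark calls, and the sample data below is made up:

```scala
// Made-up (key, value) pairs: product -> quantity.
val sales = Seq(("apples", 3), ("pears", 2), ("apples", 4))

// Aggregation per key, like reduceByKey(_ + _).
val totals = sales.groupMapReduce(_._1)(_._2)(_ + _)

// Grouping, like groupByKey.
val grouped = sales.groupMap(_._1)(_._2)

// Join on matching keys, like pairRDD.join(prices).
val prices = Map("apples" -> 1.5, "pears" -> 2.0)
val joined = for ((k, qty) <- totals; p <- prices.get(k)) yield (k, (qty, p))

// Sorting by key, like sortByKey.
val sorted = totals.toSeq.sortBy(_._1)
```
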
  • Actions on pair RDDs
    • RDD partitioners
    • Operations that benefit from partitioning
    • PageRank example
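
The PageRank example distributes essentially the iteration below across partitions; here is a plain-Scala sketch on a three-page toy graph (the graph, damping factor, and iteration count are all illustrative):

```scala
// Toy link graph: page -> pages it links to (illustrative).
val links: Map[String, Seq[String]] = Map(
  "a" -> Seq("b", "c"),
  "b" -> Seq("c"),
  "c" -> Seq("a")
)
val damping = 0.85

var ranks: Map[String, Double] = links.keys.map(_ -> 1.0).toMap

for (_ <- 1 to 20) {
  // Each page sends rank / outDegree to every page it links to.
  val contribs = links.toSeq.flatMap { case (page, outs) =>
    outs.map(dest => dest -> ranks(page) / outs.size)
  }
  val received = contribs.groupMapReduce(_._1)(_._2)(_ + _)
  ranks = ranks.map { case (page, _) =>
    page -> ((1 - damping) + damping * received.getOrElse(page, 0.0))
  }
}
```
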
  • Advanced Spark operations
    • Aggregate
    • Fold
    • Map partitions
    • Glom
    • Accumulators
    • Broadcast variables
    • Anatomy of a Spark RDD
    • Splits
    • Localization
    • Serialization
    • Transformations vs. Actions
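
fold and aggregate follow the same contract in Spark as in Scala collections: aggregate takes a zero value, a seqOp applied within each partition, and a combOp that merges the partial results. A plain-Scala sketch with two simulated "partitions":

```scala
// Two simulated "partitions" of a dataset of ints.
val partitions = Seq(Seq(1, 2), Seq(3, 4))

// aggregate: result type (sum, count) may differ from the element type.
def seqOp(acc: (Int, Int), x: Int): (Int, Int) = (acc._1 + x, acc._2 + 1)
def combOp(l: (Int, Int), r: (Int, Int)): (Int, Int) = (l._1 + r._1, l._2 + r._2)

val perPartition = partitions.map(_.foldLeft((0, 0))(seqOp)) // seqOp per partition
val (sum, count) = perPartition.reduce(combOp)               // combOp merges results
val mean = sum.toDouble / count

// fold keeps the element type: one zero value, one combine function.
val total = partitions.flatten.fold(0)(_ + _)
```
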
  • Spark SQL
    • Spark SQL in applications
    • Spark SQL initialization
    • Spark SQL basic queries
    • SchemaRDDs
    • Caching
    • Loading data from Hive
    • Loading data from JSON
    • Loading data from RDDs
    • Beeline
    • Long-lived tables and queries
    • Query hands-on
    • Spark SQL UDFs
    • Performance
  • Spark Streaming
    • Streaming Architecture
    • Two types of transformations:
    • 1. Stateless transformations
    • 2. Stateful transformations
    • Streaming UI
    • Input sources
    • Core Sources
    • Additional Sources
    • Multiple Sources
    • Cluster Sizing
  • Fault Tolerance
    • Driver Fault Tolerance
    • Worker Fault Tolerance
    • Receiver Fault Tolerance
    • 24/7 operation
    • Performance
    • Garbage collection
    • Memory Usage
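
The stateless/stateful distinction in the streaming section can be sketched on a plain sequence of micro-batches (an illustrative stand-in for a DStream; no Spark required): a stateless transformation handles each batch independently, while a stateful one carries state across batches:

```scala
// Three micro-batches of events (illustrative stand-in for a DStream).
val batches = Seq(Seq(1, 2), Seq(3), Seq(4, 5))

// Stateless: each batch transformed on its own, like map on a DStream.
val doubledBatches = batches.map(_.map(_ * 2))

// Stateful: a running total carried across batches,
// in the spirit of updateStateByKey.
val runningTotals = batches.scanLeft(0)((acc, batch) => acc + batch.sum).tail
```
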

Who is the instructor for this training?

This Apache Spark and Scala certification training is led by a subject-matter expert with extensive experience in the domain. The trainer also has years of experience training and mentoring professionals in Spark and Scala.