Apache Spark and Scala Training

Live Online & Classroom Enterprise Training

Apache Spark is a fast, in-memory distributed computing framework written in Scala. In this Apache Spark & Scala course, you will understand Scala's programming model in detail and gain exposure to near-real-time data analytics through hands-on examples in Spark and Scala.

Looking for a private batch?

REQUEST A CALLBACK
Key Features
  • Lifetime Access

  • CloudLabs

  • 24x7 Support

  • Real-time code analysis and feedback

  • 100% Money Back Guarantee


What is Spark Scala training about?

The Apache Spark and Scala training module teaches you to create applications in Spark using the Scala programming language. It provides a clear comparison between Spark and Hadoop and covers techniques to increase your application's performance and enable high-speed processing.

What are the objectives of Spark Scala training?

After completing the 'Apache Spark & Scala' course, you will be able to:

  • Understand Scala and its implementation
  • Apply lazy values, control structures, loops, collections, etc.
  • Learn the concepts of traits and OOP in Scala
  • Understand functional programming in Scala
  • Get an insight into Big Data challenges
  • Understand how Spark addresses these challenges
  • Install Spark and run Spark operations on the Spark shell
  • Understand what RDDs are in Spark
  • Implement Spark applications on YARN (Hadoop)
  • Analyze Hive and Spark SQL architecture
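A few of the Scala-language objectives above (lazy values, traits, functional programming) can be sketched in a few lines. This is an illustrative example only, not course material; the `Greeter` trait and `ScalaBasics` object are hypothetical names chosen for the sketch.

```scala
// Illustrative sketch (not course code) of three Scala features named above:
// traits, lazy values, and functional operations on collections.

trait Greeter {
  def name: String
  // Traits can carry concrete members, enabling mixin-style OOP.
  def greet: String = s"Hello, $name"
}

object ScalaBasics extends Greeter {
  val name = "Spark learner"

  // A lazy val is evaluated only on first access, then cached.
  lazy val expensive: Int = { println("computing..."); 21 * 2 }

  // Functional style: higher-order functions over immutable collections.
  val doubled: List[Int] = List(1, 2, 3).map(_ * 2)

  def main(args: Array[String]): Unit = {
    println(greet)            // Hello, Spark learner
    println(expensive)        // prints "computing..." once, then 42
    println(expensive)        // cached: prints 42 only
    println(doubled)          // List(2, 4, 6)
  }
}
```

Lazy evaluation at the language level is worth internalizing early, since Spark applies the same idea at cluster scale: RDD transformations, like a `lazy val`, describe work that is deferred until a result is actually demanded.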
Available Training Modes

Live Online Training

18 Hours

Classroom Training

3 Days

Who is Spark Scala training for?

  • Anyone who wants to add Apache Spark and Scala skills to their profile
  • Teams getting started on Apache Spark and Scala projects
What are the prerequisites for Spark Scala training?

  • Basic familiarity with Linux or Unix
  • Intermediate-level knowledge of Hadoop

Course Outline

    • Module-1: Introduction to Spark and Analysis
      • Why second generation frameworks?
      • Introduction to Spark
      • Scala shell
      • Spark Architecture
      • Spark on Cluster
      • Spark Core
      • SparkSQL
      • Spark Streaming
      • Cluster Managers
      • Spark Users
      • What is the use of Spark?
      • Spark Versions
      • Spark Storage Layers
      • Download Spark
    • 1. Spark API on a Cluster
      • A. Why second generation frameworks?
      • B. The Driver
      • C. Executors
      • D. Execution components: jobs, tasks, stages
      • E. Spark Web UI
    • 2. Cluster Manager
      • A. Standalone Cluster Manager
      • B. Hadoop YARN
      • C. Apache Mesos
      • D. Amazon EC2
      • E. Which Cluster Manager?
      • F. Spark-submit for deploying applications
      • G. Using Maven for a Java Spark application
      • H. Using SBT for a Scala application
    • Module-2: DATA LOADING (HDFS, Amazon S3)
      • Different file formats:
      • Text files
      • JSON
      • Comma- and tab-separated values
      • Object files
      • Sequence files
      • Input/output formats
      • Spark SQL for structured data
    • Module-3: RDDs
      • What is an RDD?
      • Why RDD?
      • RDD operations
      • Transformations
      • Actions
      • Lazy evaluation
      • Basic RDDs
      • Caching
      • Converting between RDD types
      • The Spark API supports Python, Java, and Scala
      • Working with key-value pairs
      • Creating key-value pair RDDs
    • 1. Transformations on pair RDDs
      • Aggregations
      • Grouping data
      • Joins
      • Sorting data
    • 2. Actions on pair RDDs
      • RDD partitioners
      • Operations that benefit from partitioning
      • PageRank example
    • 3. Advanced Spark operations
      • Aggregate
      • Fold
      • Map partitions
      • Glom
      • Accumulators
      • Broadcast variables
      • Anatomy of a Spark RDD
      • Splits
      • Localization
      • Serialization
      • Transformations vs. Actions
    • Module-4: SPARK SQL
      • Spark SQL in applications
      • Spark SQL initialization
      • Spark SQL basic queries
      • SchemaRDDs
      • Caching
      • Loading data from Hive
      • Loading data from JSON
      • Loading data from RDDs
      • Beeline
      • Long-lived tables and queries
      • Query hands-on
      • Spark SQL UDFs
      • Performance
    • Module-5: SPARK STREAMING
      • Streaming Architecture
      • Two types of transformations:
        1. Stateless transformations
        2. Stateful transformations
      • Streaming UI
      • Input sources
      • Core Sources
      • Additional Sources
      • Multiple Sources
      • Cluster Sizing
    • 1. Fault Tolerance
      • Driver Fault Tolerance
      • Worker Fault Tolerance
      • Receiver Fault Tolerance
      • 24/7 operation
      • Performance
      • Garbage collection
      • Memory Usage
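Module-3's central distinction between transformations and actions can be previewed with plain Scala collections, whose API Spark's RDDs deliberately mirror. This is a hedged sketch, not course code: `RddStyle` is a made-up name, a lazy `view` stands in for an RDD's deferred evaluation, and `groupMapReduce` plays the role of `reduceByKey` on a pair RDD.

```scala
// Illustrative sketch of Module-3 ideas using plain Scala collections.
// In Spark, the same shapes appear as rdd.flatMap / rdd.map / reduceByKey.

object RddStyle {
  val lines = List("spark and scala", "spark streaming")

  // "Transformations": a lazy view describes the computation
  // without running it, like an RDD's deferred lineage.
  val words = lines.view.flatMap(_.split(" "))

  // "Action": forcing the view and aggregating by key, the plain-Scala
  // analogue of a reduceByKey over a key-value pair RDD.
  def wordCount: Map[String, Int] =
    words.toList.groupMapReduce(identity)(_ => 1)(_ + _)

  def main(args: Array[String]): Unit = {
    println(wordCount("spark")) // 2
    println(wordCount("scala")) // 1
  }
}
```

The design point the course outline drives at: because transformations only describe work, Spark can inspect the whole chain before executing anything, enabling optimizations such as pipelining and partition-aware shuffles.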
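Module-5's split between stateless and stateful transformations can also be sketched without a cluster. This is an assumption-laden analogy, not Spark Streaming code: `StreamingStyle` and its values are invented for illustration, a plain list stands in for a stream of micro-batches, and `scanLeft` models how stateful operations carry state across batches.

```scala
// Illustrative sketch of stateless vs. stateful stream transformations,
// modeled on a plain Scala list rather than Spark's DStreams.

object StreamingStyle {
  val batches = List(3, 1, 4, 1, 5)

  // Stateless: each batch is handled independently of the others.
  val doubled = batches.map(_ * 2)

  // Stateful: a running total carries state from batch to batch,
  // in the spirit of Spark Streaming's stateful operations.
  val runningTotal = batches.scanLeft(0)(_ + _).tail

  def main(args: Array[String]): Unit = {
    println(doubled)      // List(6, 2, 8, 2, 10)
    println(runningTotal) // List(3, 4, 8, 9, 13)
  }
}
```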

Who is the instructor for this training?

The trainer for this Apache Spark and Scala course has extensive experience in this domain, including years of experience training and mentoring professionals.

Reviews