
Apache Spark and Scala Training

Live Online & Classroom Enterprise Training

Apache Spark is a fast, in-memory distributed data processing framework written in Scala. In this Apache Spark & Scala course, you will learn Scala's programming model in detail and gain exposure to near-real-time data analytics through hands-on examples in Spark and Scala.

Key Features
  • Lifetime Access

  • CloudLabs

  • 24x7 Support

  • Real-time code analysis and feedback

  • 100% Money Back Guarantee


What is Apache Spark and Scala training about?

The Apache Spark and Scala training module teaches you to build Spark applications in Scala. It provides a clear comparison between Spark and Hadoop and covers techniques for increasing application performance and enabling high-speed processing.

What are the objectives of Apache Spark and Scala training?

After completing the 'Apache Spark & Scala' course, you will be able to:

  • Understand Scala and its implementation
  • Apply lazy values, control structures, loops, collections, etc.
  • Learn the concepts of traits and OOP in Scala
  • Understand functional programming in Scala
  • Get an insight into Big Data challenges
  • Understand how Spark acts as a solution to these challenges
  • Install Spark and run Spark operations in the Spark shell
  • Understand what RDDs are in Spark
  • Run a Spark application on YARN (Hadoop)
  • Analyze the Hive and Spark SQL architecture
Available Training Modes

Live Online Training

18 Hours

Classroom Training

3 Days

Who is Apache Spark and Scala training for?

  • Anyone who wants to add Apache Spark and Scala skills to their profile
  • Teams getting started on Apache Spark and Scala projects
What are the prerequisites for Apache Spark and Scala training?

  • Basic familiarity with Linux or Unix
  • Intermediate-level knowledge of Hadoop

Course Outline

    • Module-1: Introduction to Spark and Analysis
      • Why second generation frameworks?
      • Introduction to Spark
      • Scala shell
      • Spark Architecture
      • Spark on Cluster
      • Spark Core
      • SparkSQL
      • Spark Streaming
      • Cluster Managers
      • Spark Users
      • What Spark is used for
      • Spark Versions
      • Spark Storage Layers
      • Download Spark
    • 1. Spark API on a Cluster
      • A. Why second generation frameworks?
      • B. The Driver
      • C. Executors
      • D. Execution components: jobs, tasks, stages
      • E. Spark Web UI
    • 2. Cluster Manager
      • A. Standalone Cluster Manager
      • B. Hadoop YARN
      • C. Apache Mesos
      • D. Amazon EC2
      • E. Which Cluster Manager?
      • F. Spark-submit for deploying applications
      • G. Using Maven for a Java Spark application
      • H. Using sbt for a Scala application
    • Module-2: Data Loading (HDFS, Amazon S3)
      • Different file formats:
      • Text files
      • JSON
      • Comma- and tab-separated values
      • Object files
      • Sequence files
      • Input/output formats
      • Spark SQL for structured data
    • Module-3: RDDs
      • What is an RDD?
      • Why RDD?
      • RDD operations
      • Transformations
      • Actions
      • Lazy Evaluation
      • Basic RDDs
      • Caching
      • Converting between RDD types
      • Spark API support for Python, Java, and Scala
      • Working with key-value pairs
      • Creating key-value pair RDDs
    • 1. Transformations on pair RDDs
      • Aggregations
      • Grouping data
      • Joins
      • Sorting data
    • 2. Actions on pair RDDs
      • RDD's partitioner
      • Operations that benefit from partitioning
      • PageRank example
    • 3. Advanced Spark operations
      • Aggregate
      • Fold
      • Map partitions
      • Glom
      • Accumulators
      • Broadcast variables
      • Anatomy of a Spark RDD
      • Splits
      • Localization
      • Serialization
      • Transformations vs. actions
    • Module-4: Spark SQL
      • Spark SQL in applications
      • Spark SQL initialization
      • Basic Spark SQL queries
      • Schema RDDs
      • Caching
      • Loading data from Hive
      • Loading data from JSON
      • Loading data from RDDs
      • Beeline
      • Long-lived tables and queries
      • Query hands-on
      • Spark SQL UDFs
      • Performance
    • Module-5: Spark Streaming
      • Streaming Architecture
      • Two types of Transformations
        • 1. Stateless Transformations
        • 2. Stateful Transformations
      • Streaming UI
      • Sources: Input
      • Core Sources
      • Additional Sources
      • Multiple Sources
      • Cluster Sizing
    • 1. Fault Tolerance
      • Driver Fault Tolerance
      • Worker Fault Tolerance
      • Receiver Fault Tolerance
      • 24/7 operation
      • Performance
      • Garbage collection
      • Memory Usage

Who is the instructor for this training?

The trainer for this Apache Spark and Scala course has extensive experience in this domain, including years of experience training and mentoring professionals.
