Lightbend Apache Spark for Scala - Professional

Live Online & Classroom Certification Training

This two-day course, created by Dean Wampler, Ph.D., is designed to teach developers how to implement data processing pipelines and analytics using Apache Spark.

(4.7) 117 Learners
Instructed by SPRINGPEOPLE


Course Description


This two-day course, created by Dean Wampler, Ph.D., is designed to teach developers how to implement data processing pipelines and analytics using Apache Spark. Developers will use hands-on exercises to learn the Spark Core, SQL/DataFrame, Streaming, and MLlib (machine learning) APIs. Developers will also learn about Spark internals and tips for improving application performance. Additional coverage includes integration with Mesos, Hadoop, and Reactive frameworks like Akka.


After participating in this course, you should:

  • Understand how to use the Spark Scala APIs to implement various data analytics algorithms for offline (batch-mode) and event-streaming applications
  • Understand Spark internals
  • Understand Spark performance considerations
  • Understand how to test and deploy Spark applications
  • Understand the basics of integrating Spark with Mesos, Hadoop, and Akka


Prerequisites

  • Experience with Scala, such as completion of the Fast Track to Scala course
  • Experience with SQL, machine learning, and other Big Data tools will be helpful, but not required.

Course Curriculum

  • How Spark improves on Hadoop MapReduce
  • The core abstractions in Spark
  • What happens during a Spark job?
  • The Spark ecosystem
  • Deployment options
  • References for more information
  • Resilient Distributed Datasets (RDD) and how they implement your job
  • Using the Spark Shell (interpreter) vs submitting Spark batch jobs
  • Using the Spark web console
  • Reading and writing data files
  • Working with structured and unstructured data
  • Building data transformation pipelines
  • Spark under the hood: caching, checkpointing, partitioning, shuffling, etc.
  • Mastering the RDD API
  • Broadcast variables, accumulators
  • Working with the DataFrame API for structured data
  • Working with SQL
  • Performance optimizations
  • Support for JSON and Parquet formats
  • Integration with Hadoop Hive
  • Working with time slices, “mini-batches”, of events
  • Working with moving windows of mini-batches
  • Reuse of code in batch-mode and streaming: the Lambda Architecture
  • Working with different streaming sources: sockets, file systems, Kafka, etc.
  • Resiliency and fault tolerance considerations
  • Stateful transformations (e.g., running statistics)
  • MLlib for machine learning
  • Discussion of GraphX for graph algorithms, Tachyon for distributed caching, and BlinkDB for approximate queries
  • Spark’s clustering abstractions: cluster vs. client deployments, coarse-grained and fine-grained process management
  • Standalone mode
  • Mesos
  • Hadoop YARN
  • EC2
  • Cassandra rings


SpringPeople works with top industry experts to identify the leading certification bodies for different technologies, those that are well respected in the industry and globally accepted as clear evidence of a professional’s proven expertise in the technology. As such, these certifications add significant value to a CV and can give a substantial boost to a professional’s career growth.

Our certification courses are fully aligned to these high-profile certification exams; at the end of the course, participants will have detailed knowledge, be eligible for these certification exams, and be fully ready to take and pass them with flying colours.



About the Instructor

Founded in 2009, SpringPeople is a premier global eLearning marketplace for live online, instructor-led classes in the region. It is a certified training delivery partner of leading technology creators, namely Pivotal, Elastic, Lightbend, EMC, VMware, MuleSoft, RSA, and...

Course Rating and Reviews



SPRINGPEOPLE SpringPeople Trainer

Richa Sinha

There should be an inclusion of the best practices.

SPRINGPEOPLE SpringPeople Trainer

Lohith MV

Tech Lead
Pramata Knowledge Solutions
I felt course went little slow, we should have covered more topics

SPRINGPEOPLE SpringPeople Trainer

Madhav NV

Product Manager
Sonata Software

This class is intended for participants without any previous knowledge of the technology and will cover fundamentals, building through to full hands-on expertise on the topic.

On successful completion of the course, participants will be eligible to sit for the related certification exam (see course overview). All participants receive a course completion certificate, demonstrating their expertise on the subject.

Total duration of the online, live, instructor-led sessions. Sessions are typically delivered as short lectures (2 hrs on weekdays, 3 hrs on weekends) and detailed hands-on guidance.

Expected offline lab work hours that participants will need to complete and submit to the trainer, during and after the instructor-led online sessions.

  1. We are happy to refund the full fee paid, no questions asked, should you feel that the training is not up to your expectations.
  2. Our dedicated team of expert training enablement advisors is available by email, phone, and chat to assist you with your queries.
  3. All courseware, including session recordings, will always be available to you for future reference and rework.

Contact Us

1800-313-4030 (BLR)
