Live Online & Classroom Certification Training

Learn to create applications that analyze Big Data stored in Apache Hadoop using Spark. Gain hands-on practical experience with the Hortonworks Data Platform (HDP), especially the Apache Spark framework. The course covers technical aspects of the framework such as RDDs, actions, transformations, and single- and multi-node cluster installations.

(4.7) 165 Learners
Instructed by SPRINGPEOPLE

No Public/Open-house class on the topic scheduled at the moment!

Course Description


Gain hands-on expertise in data exploration, Spark SQL and DataFrame operations, Spark Streaming and DStream operations, data visualization, and building and deploying Spark applications. Create applications to analyze Big Data stored in Apache Hadoop using Spark. Understand the concepts of the Hortonworks Data Platform (HDP), including HDFS and YARN, and the Spark Machine Learning Library.

Through our cloud labs, practise HDFS commands, creating and manipulating RDDs, Spark Streaming, creating and saving tables and DataFrames, performance tuning, a machine learning walkthrough, and much more.



  • Describe Hadoop, HDFS, YARN, and the HDP ecosystem
  • Describe Spark use cases
  • Explore and manipulate data using Zeppelin
  • Explore and manipulate data using a Spark REPL
  • Explain the purpose and function of RDDs
  • Use Spark Streaming stateless and window transformations
  • Visualize data, generate reports, and collaborate using Zeppelin
  • Monitor Spark applications using Spark History Server
  • Learn general application optimization guidelines/tips
  • Use data caching to increase performance of applications
  • Explain and use the various Hive file formats
  • Build and package Spark applications
  • Use Hive to run SQL-like queries to perform data analysis
  • Deploy applications to the cluster using YARN
  • Understand the purpose of Spark MLlib



  • Classroom - 4 Days
  • Live Online - 24 Hrs


Students should be familiar with programming principles and have previous experience in software development using either Python or Scala. Previous experience with data streaming, SQL, and HDP is also helpful, but not required.

Course Curriculum

  • Describe the Characteristics and Types of Big Data
  • Define HDP and How it Fits into the Overall Data Lifecycle Management Strategies
  • Describe and Use HDFS
  • Explain the Purpose and Function of YARN
  • Use Apache Zeppelin to work with Spark
  • Describe the Purpose and Benefits of Spark
  • Define Spark REPLs and Applications Architecture
  • Explain the Purpose and Function of RDDs
  • Explain Spark Programming Basics
  • Define and Use Spark Basic Actions
  • Invoke Functions for Multiple RDDs, Create Named Functions and Use Numeric Operations
  • Define and Create Pair RDDs
  • Perform Common Operations on Pair RDDs
  • Using HDFS Commands (Labs)
  • Introduction to Spark REPLs and Zeppelin (Labs)
  • Create and Manipulate RDDs (Labs)
  • Create and Manipulate Pair RDDs (Labs)
  • Describe Spark Streaming
  • Create and View Basic Data Streams
  • Perform Basic Transformations on Streaming Data
  • Utilize Window Transformations on Streaming Data
  • Name the Various Components of Spark SQL and Explain their Purpose
  • Describe the Relationship Between DataFrames and Tables
  • Use Various Methods to Create and Save DataFrames and Tables
  • Manipulate DataFrames and Tables
  • Describe the Difference Between SQLContext vs HiveContext
  • Demonstrate How to Convert an RDD to a DataFrame
  • Demonstrate How to Convert DataFrames Programmatically
  • Describe the Function and Use of sqlContext.sql() and show()
  • Demonstrate How to Register DataFrames as Temporary Tables
  • Explain How to Save DataFrames as Files Using write()
  • Explain How to Create DataFrames from Files Using read()
  • Basic Spark Streaming (Labs)
  • Basic Spark Streaming Transformations (Labs)
  • Spark Streaming Window Transformations (Labs)
  • Create and Save DataFrames (Labs)
  • Working with Tables and DataFrames (Labs)
  • Explain the Purpose of Data Visualization
  • Explain the Benefits of Data Visualization
  • Perform Interactive Data Exploration Using Visualization in Zeppelin
  • Collaborate with Other Developers and Stakeholders Using Zeppelin
  • Describe the Components of a Spark Job
  • Explain Default Parallel Execution for Stages, and Tasks Across CPU Cores
  • Monitor Spark Jobs via the Spark Application UI
  • Describe the Components of the Spark Application UI Landing Page
  • Describe the Purpose of the Job View
  • Describe the Purpose of the Job Events Timeline
  • Describe the Purpose of Job DAG Visualization
  • Describe the Purpose of the Stage View
  • Describe the Purpose of the Stage Event Timeline
  • Describe the Purpose of the Stage Task List
  • Describe the Purpose of the Executor View
  • Describe the Purpose of the Streaming View
  • Data Visualization, Reporting, and Collaboration Using Zeppelin (Labs)
  • Job Monitoring (Labs)
  • Explain Why mapPartitions() Usually Performs Better than map()
  • Describe How to Repartition RDDs and How this can Improve Performance
  • Explain the Different Caching Options Available
  • Describe How Checkpointing can Reduce Recovery Time in the Event of Losing an Executor
  • Describe Situations Where Broadcasting Increases Runtime Efficiencies
  • Detail the Options Available for Configuring Executors
  • Explain the Purpose and Functions of YARN
  • Create an Application to Submit to the Cluster
  • Describe Client vs Cluster Submission with YARN
  • Submit an Application to the Cluster
  • List and Set Important Configuration Items
  • Describe the Purpose of Machine Learning and Some Commonly Used Algorithms
  • Describe the Machine Learning Packages Available in Spark
  • Examine and Run Sample Machine Learning Applications
  • Performance Tuning (Labs)
  • Build and Submit Applications to YARN (Labs)
  • Machine Learning Walkthrough (Labs)


SpringPeople works with top industry experts to identify the leading certification bodies for different technologies: those that are well respected in the industry and globally accepted as clear evidence of a professional’s proven expertise in the technology. As such, these certifications add significant value to a CV and can give a substantial boost to a professional's career growth.

Our certification courses are fully aligned to these high-profile certification exams; at the end of the course, participants will have detailed knowledge of the subject, be eligible for these certification exams, and be fully prepared to take and pass them with flying colours.



About the Instructor

Founded in 2009, SpringPeople is a global premier eLearning marketplace for Online Live, Instructor-led classes in the region. It is a certified training delivery partner of leading technology creators, namely Pivotal, Elastic, Lightbend, EMC, VMware, MuleSoft, RSA, and...

Course Rating and Reviews



SPRINGPEOPLE SpringPeople Trainer

Richa Sinha

There should be an inclusion of the best practices.


Lohith MV

Tech Lead
Pramata Knowledge Solutions
I felt the course went a little slow; we should have covered more topics.


Madhav NV

Product Manager
Sonata Software

This class is intended for participants without any previous knowledge of the technology and covers the fundamentals, building up to full hands-on expertise in the topic.

On successful completion of the course, participants will be eligible to sit for the related certification exam (see course overview). All participants receive a course completion certificate, demonstrating their expertise in the subject.

Total duration of the live, online, instructor-led sessions. Sessions are typically delivered as short lectures (2 hrs on weekdays, 3 hrs on weekends) combined with detailed hands-on guidance.

Expected offline lab work hours that participants will need to complete and submit to the trainer, during and after the instructor-led online sessions.

  1. We are happy to refund the full fee paid, no questions asked, should you feel that the training is not up to your expectations.
  2. Our dedicated team of expert training enablement advisors is available by email, phone, and chat to assist you with your queries.
  3. All courseware, including session recordings, will always be available for you to access for future reference and rework.

Contact Us

1800-313-4030 (BLR)

