Big Data Analysis using Hadoop Certification Training

Live Online & Classroom Certification Training

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem.

(4.7) 191 Learners
Instructed by SPRINGPEOPLE

No public/open-house class on this topic is scheduled at the moment.

Course Description



By the end of the Big Data Analytics on Hadoop training course, participants will have covered:

  • The features that Pig, Hive, Flume, Sqoop and Oozie offer for data acquisition, storage and analysis
  • The fundamentals of Apache Hadoop, and data ETL (extract, transform, load), ingestion and processing with Hadoop
  • Joining diverse datasets to gain valuable business insight
  • Performing real-time, complex queries on datasets
  • An introduction to Apache Spark, Spark SQL and DataFrames for data analysis

Duration: 3 days



Course Curriculum

  • What is Big Data, and what role does Hadoop play in it?
  • The motivation for Hadoop, and its use cases
  • Hadoop 2.0 Overview
  • Cluster Resource Management: YARN
  • Hadoop vs RDBMS
  • Hadoop Ecosystem
      - Data Processing and Analysis: Pig, Hive, Spark
      - Data Integration: Sqoop
      - Streaming Data Ingestion: Flume
      - Workflow: Oozie
  • HDFS components (blocks, NameNode, DataNode)
  • HDFS High Availability
  • Important HDFS commands
  • Anatomy of File Read and Write
  • Lab: HDFS commands
  • Using Hadoop client
  • WebHDFS
  • Using Sqoop (data transfer between RDBMS and Hadoop)
  • Flume (extract streaming data)
  • Lab: using Sqoop to import and export data
  • Lab: using Flume to extract streaming data
  • Understanding Map and Reduce concepts
  • WordCount MapReduce Program
  • Lab: run a MapReduce program on YARN
  • What is Apache Pig? Its features and use cases
  • Interacting with Pig – Pig Latin and Grunt shell
  • Running Pig – Local Mode, MapReduce Mode
  • Lab: Invoking Pig
  • Pig Latin syntax and data types
  • Defining and viewing schemas
  • Loading and storing data
  • Grouping, filtering and sorting data
  • Lab: basic data analysis using Pig
  • Frequently used built-in functions
  • Lab: splitting datasets
  • Joining datasets: inner joins, outer joins (left and right), replicated joins, COGROUP
  • Lab: joining datasets
  • Pig user-defined functions (UDFs): a UDF example and how to invoke a UDF
  • Pig scripts and parameter substitution
  • Lab: analyse unstructured data using Pig
  • Advanced data analysis using Pig
  • Tips for optimizing performance of Pig jobs
  • What is Hive? Hive Query Language (HiveQL) versus SQL
  • Hive Architecture
  • HiveQL syntax and data types
  • Invoking Hive, the Hive shell and submitting Hive queries
  • Creating Hive databases and tables
      - Managed tables and external tables
  • Different ways of loading data into Hive tables
  • Simplifying queries using Views
  • Storing query results to a file
  • Lab: create Hive tables, load data into them and query them
  • Hive partitions, buckets and skewed tables
  • Hive file formats (ORC, SequenceFile) and SerDes
  • Sorting Data – ORDER BY and SORT BY
  • Lab: analyse big data using Hive
  • Hive joins: inner joins and outer joins
  • Commonly used Hive built-in functions
  • Hive user-defined functions
  • Aggregation and windowing; the PARTITION BY clause
  • Analytical functions: RANK, DENSE_RANK
  • Lab: advanced Hive programming
  • Hive cost-based optimizer (CBO); computing table and column statistics
  • Tips for Hive Performance Optimization
  • About HCatalog
  • HCatalog in the Hadoop ecosystem
  • Using HCatLoader to load data from a Hive table into a Pig relation
  • Using HCatStorer to store data from Pig into a Hive table
  • Lab: using HCatalog with Pig
  • Oozie components: actions, fork and join nodes, workflows, coordinators
  • Submitting an Oozie workflow
  • Lab: create an Oozie workflow to run Pig and Hive jobs
  • What is Apache Spark? Spark's origins
  • Spark Ecosystem
  • Spark use cases
  • Spark versus MapReduce
  • What is SparkContext?
  • Understanding RDDs
  • Creating an RDD
  • Spark Operations - Transformations and Actions
  • Examples of actions and transformations
  • Spark WordCount program using Python
  • Lab: getting started with Spark
  • Spark SQL overview
  • DataFrames overview
  • SQLContext and HiveContext
  • Performing Spark SQL queries
  • Performing DataFrame operations
  • Lab: Spark SQL and DataFrame exercises for data analysis
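The curriculum's WordCount topics (MapReduce and Spark) rest on the same three steps: map each record to (word, 1) pairs, group the pairs by key, then sum each group. The sketch below simulates those phases in plain Python so the data flow is visible without a cluster. It is an illustration only, not the Hadoop or Spark API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and it assumes simple whitespace tokenisation with lowercasing.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical helper): emit (word, 1) for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, ordered by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, as a MapReduce job would run them in sequence."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Spark and Hadoop process data"]
    print(word_count(sample))
    # prints {'and': 1, 'data': 2, 'hadoop': 2, 'process': 1, 'spark': 1, 'stores': 1}
```

On a real cluster, Hadoop performs the shuffle between mapper and reducer tasks. In PySpark the same pipeline is commonly expressed on an RDD with `flatMap`, `map` and `reduceByKey`.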


SpringPeople works with top industry experts to identify the leading certification bodies for different technologies, which are well respected in the industry and globally accepted as clear evidence of a professional's proven expertise in the technology. As such, these certifications add high value to a CV and can give professionals a significant boost in their career growth.

Our certification courses are fully aligned to these high-profile certification exams. By the end of the course, participants will have detailed knowledge of the subject and will be eligible and fully prepared to take these certification exams and pass with flying colours.



About the Instructor

Founded in 2009, SpringPeople is a premier global eLearning marketplace for live online, instructor-led classes. It is a certified training delivery partner of leading technology creators, namely Pivotal, Elastic, Lightbend, EMC, VMware, MuleSoft, RSA, and...

Course Rating and Reviews

Trainer: SpringPeople
Reviewer: Nishit Sinha, Senior Applications Developer, ServiceNow Inc.
Class experience: Nothing as such

Trainer: SpringPeople
Class experience: Session like reading the slides

Trainer: SpringPeople
Class experience: The trainer was pretty new to the training and the class was not helpful. Not at all recommended. Failed to satisfy my requirements

This class is intended for participants who have some prior exposure to the technology and are now looking to build up their expertise on the topic.

On successful completion of the course, participants will be eligible to sit for the related certification exam (see course overview). All participants receive a course completion certificate, demonstrating their expertise on the subject.

Total duration of the live, online instructor-led sessions. Sessions are typically delivered as short lectures (2 hours on weekdays, 3 hours on weekends) with detailed hands-on guidance.

Expected hours of offline lab work that participants need to complete and submit to the trainer during and after the instructor-led online sessions.

  1. We are happy to refund the full fee paid, no questions asked, should you feel that the training does not meet your expectations.
  2. Our dedicated team of expert training enablement advisors is available by email, phone and chat to assist you with your queries.
  3. All courseware, including session recordings, will always be available for you to access for future reference and revision.

Contact Us

+91-80-6567-9700 (BLR)
