HDP Developer: Java

Live Online & Classroom Certification Training

Master core concepts of Hadoop application development. Design and develop MapReduce applications for Hadoop using Hortonworks Data Platform. Learn to implement combiners, partitioners, secondary sorts, and custom input and output formats, join large datasets, unit test jobs, and develop UDFs for Pig and Hive.

(4.7) 57 Learners
Instructed by SPRINGPEOPLE


Course Description


Gain end-to-end knowledge of optimizing MapReduce jobs and learn advanced MapReduce features. Understand HDFS and map aggregation as you learn with our certified instructors.

Learn how to write a custom partitioner and a custom input format, perform a map-side join, import data into HBase, and work with Pig and Hive.

Gain practical knowledge with our cloudlabs: configuring a Hadoop development environment, combining input files, using data compression, writing a Pig UDF, writing a Pig accumulator, and defining an Oozie workflow.


At the end of HDP Developer: Java training, you will be able to:

  • Understand Hadoop, the Hadoop Distributed File System (HDFS) and MapReduce
  • Practise Common HDFS Commands
  • Work on Open-Source YARN Use Cases
  • Learn Map Aggregation in Depth
  • Write a Custom Partitioner
  • Create and Distribute a Partition File
  • Write a Group Comparator
  • Use the Built-In Input Formats
  • Handle Records that Span Splits
  • Use the Built-In Output Formats
  • Write a Custom Output Format
  • Optimize the Map and Reduce Phases
  • Configure Data Compression
  • Perform Joins in MapReduce
  • Set Up a Test
  • Test a Mapper
  • Test a Reducer
  • Use the Grunt Shell
  • Perform Queries
  • Write a Hive UDF
  • Classroom Training: 4 Days
  • Live Online Training:  24 Hours

 Suggested Audience:

  •  Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop



 Prerequisites:

  • Experience in developing Java applications and using a Java IDE
  • No prior Hadoop knowledge is required.

Course Curriculum

  • Describe Hadoop 2.X and the Hadoop Distributed File System
  • Describe the YARN framework
  • Describe the Purpose of NameNodes and DataNodes
  • Describe the Purpose of HDFS High Availability (HA)
  • Describe the Purpose of the Quorum Journal Manager
  • List Common HDFS Commands
  • Describe the Purpose of YARN
  • List Open-Source YARN Use Cases
  • List the Components of YARN
  • Describe the Life Cycle of a YARN Application
  • Define Map Aggregation
  • Describe the Purpose of Combiners
  • Describe the Purpose of In-Map Aggregation
  • Describe the Purpose of Counters
  • Describe the Purpose of User-Defined Counters
  • Understanding Block Storage
  • Configuring a Hadoop Development Environment
  • Putting Files in HDFS with Java
  • Understanding MapReduce (Lab)
  • Word Count (Lab)
  • Distributed Grep (Lab)
  • Inverted Index (Lab)
  • Using a Combiner (Lab)
  • Computing an Average (Lab)
  • Describe the Purpose of a Partitioner
  • List the Steps for Writing a Custom Partitioner
  • Describe How to Create and Distribute a Partition File
  • Describe the Purpose of Sorting
  • Describe the Purpose of Custom Keys
  • Describe How to Write a Group Comparator
  • List the Built-In Input Formats
  • Describe the Purpose of Input Formats
  • Define a Record Reader
  • Describe How to Handle Records that Span Splits
  • List the Built-In Output Formats
  • Describe How to Write a Custom Output Format
  • Describe the Purpose of the MultipleOutputs Class
  • Writing a Custom Partitioner (Lab)
  • Using TotalOrderPartitioner (Lab)
  • Custom Sorting (Lab)
  • Demonstration: Combining Input Files (Lab)
  • Processing Multiple Inputs (Lab)
  • Writing a Custom Input Format (Lab)
  • Customizing Output (Lab)
  • Working with a Simple Moving Average (Lab)
  • List Optimization Best Practices
  • Describe How to Optimize the Map and Reduce Phases
  • Describe the Benefits of Data Compression
  • Describe the Limits of Data Compression
  • Describe the Configuration of Data Compression
  • Describe the Purpose of a RawComparator
  • Describe the Purpose of Localization
  • List Scenarios for Performing Joins in MapReduce
  • Describe the Purpose of the Bloom Filter
  • Describe the Purpose of MRUnit and the MRUnit API
  • Describe How to Set Up a Test
  • Describe How to Test a Mapper
  • Describe How to Test a Reducer
  • Describe the Purpose of HBase
  • Define the Differences Between a Relational Database and HBase
  • Describe the HBase Architecture
  • Demonstrate the Basics of HBase Programming
  • Describe an HBase MapReduce Application
  • Using Data Compression (Lab)
  • Defining a RawComparator (Lab)
  • Performing a Map-Side Join (Lab)
  • Using a Bloom Filter (Lab)
  • Unit Testing a MapReduce Job (Lab)
  • Importing Data to HBase (Lab)
  • Creating an HBase MapReduce Job (Lab)
  • Describe the Purpose of Apache Pig and Pig Latin
  • Demonstrate the Use of the Grunt Shell
  • List the Common Pig Data Types
  • Describe the Purpose of the FOREACH GENERATE Operator
  • Describe the Purpose of Pig User Defined Functions (UDFs)
  • Describe the Purpose of Filter Functions
  • Describe the Purpose of Accumulator UDFs
  • Describe the Purpose of Algebraic Functions
  • Describe the Purpose of Apache Hive
  • Describe the Differences Between Apache Hive and SQL
  • Describe Apache Hive Architecture
  • Describe How to Load Data Into Hive
  • Demonstrate How to Perform Queries
  • Describe the Purpose of Hive User Defined Functions (UDFs)
  • Write a Hive UDF
  • Describe the Purpose of HCatalog
  • Describe the Purpose of Apache Oozie
  • Describe How to Define an Oozie Workflow
  • Describe Pig and Hive Actions
  • Describe How to Define an Oozie Coordinator Job
  • Understanding Pig (Lab)
  • Writing a Pig UDF (Lab)
  • Writing a Pig Accumulator (Lab)
  • Writing an Apache Hive UDF (Lab)
  • Defining an Oozie Workflow (Lab)
  • Working with TF-IDF and the JobControl Class (Lab)
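
Several of the lab topics above (Word Count, in-map aggregation, and partitioning) can be previewed without a cluster. The following is a minimal plain-Java sketch, not the course's lab code: it uses no Hadoop classes, and every name in it (WordCountSketch, mapWithLocalAggregation, partitionFor, run) is hypothetical. It assumes whitespace-separated words and uses the familiar "mask the sign bit, then take the remainder" style of hash partitioning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes are used, and all names
// in this class are hypothetical.
class WordCountSketch {

    // "Map" step with in-map aggregation: count words locally for one input
    // split so that fewer (word, count) pairs would need to be shuffled.
    static Map<String, Integer> mapWithLocalAggregation(String split) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hash partitioning: mask off the sign bit so the value is never
    // negative, then take the remainder by the number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // "Shuffle" and "reduce": route each partial count to its partition
    // and sum the counts that arrive for the same word.
    static List<Map<String, Integer>> run(List<String> splits, int numReducers) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>()); // TreeMap keeps keys sorted
        }
        for (String split : splits) {
            mapWithLocalAggregation(split).forEach((word, count) ->
                partitions.get(partitionFor(word, numReducers))
                          .merge(word, count, Integer::sum));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> result =
            run(Arrays.asList("to be or not to be", "be quick"), 2);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("reducer " + i + ": " + result.get(i));
        }
    }
}
```

Because every occurrence of a word hashes to the same partition, each simulated reducer sees the complete set of partial counts for its words, which is the property a custom partitioner also has to preserve.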


SpringPeople works with top industry experts to identify the leading certification bodies for different technologies, bodies that are well respected in the industry and whose certifications are globally accepted as clear evidence of a professional’s “proven” expertise in the technology. As such, these certifications add high value to a CV and can give a massive boost to a professional's career growth.

Our certification courses are fully aligned to these high-profile certification exams; at the end of the course, participants will have detailed knowledge of the subject, be eligible for these certification exams, and be fully ready to take them and pass with flying colours.



SpringPeople Corporate Learning Center

About the Instructor

Founded in 2009, SpringPeople is a premier global eLearning marketplace for live online, instructor-led classes in the region. It is a certified training delivery partner of leading technology creators, namely Pivotal, Elastic, Lightbend, EMC, VMware, MuleSoft, RSA, and...

Course Rating and Reviews



SPRINGPEOPLE SpringPeople Trainer

Richa Sinha

There should be an inclusion of the best practices.


Lohith MV

Tech Lead
Pramata Knowledge Solutions
I felt course went little slow, we should have covered more topics


Madhav NV

Product Manager
Sonata Software

This class is intended for participants who have some prior exposure to the technology and are now looking to build up their expertise on the topic.

On successful completion of the course, participants will be eligible to sit for the related certification exam (see course overview). All participants receive a course completion certificate, demonstrating their expertise on the subject.

Total duration of the live online, instructor-led sessions. Sessions are typically delivered as short lectures (2 hours on weekdays, 3 hours on weekends) with detailed hands-on guidance.

Expected offline lab work hours that participants will need to complete and submit to the trainer, during and after the instructor-led online sessions.

  1. We are happy to refund the full fee paid, no questions asked, should you feel that the training is not up to your expectations.
  2. Our dedicated team of expert training enablement advisors is available by email, phone and chat to assist you with your queries.
  3. All courseware, including session recordings, will always be available to you for future reference and rework.

Contact Us

1800-313-4030 (BLR)

