Hortonworks Certified HDP Developer: Apache Pig and Hive Training

Live Online & Classroom Certification Training

If you are looking for a course that introduces the basics of Hadoop, such as HDFS and MapReduce, and then moves on to higher-level abstractions such as Pig and Hive, the Hortonworks certified course HDP Developer: Apache Pig and Hive is worth a look. The course covers the fundamentals of MapReduce and HDFS comprehensively before moving on to data transformations in Pig, the creation and manipulation of internal and external tables in Hive, and other advanced topics.

(4.7) 9 Learners
Instructed by SPRINGPEOPLE
INDIA
  • Online, 06-Mar to 18-Mar (Monday - Saturday), 12 days, LVC (10:00 AM start): $975.09

Course Description

Overview

This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include: Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition and using Pig and Hive to perform data analytics on Big Data. Labs are executed on a 7-node HDP cluster.

Objectives

  • Describe Hadoop, YARN and use cases for Hadoop
  • Describe Hadoop ecosystem tools and frameworks
  • Describe the HDFS architecture
  • Use the Hadoop client to input data into HDFS
  • Transfer data between Hadoop and a relational database
  • Explain YARN and MapReduce architectures
  • Run a MapReduce job on YARN
  • Use Pig to explore and transform data in HDFS
  • Use Hive to explore and analyze data sets
  • Understand how Hive tables are defined and implemented
  • Use the new Hive windowing functions
  • Explain and use the various Hive file formats
  • Create and populate a Hive table that uses ORC file formats
  • Use Hive to run SQL-like queries to perform data analysis
  • Use Hive to join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
  • Write efficient Hive queries
  • Create ngrams and context ngrams using Hive
  • Perform data analytics like quantiles and page rank on Big Data using the DataFu Pig library
  • Explain the uses and purpose of HCatalog
  • Use HCatalog with Pig and Hive
  • Define a workflow using Oozie
  • Schedule a recurring workflow using the Oozie Coordinator

Prerequisites

Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Course Curriculum

  • Lab: Starting an HDP 2.3 Cluster
  • Demo: Block Storage
  • Lab: Using HDFS commands
  • Lab: Importing and Exporting Data in HDFS
  • Lab: Using Flume to import log files into HDFS
  • Demo: MapReduce
  • Lab: Running a MapReduce Job
  • Demo: Apache Pig
  • Lab: Getting started with Apache Pig
  • Lab: Exploring data with Apache Pig
  • Lab: Splitting a dataset
  • Use Sqoop to transfer data between HDFS and an RDBMS
  • Run MapReduce and YARN application jobs
  • Explore and transform data using Pig
  • Split and join a dataset using Pig
  • Use Pig to transform and export a dataset for use with Hive
  • Use HCatLoader and HCatStorer
  • Use Hive to discover useful information in a dataset
  • Describe how Hive queries get executed as MapReduce jobs
  • Perform a join of two datasets with Hive
  • Use advanced Hive features: windowing, views, ORC files
  • Use Hive analytics functions
  • Write a custom reducer in Python
  • Analyze and sessionize clickstream data
  • Compute quantiles of NYSE stock prices
  • Use Hive to compute ngrams on Avro-formatted files
  • Lab: Exploring Spark SQL
  • Lab: Defining an Oozie workflow

Certification

SpringPeople works with top industry experts to identify the leading certification bodies for different technologies: those that are well respected in the industry and globally accepted as clear evidence of a professional's proven expertise in the technology. As such, these certifications add significant value to a CV and can give professionals a substantial boost in their career growth.

Our certification courses are fully aligned to these high-profile certification exams; at the end of the course, participants will have detailed knowledge of the subject, be eligible for the exams, and be fully ready to take them and pass with flying colours.

 

About the Instructor

Founded in 2009, SpringPeople is a premier global eLearning marketplace for live, online, instructor-led classes in the region. It is a certified training delivery partner of leading technology creators, namely Pivotal, Elastic, Lightbend, EMC, VMware, MuleSoft, RSA, and...


Course Rating and Reviews

Average Rating: 4.7
5 Stars: 28
4 Stars: 12
3 Stars: 1
2 Stars: 0
1 Star: 0

Bharath Vivekanandan

Adding one more day to the course duration might help

Ravi Ranjan Kumar

Engineer
Mindtree Ltd.
not required

Archana M

Team Lead
Mindtree
Overall Good Training experience, interactive sessions, got to know many insights of Powershell, Trainer knowledge and explanation was excellent.

This class is intended for participants who have some prior exposure to the technology and are now looking to build up their expertise on the topic.

On successful completion of the course, participants will be eligible to sit for the related certification exam (see course overview). All participants receive a course completion certificate, demonstrating their expertise on the subject.

Total duration of the online, live, instructor-led sessions. Sessions are typically delivered as short lectures (2 hours on weekdays, 3 hours on weekends) with detailed hands-on guidance.

Expected offline lab work hours that participants will need to complete and submit to the trainer during and after the instructor-led online sessions.

  1. We are happy to refund the full fee paid, no questions asked, should you feel that the training is not up to your expectations.
  2. Our dedicated team of expert training enablement advisors is available by email, phone and chat to assist you with your queries.
  3. All courseware, including session recordings, will always be available to you for future reference and rework.

Contact Us

+91-80-6567-9700 (BLR)

training@springpeople.com
