HDP Developer: Apache PIG & HIVE

Live Online & Classroom Certification Training

Learn to build applications and analyze big data stored in Hadoop using Apache Pig and Apache Hive. Build expertise in Hadoop, YARN, HDFS, MapReduce, data ingestion, and workflow definition, and use Pig and Hive to perform data analytics on big data.

(4.7) 9 Learners
Instructed by SPRINGPEOPLE
INDIA
  • 11 Feb, 12 Days
    Online, 11-Feb to 24-Feb (Sunday - Saturday), LVC (08:30 PM Start), $1,012.38 (Early Bird Offer: $934.44)
  • 11 Mar, 4 Days
    Bangalore, 11-Mar to 15-Mar (Sunday - Thursday), Classroom (10:30 PM Start), $1,012.38 (Early Bird Offer: $934.44)

Course Description

Overview

Master the core concepts of the Hadoop Distributed File System (HDFS). Understand Apache Hive and advanced Apache Hive programming concepts as you learn with our certified experts. Learn how to use HCatalog, join datasets in Apache Hive, and work with HDFS commands.

Gain practical experience importing and exporting RDBMS data to and from HDFS, analyzing clickstream data, and analyzing stock market data using quantiles. With our cloud labs, get hands-on experience starting an HDP cluster, running a YARN application, programming in Apache Hive, analyzing big data with Apache Hive, and joining datasets with Apache Pig.

Objective

At the end of the HDP Developer: Apache PIG and HIVE training, you will be able to:

  • Understand Hadoop and the Hadoop Distributed File System (HDFS)
  • List Common HDFS Commands
  • List the Six Key Hadoop Data Types
  • Export Data to a Table
  • Distinguish between Relational Databases and Hadoop
  • Understand the Purpose of NameNodes, DataNodes, MapReduce, and the Map and Reduce Phases
  • List Pig Latin Relation Names and Field Names
  • Learn Programming Concepts Using Pig and Hive
  • Perform Inner, Outer and Replicated Joins
  • Demonstrate the Use of HCatLoader and HCatStorer with Apache Pig
  • Understand Lifecycle of YARN Applications
  • List Common Use Cases of Spark
  • Load Data and Perform a Word Count
  • Perform SQL Queries
  • Perform DataFrame Operations
  • Submit an Apache Oozie Workflow 

Duration:

  • Classroom Training: 4 Days
  • Live Online Training:  24 Days

Suggested Audience: 

  • Software developers who need to understand and develop applications for Hadoop.

Prerequisites

  • Should be familiar with programming principles and have experience in software development.
  • SQL knowledge is also helpful.
  • No prior Hadoop knowledge is required.

Course Curriculum

  • List the Three “V”s of Big Data
  • List the Six Key Hadoop Data Types
  • Describe Hadoop, YARN and Use Cases for Hadoop
  • Describe Hadoop Ecosystem Tools and Frameworks
  • Describe the Differences Between Relational Databases and Hadoop
  • Describe What is New in Hadoop 2.x
  • Describe the Hadoop Distributed File System (HDFS)
  • Describe the Differences Between HDFS and an RDBMS
  • Describe the Purpose of NameNodes and DataNodes
  • List Common HDFS Commands
  • Describe HDFS File Permissions
  • List Options for Data Input
  • Describe WebHDFS
  • Describe the Purpose of Sqoop and Flume
  • Describe How to Export to a Table
  • Describe the Purpose of MapReduce
  • Define Key/Value Pairs in MapReduce
  • Describe the Map and Reduce Phases
  • Describe Hadoop Streaming
  • Starting an HDP Cluster
  • Demonstration: Understanding Block Storage (Lab)
  • Using HDFS Commands (Lab)
  • Importing RDBMS Data into HDFS (Lab)
  • Exporting HDFS Data to an RDBMS (Lab)
  • Importing Log Data into HDFS Using Flume (Lab)
  • Demonstration: Understanding MapReduce (Lab)
  • Running a MapReduce Job (Lab)
  • Describe the Purpose of Apache Pig
  • Describe the Purpose of Pig Latin
  • Demonstrate the Use of the Grunt Shell
  • List Pig Latin Relation Names and Field Names
  • List Pig Data Types
  • Define a Schema
  • Describe the Purpose of the GROUP Operator
  • Describe Common Pig Operators (ORDER BY, CASE, DISTINCT, PARALLEL, FLATTEN, FOREACH)
  • Perform an Inner, Outer and Replicated Join
  • Describe the Purpose of the DataFu Library
  • Demonstration: Understanding Apache Pig (Lab)
  • Getting Started with Apache Pig (Lab)
  • Exploring Data with Apache Pig (Lab)
  • Splitting a Dataset (Lab)
  • Joining Datasets with Apache Pig (Lab)
  • Preparing Data for Apache Hive (Lab)
  • Demonstration: Computing Page Rank (Lab)
  • Analyzing Clickstream Data (Lab)
  • Analyzing Stock Market Data Using Quantiles (Lab)
  • Describe the Purpose of Apache Hive
  • Describe the Differences Between Apache Hive and SQL
  • Describe the Apache Hive Architecture
  • Demonstrate How to Submit Hive Queries
  • Describe How to Define Tables
  • Describe How to Load Data into Hive
  • Define Hive Partitions, Buckets and Skew
  • Describe How to Sort Data
  • List Hive Join Strategies
  • Describe the Purpose of HCatalog
  • Describe the HCatalog Ecosystem
  • Define a New Schema
  • Demonstrate the Use of HCatLoader and HCatStorer with Apache Pig
  • Perform a Multi-table/File Insert
  • Describe the Purpose of Views
  • Describe the Purpose of the OVER Clause
  • Describe the Purpose of Windows
  • List Hive Analytics Functions
  • List Hive File Formats
  • Describe the Purpose of Hive SerDe
  • Understanding Hive Tables (Lab)
  • Understanding Partition and Skew (Lab)
  • Analyzing Big Data with Apache Hive (Lab)
  • Demonstration: Computing NGrams (Lab)
  • Joining Datasets in Apache Hive (Lab)
  • Computing NGrams of Emails in Avro Format (Lab)
  • Using HCatalog with Apache Pig (Lab)
  • Describe the Purpose of HDFS Federation
  • Describe the Purpose of HDFS High Availability (HA)
  • Describe the Purpose of the Quorum Journal Manager
  • Demonstrate How to Configure Automatic Failover
  • Describe the Purpose of YARN
  • List the Components of YARN
  • Describe the Lifecycle of a YARN Application
  • Describe the Purpose of a Cluster View
  • Describe the Purpose of Apache Slider
  • Describe the Origin and Purpose of Apache Spark
  • List Common Spark Use Cases
  • Describe the Differences Between Apache Spark and MapReduce
  • Demonstrate the Use of the Spark Shell
  • Describe the Purpose of a Resilient Distributed Dataset (RDD)
  • Demonstrate How to Load Data and Perform a Word Count
  • Define Lazy Evaluation
  • Describe How to Load Multiple Types of Data
  • Demonstrate How to Perform SQL Queries
  • Demonstrate How to Perform DataFrame Operations
  • Describe the Purpose of the Optimization Engine
  • Describe the Purpose of Apache Oozie
  • Describe Apache Pig Actions
  • Describe Apache Hive Actions
  • Describe MapReduce Actions
  • Describe How to Submit an Apache Oozie Workflow
  • Define an Oozie Coordinator Job
  • Advanced Apache Hive Programming (Lab)
  • Running a YARN Application (Lab)
  • Getting Started with Apache Spark (Lab)
  • Exploring Apache Spark SQL (Lab)
  • Defining an Apache Oozie Workflow (Lab)

Certification

SpringPeople works with top industry experts to identify the leading certification bodies for different technologies, those that are well respected in the industry and globally accepted as clear evidence of a professional’s “proven” expertise in the technology. As such, these certifications are a high-value addition to a CV and can give a significant boost to a professional's career growth.

Our certification courses are fully aligned with these high-profile certification exams; at the end of the course, participants will have detailed knowledge, be eligible for the exams, and be fully ready to take and pass them with flying colours.

 

Resources

SpringPeople Corporate Learning Center

About the Instructor

Founded in 2009, SpringPeople is a premier global eLearning marketplace for live online, instructor-led classes in the region. It is a certified training delivery partner of leading technology creators, namely Pivotal, Elastic, Lightbend, EMC, VMware, MuleSoft, RSA, and...


Course Rating and Reviews

4.7 Average Rating

  • 5 Stars: 28
  • 4 Stars: 12
  • 3 Stars: 1
  • 2 Stars: 0
  • 1 Star: 0

SPRINGPEOPLE SpringPeople Trainer

Ashok Reddy

It would be good if we got real-time scenarios for automation.

SPRINGPEOPLE SpringPeople Trainer

Goutham

Maybe you could set Scala as a prerequisite for this course and discuss more technical details of Spark like what happens under the hood

SPRINGPEOPLE SpringPeople Trainer

Vamshi Suram

Software Engineer 2
Intuit
Content is good. Could have added additional links to good resources to proceed further.

This class is intended for participants without any previous knowledge of the technology and will cover fundamentals, building through to full hands-on expertise on the topic.

On successful completion of the course, participants will be eligible to sit for the related certification exam (see course overview). All participants receive a course completion certificate, demonstrating their expertise on the subject.

Total duration of the live online, instructor-led sessions. Sessions are typically delivered as short lectures (2 hours on weekdays, 3 hours on weekends) with detailed hands-on guidance.

Expected offline lab work hours that participants will need to complete and submit to the trainer, during and after the instructor-led online sessions.

  1. We are happy to refund the full fee paid, no questions asked, should you feel that the training is not up to your expectations.
  2. Our dedicated team of expert training enablement advisors is available by email, phone and chat to assist you with your queries.
  3. All courseware, including session recordings, will always be available for you to access for future reference and rework.

Contact Us

+91-80-6567-9700 (BLR)

training@springpeople.com
