Apache Pig & Hive Training

Live Online & Classroom Enterprise Training

Learn to build applications with Apache Pig and Hive and to store big data in Hive. Build expertise in Hadoop, YARN, HDFS, MapReduce, data ingestion, and workflow definition, and use Pig and Hive to perform analytics on big data.

Enterprise Reporting
Lifetime Access
CloudLabs
24x7 Support
Real-time code analysis and feedback

What is Apache Pig & Hive Training about?

Master the core concepts of the Hadoop Distributed File System (HDFS), and understand Apache Pig and advanced Apache Hive programming concepts as you learn with our certified experts. Learn how to use HCatalog, join datasets in Apache Hive, and work with HDFS commands.

Gain practical experience importing RDBMS data into HDFS and exporting it back, analyzing clickstream data, and analyzing stock market data using quantiles. With our CloudLabs, get hands-on experience starting an HDP cluster, running a YARN application, programming in Apache Hive, analyzing big data with Apache Hive, and joining datasets with Apache Pig.

What are the objectives of Apache Pig & Hive Training?

At the end of the Apache Pig and Hive training, you will be able to:

Explain Hadoop and the Hadoop Distributed File System (HDFS)
Interpret Common HDFS Command Types
Export to a Table
Distinguish between Relational Databases and Hadoop
Explain the Purpose of NameNodes, DataNodes, MapReduce, and the Map and Reduce Phases
Differentiate Pig Latin Relation Names and Field Names
Explain Programming Concepts Using Pig and Hive
Perform Inner, Outer and Replicated Joins
Demonstrate the Use of HCatLoader and HCatStorer with Apache Pig
Explain the Lifecycle of YARN Applications
List Common Use Cases of Spark
Load Data and Perform a Word Count
Perform SQL Queries
Perform DataFrame Operations
Submit an Apache Oozie Workflow

Who is Apache Pig & Hive Training for?

Developers working with large data sets
Data Analytics Professionals
Managers working on Big Data Projects

What are the prerequisites for Apache Pig & Hive Training?

Participants should be familiar with programming principles and have experience in software development.
SQL knowledge is also helpful.
No prior Hadoop knowledge is required.

Available Training Modes

Live Online Training

12 Hours

Classroom Training

2 Days

Course Outline

Understanding Hadoop and HDFS

List the Three "V"s of Big Data

List the Six Key Hadoop Data Types

Describe Hadoop, YARN and Use Cases for Hadoop

Describe Hadoop Ecosystem Tools and Frameworks

Describe the Differences Between Relational Databases and Hadoop

Describe What is New in Hadoop 2.x

Describe the Hadoop Distributed File System (HDFS)

Describe the Differences Between HDFS and an RDBMS

Describe the Purpose of NameNodes and DataNodes

List Common HDFS Commands

Describe HDFS File Permissions

List Options for Data Input

Describe WebHDFS

Describe the Purpose of Sqoop and Flume

Describe How to Export to a Table

Describe the Purpose of MapReduce

Define Key/Value Pairs in MapReduce

Describe the Map and Reduce Phases

Describe Hadoop Streaming

Starting an HDP Cluster

Demonstration: Understanding Block Storage (Lab)

Using HDFS Commands (Lab)

Importing RDBMS Data into HDFS (Lab)

Exporting HDFS Data to an RDBMS (Lab)

Importing Log Data into HDFS Using Flume (Lab)

Demonstration: Understanding MapReduce (Lab)

Running a MapReduce Job (Lab)

Pig Programming

Describe the Purpose of Apache Pig

Describe the Purpose of Pig Latin

Demonstrate the Use of the Grunt Shell

List Pig Latin Relation Names and Field Names

List Pig Data Types

Define a Schema

Describe the Purpose of the GROUP Operator

Describe Common Pig Operators (ORDER BY, CASE, DISTINCT, PARALLEL, FLATTEN, FOREACH)

Perform an Inner, Outer and Replicated Join

Describe the Purpose of the DataFu Library

Demonstration: Understanding Apache Pig (Lab)

Getting Started with Apache Pig (Lab)

Exploring Data with Apache Pig (Lab)

Splitting a Dataset (Lab)

Joining Datasets with Apache Pig (Lab)

Preparing Data for Apache Hive (Lab)

Demonstration: Computing Page Rank (Lab)

Analyzing Clickstream Data (Lab)

Analyzing Stock Market Data Using Quantiles (Lab)

Hive Programming

Describe the Purpose of Apache Hive

Describe the Differences Between Apache Hive and SQL

Describe the Apache Hive Architecture

Demonstrate How to Submit Hive Queries

Describe How to Define Tables

Describe How to Load Data into Hive

Define Hive Partitions, Buckets and Skew

Describe How to Sort Data

List Hive Join Strategies

Describe the Purpose of HCatalog

Describe the HCatalog Ecosystem

Define a New Schema

Demonstrate the Use of HCatLoader and HCatStorer with Apache Pig

Perform a Multi-table/File Insert

Describe the Purpose of Views

Describe the Purpose of the OVER Clause

Describe the Purpose of Windows

List Hive Analytics Functions

List Hive File Formats

Describe the Purpose of Hive SerDe

Understanding Hive Tables (Lab)

Understanding Partition and Skew (Lab)

Analyzing Big Data with Apache Hive (Lab)

Demonstration: Computing NGrams (Lab)

Joining Datasets in Apache Hive (Lab)

Computing NGrams of Emails in Avro Format (Lab)

Using HCatalog with Apache Pig (Lab)

Advanced Hive Programming, Hadoop 2 and YARN

Describe the Purpose of HDFS Federation

Describe the Purpose of HDFS High Availability (HA)

Describe the Purpose of the Quorum Journal Manager

Demonstrate How to Configure Automatic Failover

Describe the Purpose of YARN

List the Components of YARN

Describe the Lifecycle of a YARN Application

Describe the Purpose of a Cluster View

Describe the Purpose of Apache Slider

Describe the Origin and Purpose of Apache Spark

List Common Spark Use Cases

Describe the Differences Between Apache Spark and MapReduce

Demonstrate the Use of the Spark Shell

Describe the Purpose of a Resilient Distributed Dataset (RDD)

Demonstrate How to Load Data and Perform a Word Count

Define Lazy Evaluation

Describe How to Load Multiple Types of Data

Demonstrate How to Perform SQL Queries

Demonstrate How to Perform DataFrame Operations

Describe the Purpose of the Optimization Engine

Describe the Purpose of Apache Oozie

Describe Apache Pig Actions

Describe Apache Hive Actions

Describe MapReduce Actions

Describe How to Submit an Apache Oozie Workflow

Define an Oozie Coordinator Job

Advanced Apache Hive Programming (Lab)

Running a YARN Application (Lab)

Getting Started with Apache Spark (Lab)

Exploring Apache Spark SQL (Lab)

Defining an Apache Oozie Workflow (Lab)

Who is the instructor for this training?

The trainer for this Apache Pig & Hive Training has extensive experience in this domain, including years of experience training and mentoring professionals.

Reviews

My outlook on training changed completely after attending SpringPeople BPC training. The content, the trainer and infrastructure at SpringPeople were top notch and perfectly in tune with the industry requirements. Needless to say, training is now something that I look forward to. Kudos to everyone at SpringPeople!

Shweta Priya

Sony

I attended the 3-day AngularJS training at SpringPeople. The trainer was an industry veteran with vast experience in the subject. Notably, the hands-on training and the Q&A session stood out. Overall, I found SpringPeople a great place to learn with excellent facilities and great trainers. Would recommend SpringPeople to my colleagues and friends.

Swati Singh

I attended the training on API Design for MuleSoft. The sessions were well planned and value-laden. I benefited immensely from the hands-on experience enabled through virtual labs. I would like to specifically commend the efficiency of the support team, who were always available to resolve my concerns.

Nikhil Kohli

Stryker

I attended the jQuery training batch, conducted by Mr. Vijay, an SME who covered all the essentials thoroughly. He took us through concepts such as jQuery animations, event handlers, plugins, and jQuery UI using small programs that made them easy to follow. The sessions were useful and well structured. By the end of the training, I was well equipped to develop an SPA for a Product Management System. Overall, the learning experience at SpringPeople was great!

Heena Rajan

Mindtree