Big Data Analysis using Hadoop Training

Live Online & Classroom Enterprise Training

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem.

Key Features
  • Lifetime Access

  • CloudLabs

  • 24x7 Support

  • Real-time code analysis and feedback

  • 100% Money Back Guarantee

What is Big Data Analysis using Hadoop about?

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem.

What are the objectives of Big Data Analysis using Hadoop?

By the end of the Big Data Analysis using Hadoop training course, participants will learn:

  • The features that Pig, Hive, Flume, Sqoop and Oozie offer for data acquisition, storage, and analysis
  • The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop
  • How to join diverse datasets to gain valuable business insight
  • How to perform real-time, complex queries on datasets
  • The basics of Apache Spark, Spark SQL and DataFrames for data analysis
Available Training Modes

Live Online Training

12 Hours

Classroom Training

2 Days

Who is Big Data Analysis using Hadoop for?

  • Anyone who wants to add Big Data analysis skills with Hadoop to their profile
  • Teams getting started on Big Data analysis projects using Hadoop

What are the prerequisites for Big Data Analysis using Hadoop?

None

Course Outline

    • Hadoop Fundamentals
      • What is Big Data, and what is the role of Hadoop in it?
      • The motivation for Hadoop and its use cases
      • Hadoop 2.0 Overview
      • Distributed Data Processing: YARN
      • Hadoop vs RDBMS
      • Hadoop Ecosystem
        • Data Processing and Analysis: Pig, Hive, Spark
        • Data Integration: Sqoop
        • Streaming Data Ingestion: Flume
        • Workflow: Oozie
    • Data Storage: HDFS
      • HDFS components (Blocks, Name Node, Data Node)
      • HDFS High Availability
      • Important HDFS commands
      • Anatomy of File Read and Write
      • Lab: HDFS commands
    • Ingesting Data into HDFS
      • Using the Hadoop client
      • WebHDFS
      • Using Sqoop (data transfer between an RDBMS and Hadoop)
      • Using Flume (ingesting streaming data)
      • Lab: using Sqoop to import and export data
      • Lab: using Flume to ingest streaming data
    • MapReduce: Analysing Data with Hadoop
      • Understanding Map and Reduce concepts
      • WordCount MapReduce Program
      • Lab: running a MapReduce program on YARN
    • Introduction to Apache Pig
      • What is Apache Pig? Its features and use cases
      • Interacting with Pig: Pig Latin and the Grunt shell
      • Running Pig: local mode and MapReduce mode
      • Lab: Invoking Pig
    • Basic Data Analysis with Pig
      • Pig Latin syntax and data types
      • Defining and viewing the schemas
      • Loading and storing data
      • Grouping, filtering and sorting data
      • Lab: basic data analysis using Pig
    • Advanced Data Analysis using Pig
      • Using operators and clauses: FOREACH, nested FOREACH, CASE, FLATTEN, PARALLEL
      • Frequently used built-in functions
      • Lab: splitting data sets
      • Joining data sets: inner joins, outer joins (left, right and full), replicated joins, COGROUP
      • Lab: joining data sets
      • Pig user-defined functions (UDFs): a UDF example and how to invoke a UDF
      • Pig scripts and parameter substitution
      • Lab: analysing unstructured data using Pig
      • Advanced data analysis using Pig
      • Tips for optimizing the performance of Pig jobs
    • Introduction to Hive
      • What is Hive? Hive Query Language (HiveQL) versus SQL
      • Hive architecture
      • HiveQL syntax and data types
      • Invoking Hive, the Hive shell, and submitting Hive queries
    • Hive Data Management
      • Creating Hive databases and tables
        • Managed tables and external tables
      • Different ways of loading data into a Hive table
      • Simplifying queries using views
      • Storing query results to a file
      • Lab: creating, loading and querying Hive tables
    • Hive Data Storage
      • Hive partitions, buckets and skewed tables
      • Hive file formats and SerDes: ORC, SequenceFile
      • Sorting data: ORDER BY and SORT BY
      • Lab: analyse big data using Hive
    • Hive Data Analysis
      • Hive joins: inner joins and outer joins
      • Commonly used Hive built-in functions
      • Hive user-defined functions (UDFs)
      • Aggregation and windowing with the PARTITION BY clause
      • Analytic functions: RANK, DENSE_RANK
      • Lab: advanced Hive programming
    • Hive Performance Optimization
      • Hive cost-based optimizer (CBO); computing column and table statistics
      • Tips for Hive Performance Optimization
    • Hive metadata integration with Pig
      • About HCatalog
      • HCatalog in the Hadoop ecosystem
      • Using HCatLoader to load data into a Pig relation from a Hive table
      • Using HCatStorer to store data from Pig into a Hive table
      • Lab: using HCatalog with Pig
    • Oozie for Scheduling Jobs
      • Oozie components: actions, fork and join nodes, workflows, coordinators
      • Submitting an Oozie workflow
      • Lab: creating an Oozie workflow to run Pig and Hive jobs
    • Introduction to Apache Spark
      • What is Apache Spark? Spark's origins
      • Spark ecosystem
      • Spark use cases
      • Spark versus MapReduce
      • What is the SparkContext?
      • Understanding RDDs
      • Creating an RDD
      • Spark operations: transformations and actions
      • Examples of transformations and actions
      • Spark WordCount program using Python
      • Lab: getting started with Spark
    • Spark SQL and DataFrames
      • Spark SQL overview
      • DataFrames overview
      • SQLContext and HiveContext
      • Performing Spark SQL queries
      • Performing DataFrame operations
      • Lab: data analysis with Spark SQL and DataFrames
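
To give a flavour of the Map and Reduce concepts covered in the MapReduce module, here is a minimal sketch of WordCount in plain Python. It is an illustration only and not taken from the course material: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions made for this example, and everything runs in a single process rather than on a Hadoop cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (a framework does this between phases)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file stored in HDFS.
    sample = ["big data on Hadoop", "Hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce calls are distributed across nodes and the framework performs the shuffle itself; the per-word logic stays the same.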

Who is the instructor for this training?

The trainer for this Big Data Analysis using Hadoop course has extensive experience in this domain, including years of training and mentoring professionals.
