Cloudera - Spark and Hadoop Developer Training

Live Online & Classroom Enterprise Certification Training

Become proficient at importing data into your Apache Hadoop cluster and processing it with Spark, Hive, Flume, Sqoop, Impala, and other Hadoop ecosystem tools with this Spark and Hadoop training and certification course.

Looking for a private batch?

REQUEST A CALLBACK
Key Features
  • Lifetime Access

  • CloudLabs

  • 24x7 Support

  • Real-time code analysis and feedback

  • 100% Money Back Guarantee

What is Cloudera - Spark and Hadoop Developer training about?

This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark, Hive, Flume, Sqoop, and Impala, this training course prepares you for the real-world challenges faced by Hadoop developers. Participants will also learn to identify which tool is the right one to use in a given situation, and will gain hands-on experience developing with those tools.

Audience

This course is designed for developers and engineers who have programming experience.

What are the objectives of Cloudera - Spark and Hadoop Developer training?

After the completion of this course, you will be able to:

  • Understand Hadoop, HDFS, and the Hadoop architecture
  • Learn Hive, Sqoop, Impala
  • Learn Spark
  • Explore RDDs, Spark SQL, and more
Available Training Modes

Live Online Training

Classroom Training


Who is Cloudera - Spark and Hadoop Developer training for?

  • Anyone who wants to add Cloudera - Spark and Hadoop Developer skills to their profile
  • Teams getting started on Cloudera - Spark and Hadoop Developer projects

What are the prerequisites for Cloudera - Spark and Hadoop Developer training?

This course requires prior knowledge of Java and Unix.

Course Outline

    • Introduction to Hadoop and the Hadoop Ecosystem
      • Challenges with Traditional Large-Scale Systems
      • Hadoop!
      • Data Storage and Ingest
      • Data Processing
      • Data Analysis and Exploration
      • Other Ecosystem Tools
      • Introduction to the Hands-On Exercises
    • Hadoop Architecture and HDFS
      • Distributed Processing on a Cluster
      • Storage: HDFS Architecture
      • Storage: Using HDFS
      • Resource Management: YARN Architecture
      • Resource Management: Working with YARN
    • Importing Relational Data with Apache Sqoop
      • Sqoop Overview
      • Basic Imports and Exports
      • Limiting Results
      • Improving Sqoop's Performance
      • Sqoop 2
    • Introduction to Impala and Hive
      • Introduction to Impala and Hive
      • Why Use Impala and Hive?
      • Querying Data With Impala and Hive
      • Comparing Hive and Impala to Traditional Databases
    • Modeling and Managing Data with Impala and Hive
      • Data Storage Overview
      • Creating Databases and Tables
      • Loading Data into Tables
      • HCatalog
      • Impala Metadata Caching
    • Data Formats
      • Selecting a File Format
      • Hadoop Tool Support for File Formats
      • Avro Schemas
      • Using Avro with Impala, Hive, and Sqoop
      • Avro Schema Evolution
      • Compression
    • Data File Partitioning
      • Partitioning Overview
      • Partitioning in Impala and Hive
    • Capturing Data with Apache Flume
      • What is Apache Flume?
      • Basic Flume Architecture
      • Flume Sources
      • Flume Sinks
      • Flume Channels
      • Flume Configuration
    • Spark Basics (see the word-count sketch after this outline)
      • What is Apache Spark?
      • Using the Spark Shell
      • RDDs (Resilient Distributed Datasets)
      • Functional Programming in Spark
    • Working with RDDs in Spark
      • Creating RDDs
      • Other General RDD Operations
    • Writing and Deploying Spark Applications (see the application sketch after this outline)
      • Spark Applications vs. Spark Shell
      • Creating the SparkContext
      • Building a Spark Application (Scala and Java)
      • Running a Spark Application
      • The Spark Application Web UI
      • Configuring Spark Properties
      • Logging
    • Parallel Processing in Spark
      • Review: Spark on a Cluster
      • RDD Partitions
      • Partitioning of File-Based RDDs
      • HDFS and Data Locality
      • Executing Parallel Operations
      • Stages and Tasks
    • Spark RDD Persistence
      • RDD Lineage
      • RDD Persistence Overview
      • Distributed Persistence
    • Common Patterns in Spark Data Processing
      • Common Spark Use Cases
      • Iterative Algorithms in Spark
      • Graph Processing and Analysis
      • Machine Learning
      • Example: k-means
    • DataFrames and Spark SQL (see the DataFrame sketch after this outline)
      • Spark SQL and the SQL Context
      • Creating DataFrames
      • Transforming and Querying DataFrames
      • Saving DataFrames
      • DataFrames and RDDs
      • Comparing Spark SQL, Impala, and Hive-on-Spark
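
To give a flavour of the Spark Basics and Working with RDDs in Spark modules above, here is a minimal word-count sketch as it might be typed into the Scala spark-shell. The HDFS path is a hypothetical placeholder, and the sketch assumes the pre-created SparkContext `sc` that the shell provides; it is not taken from the course's own exercises.

    // Entered line by line in the Scala spark-shell, where the SparkContext `sc`
    // is already created. The HDFS path below is a hypothetical placeholder.
    val lines  = sc.textFile("hdfs:///user/training/weblogs/sample.log")

    // Transformations are lazy: nothing is read until an action is called.
    val words  = lines.flatMap(line => line.split("\\s+"))
    val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

    // take() is an action; it triggers evaluation of the lineage built above.
    counts.take(10).foreach(println)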
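
The Writing and Deploying Spark Applications module contrasts the shell with standalone applications that create their own SparkContext. Below is a minimal sketch of such an application in Scala; the object name, argument handling, and paths are illustrative assumptions rather than course material.

    import org.apache.spark.{SparkConf, SparkContext}

    // A standalone application must create its own SparkContext;
    // in the spark-shell one is provided automatically.
    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WordCount")
        val sc   = new SparkContext(conf)

        // args(0): input path, args(1): output directory (illustrative)
        val counts = sc.textFile(args(0))
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.saveAsTextFile(args(1))
        sc.stop()
      }
    }

An application like this would typically be built with sbt or Maven and launched with spark-submit, which is also where Spark properties such as the master URL and executor memory can be configured.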
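
The DataFrames and Spark SQL module works with the SQL context listed in the outline. The sketch below assumes a Spark 1.6-style SQLContext created from an existing SparkContext `sc`; the JSON path, table name, and column names are hypothetical.

    import org.apache.spark.sql.SQLContext

    // In the spark-shell a SQLContext is normally available as `sqlContext`;
    // a standalone application can construct one from the SparkContext.
    val sqlContext = new SQLContext(sc)

    // Load a DataFrame from a (hypothetical) JSON file on HDFS.
    val people = sqlContext.read.json("hdfs:///user/training/people.json")

    // Query with the DataFrame API...
    people.where(people("age") > 21).select("name", "age").show()

    // ...or register a temporary table and query it with SQL.
    people.registerTempTable("people")
    sqlContext.sql("SELECT name, age FROM people WHERE age > 21").show()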

Who is the instructor for this training?

The trainer for this Cloudera - Spark and Hadoop Developer training has extensive experience in this domain, including years of experience training & mentoring professionals.

Cloudera - Spark and Hadoop Developer - Certification & Exam

Reviews