Master the processes and practice of data science, including machine learning and natural language processing, on the Hortonworks platform.

What is HDP Analyst: Data Science training about?

Master the processes and practice of data science. Learn how to use Apache Mahout for machine learning with guidance from our certified experts. Gain practical knowledge of how to use the IPython Notebook, perform data analysis with Python, and explore data with Apache Pig.

With our cloudlabs practice, gain end-to-end experience using Hadoop for data science and machine learning. Learn to use HDFS commands and Apache Mahout for machine learning.

What are the objectives of HDP Analyst: Data Science training?

At the end of HDP Analyst: Data Science training, you will be able to:

  • Recognize use cases for data science
  • Describe the architecture of Hadoop and YARN
  • Describe the differences between supervised and unsupervised learning
  • List the six machine learning tasks
  • Use Mahout to run a machine learning algorithm on Hadoop
  • Describe the data science life cycle
  • Use Pig to transform and prepare data on Hadoop
  • Write a Python script
  • Use NumPy to analyze big data
  • Use the data structure classes in the Pandas library
  • Write a Python script that invokes SciPy machine learning
  • Describe options for running Python code on a Hadoop cluster
  • Write a Pig User-Defined Function in Python
  • Use Pig streaming on Hadoop with a Python script
  • Write a Python script that invokes scikit-learn
  • Use the k-nearest neighbor algorithm to predict values
  • Run a machine learning algorithm on a distributed data set
  • Describe use cases for Natural Language Processing (NLP)
  • Perform sentence segmentation on a large body of text
  • Perform part-of-speech tagging
  • Use the Natural Language Toolkit (NLTK)
  • Describe the components of a Spark application
  • Write a Spark application in Python
  • Run machine learning algorithms using Spark MLlib
  • Take data science into production

Who is HDP Analyst: Data Science training for?

  • Anyone who wants to add HDP Analyst: Data Science skills to their profile
  • Teams getting started on HDP Analyst: Data Science projects

What are the prerequisites for HDP Analyst: Data Science training?

  • Must have experience with at least one programming or scripting language
  • Knowledge of statistics and/or mathematics
  • Basic understanding of big data and Hadoop principles

Course Outline

  • Labs
    • Setting Up a Development Environment
    • Using HDFS Commands
    • Using Mahout for Machine Learning
    • Getting Started with Pig
    • Exploring Data with Pig
    • Using the IPython Notebook
    • Data Analysis with Python
    • Interpolating Data Points
    • Defining a Pig UDF in Python
    • Streaming Python with Pig
    • K-Nearest Neighbor and K-Means Clustering
    • Using NLTK for Natural Language Processing
    • Classifying Text using Naive Bayes
    • Spark Programming and Spark MLlib

Who is the instructor for this training?

The trainer for this HDP Analyst: Data Science course has extensive experience in this domain, including years of experience training and mentoring professionals.