Xebia offers Cloudera Developer for Spark & Hadoop Certification Training in Gurgaon. This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark, Hive, Flume, Sqoop, and Impala, the course is the best preparation for the real-world challenges faced by Hadoop developers. Participants learn to identify which tool is the right one for a given situation and gain hands-on experience developing with those tools.
Learn how to import data into your Apache Hadoop cluster and process it with Spark, Hive, Flume, Sqoop, Impala, and other Hadoop ecosystem tools.
Training Date: 23rd Aug to 26th Aug, 2018
Course Fee: Rs 69,999.00
Course Curriculum:
Introduction to Apache Hadoop and the Hadoop Ecosystem
- Apache Hadoop Overview
- Data Ingestion and Storage
- Data Processing
- Data Analysis and Exploration
- Other Ecosystem Tools
- Introduction to the Hands-On Exercises
Apache Hadoop File Storage
- Apache Hadoop Cluster Components
- HDFS Architecture
- Using HDFS
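
To give a feel for the hands-on work in this module, here is a minimal sketch of the HDFS client API in Scala; the directory and file paths are placeholders, not course data, and the configuration is assumed to come from a core-site.xml on the classpath:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsExample extends App {
  // Picks up the cluster's core-site.xml if it is on the classpath
  val fs = FileSystem.get(new Configuration())

  // List a directory, analogous to `hdfs dfs -ls /user/training`
  fs.listStatus(new Path("/user/training")).foreach(s => println(s.getPath))

  // Check whether a file exists, analogous to `hdfs dfs -test -e`
  println(fs.exists(new Path("/user/training/shakespeare.txt")))
}
```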
Distributed Processing on an Apache Hadoop Cluster
- YARN Architecture
- Working With YARN
Apache Spark Basics
- What is Apache Spark?
- Starting the Spark Shell
- Using the Spark Shell
- Getting Started with Datasets and DataFrames
- DataFrame Operations
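
A taste of this module, as it might look inside spark-shell, where a SparkSession named spark is already available; the file people.json is an illustrative dataset, not course material:

```scala
// Read a JSON file into a DataFrame (schema is inferred)
val peopleDF = spark.read.json("/user/training/people.json")

peopleDF.printSchema()                  // inspect the inferred schema
peopleDF.select("name", "age").show(5)  // basic DataFrame operations
peopleDF.filter($"age" > 21).count()    // transformations end in an action
```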
Working with DataFrames and Schemas
- Creating DataFrames from Data Sources
- Saving DataFrames to Data Sources
- DataFrame Schemas
- Eager and Lazy Execution
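
A short sketch of the ideas in this module; people.csv and its columns are hypothetical, and the point is that reads are lazy until an action runs:

```scala
import org.apache.spark.sql.types._

// Defining a schema explicitly avoids the eager file scan needed to infer one
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age",  IntegerType)
))

// Creating a DataFrame from a data source (lazy until an action executes)
val df = spark.read.schema(schema).option("header", "true")
              .csv("/user/training/people.csv")

// Saving the DataFrame to a different data source
df.write.mode("overwrite").parquet("/user/training/people_parquet")
```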
Analyzing Data with DataFrame Queries
- Querying DataFrames Using Column Expressions
- Grouping and Aggregation Queries
- Joining DataFrames
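
For illustration, a self-contained spark-shell sketch combining all three topics; the table names and columns are invented for the example:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Small inline DataFrames standing in for real tables
val users  = Seq((1, "alice"), (2, "bob")).toDF("user_id", "name")
val orders = Seq((1, 30.0), (1, 12.5), (2, 8.0)).toDF("user_id", "total")

// A grouping/aggregation query, then a join and column expressions
val spend = orders.groupBy($"user_id").agg(sum($"total").alias("spend"))
users.join(spend, "user_id")
     .select($"name", $"spend", ($"spend" * 0.1).alias("points"))
     .show()
```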
RDD Overview
- RDD Data Sources
- Creating and Saving RDDs
- RDD Operations
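
A minimal RDD sketch in the spirit of this module; sc is the SparkContext that spark-shell provides, and the weblogs path is a placeholder:

```scala
// Create an RDD from a text data source
val lines = sc.textFile("/user/training/weblogs")

val errors = lines.filter(_.contains("ERROR"))  // transformation (lazy)
println(errors.count())                         // action triggers execution

// Save the resulting RDD back to HDFS
errors.saveAsTextFile("/user/training/weblog_errors")
```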
Transforming Data with RDDs
- Writing and Passing Transformation Functions
- Transformation Execution
- Converting Between RDDs and DataFrames
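
A sketch of passing a named transformation function and converting between the two APIs; scores.csv and its two columns are assumptions for the example:

```scala
// A named function passed to a transformation
def parseLine(line: String): (String, Int) = {
  val fields = line.split(",")
  (fields(0), fields(1).toInt)
}
val parsed = sc.textFile("/user/training/scores.csv").map(parseLine)

// Converting between RDDs and DataFrames
import spark.implicits._
val df  = parsed.toDF("name", "score")  // RDD of tuples -> DataFrame
val rdd = df.rdd                        // DataFrame -> RDD of Rows
```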
Aggregating Data with Pair RDDs
- Key-Value Pair RDDs
- Map-Reduce
- Other Pair RDD Operations
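
The classic map-reduce word count, expressed with pair RDD operations as this module does; the input path is illustrative:

```scala
val counts = sc.textFile("/user/training/shakespeare.txt")
  .flatMap(_.split("\\s+"))     // map phase: split lines into words
  .map(word => (word, 1))       // build key-value pairs
  .reduceByKey(_ + _)           // reduce phase: sum counts per key

// Another pair RDD operation: sort by count and take the top ten
counts.sortBy(_._2, ascending = false).take(10).foreach(println)
```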
Querying Tables and Views with Apache Spark SQL
- Querying Tables in Spark Using SQL
- Querying Files and Views
- The Catalog API
- Comparing Spark SQL, Apache Impala, and Apache Hive-on-Spark
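
A sketch of the Spark SQL surface covered here; the accounts table and the Parquet path are placeholders for data visible to the metastore:

```scala
// Querying a metastore table by name
spark.sql("SELECT device, COUNT(*) AS n FROM accounts GROUP BY device").show()

// Querying a file via a temporary view
val logs = spark.read.parquet("/user/training/weblogs_parquet")
logs.createOrReplaceTempView("weblogs")
spark.sql("SELECT COUNT(*) FROM weblogs WHERE status = 404").show()

// The Catalog API: inspect what Spark can see
spark.catalog.listTables().show()
```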
Working with Datasets in Scala
- Datasets and DataFrames
- Creating Datasets
- Loading and Saving Datasets
- Dataset Operations
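
A minimal typed Dataset sketch; the people.json file is hypothetical, and age is declared Long to match the type JSON inference produces:

```scala
import org.apache.spark.sql.Dataset
import spark.implicits._

case class Person(name: String, age: Long)

// A Dataset is typed: operations are checked against Person at compile time
val people: Dataset[Person] =
  spark.read.json("/user/training/people.json").as[Person]

val adults = people.filter(_.age >= 18)  // a plain Scala lambda, not a Column
adults.map(_.name).show()
```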
Writing, Configuring, and Running Apache Spark Applications
- Writing a Spark Application
- Building and Running an Application
- Application Deployment Mode
- The Spark Application Web UI
- Configuring Application Properties
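
The skeleton of a standalone application as this module builds one; outside the shell the session must be created explicitly, and the names here are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object CountJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("CountJob")          // shown on the application web UI
      .getOrCreate()                // properties can also come from spark-submit

    val n = spark.read.textFile(args(0)).count()
    println(s"Lines: $n")
    spark.stop()
  }
}
```

Packaged as a JAR (with sbt or Maven), the application would be launched with spark-submit, whose --deploy-mode flag selects between client and cluster deployment modes.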
Distributed Processing
- Review: Apache Spark on a Cluster
- RDD Partitions
- Example: Partitioning in Queries
- Stages and Tasks
- Job Execution Planning
- Example: Catalyst Execution Plan
- Example: RDD Execution Plan
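
Two small probes in the spirit of this module, one for Catalyst plans and one for RDD partitioning; the input paths are placeholders:

```scala
// Ask Catalyst how it plans a query: stages, shuffles, and scans
val df = spark.read.parquet("/user/training/weblogs_parquet")
df.groupBy($"status").count().explain()   // prints the physical plan

// Influence RDD partitioning at load time, then inspect it
val rdd = sc.textFile("/user/training/weblogs", minPartitions = 8)
println(rdd.getNumPartitions)
```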
Distributed Data Persistence
- DataFrame and Dataset Persistence
- Persistence Storage Levels
- Viewing Persisted RDDs
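
A sketch of explicit persistence with a chosen storage level; the dataset path is illustrative:

```scala
import org.apache.spark.storage.StorageLevel

val logs = spark.read.parquet("/user/training/weblogs_parquet")

logs.persist(StorageLevel.MEMORY_AND_DISK)  // pick a storage level explicitly
logs.count()                                // first action materialises the cache
logs.filter($"status" === 404).count()      // subsequent queries reuse it

// Persisted data appears on the Storage tab of the application web UI
logs.unpersist()
```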
Common Patterns in Apache Spark Data Processing
- Common Apache Spark Use Cases
- Iterative Algorithms in Apache Spark
- Machine Learning
- Example: k-means
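
In the spirit of the k-means example, a small MLlib sketch; points.csv with numeric columns x and y is a hypothetical dataset:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler

val raw = spark.read.option("header", "true")
               .option("inferSchema", "true")
               .csv("/user/training/points.csv")

// Assemble the numeric columns into the feature vector MLlib expects
val features = new VectorAssembler()
  .setInputCols(Array("x", "y"))
  .setOutputCol("features")
  .transform(raw)

// Iteratively refine k cluster centres over the distributed data
val model = new KMeans().setK(3).setMaxIter(20).fit(features)
model.clusterCenters.foreach(println)
```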
Apache Spark Streaming: Introduction to DStreams
- Apache Spark Streaming Overview
- Example: Streaming Request Count
- DStreams
- Developing Streaming Applications
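
A minimal DStream application echoing the streaming request-count example; the socket source on localhost:9999 is a stand-in for a real feed:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// A streaming context with a two-second batch interval
val ssc   = new StreamingContext(sc, Seconds(2))
val lines = ssc.socketTextStream("localhost", 9999)

// Count the requests arriving in each batch
lines.count().print()

ssc.start()             // nothing runs until the context is started
ssc.awaitTermination()
```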
Apache Spark Streaming: Processing Multiple Batches
- Multi-Batch Operations
- Time Slicing
- State Operations
- Sliding Window Operations
- Preview: Structured Streaming
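
A sliding-window sketch for this module; requests stands in for a DStream built as in the previous example, and the field layout of the log lines is assumed:

```scala
import org.apache.spark.streaming.Seconds

// Key each request by its first field (e.g. the client IP)
val pairs = requests.map(line => (line.split(" ")(0), 1))

// Counts per key over a 30-second window, recomputed every 10 seconds;
// both durations must be multiples of the batch interval
val windowed = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b,  // aggregate values within the window
  Seconds(30),                // window length
  Seconds(10)                 // slide interval
)
windowed.print()
```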
Apache Spark Streaming: Data Sources
- Streaming Data Source Overview
- Apache Flume and Apache Kafka Data Sources
- Example: Using a Kafka Direct Data Source
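
And a sketch of a Kafka direct data source using the spark-streaming-kafka-0-10 integration; the broker address, topic, and group id are placeholders, and ssc is a StreamingContext as in the earlier modules:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker01:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "weblog-consumers"
)

// The direct approach: executors read from Kafka partitions themselves,
// with no receiver in between
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams)
)
stream.map(_.value).count().print()
```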