Big Data Hadoop

No Reviews Yet

Course type: Online Instructor led Course

Platform: Skype, Teamviewer, gotomeeting.com

Course ID: 45292

Students Interested 0 (Seats Left 0)

About the Course

1. Introduction to Apache Hadoop and the Hadoop Ecosystem



  • Apache Hadoop Overview

  • Data Ingestion and Storage

  • Data Processing

  • Data Analysis and Exploration

  • Other Ecosystem Tools

  • Introduction to the Hands-On Exercises


2. Apache Hadoop File Storage



  • Apache Hadoop Cluster Components

  • HDFS Architecture

  • Using HDFS


3. Distributed Processing on an Apache Hadoop Cluster



  • YARN Architecture

  • Working With YARN


4. Apache Spark Basics



  • What is Apache Spark?

  • Starting the Spark Shell

  • Using the Spark Shell

  • Getting Started with Datasets and DataFrames

  • DataFrame Operations


5. Working with DataFrames and Schemas



  • Creating DataFrames from Data Sources

  • Saving DataFrames to Data Sources

  • DataFrame Schemas

  • Eager and Lazy Execution


6. Analyzing Data with DataFrame Queries



  • Querying DataFrames Using Column Expressions

  • Grouping and Aggregation Queries

  • Joining DataFrames


7. RDD Overview



  • RDD Overview

  • RDD Data Sources

  • Creating and Saving RDDs

  • RDD Operations


8. Transforming Data with RDDs



  • Writing and Passing Transformation Functions

  • Transformation Execution

  • Converting Between RDDs and DataFrames


9. Aggregating Data with Pair RDDs



  • Key-Value Pair RDDs

  • Map-Reduce

  • Other Pair RDD Operations


10. Querying Tables and Views with Apache Spark SQL



  • Querying Tables in Spark Using SQL

  • Querying Files and Views

  • The Catalog API

  • Comparing Spark SQL, Apache Impala, and Apache Hive-on-Spark


11. Working with Datasets in Scala



  • Datasets and DataFrames

  • Creating Datasets

  • Loading and Saving Datasets

  • Dataset Operations


12. Writing, Configuring, and Running Apache Spark Applications



  • Writing a Spark Application

  • Building and Running an Application

  • Application Deployment Mode

  • The Spark Application Web UI

  • Configuring Application Properties


13. Distributed Processing



  • Review: Apache Spark on a Cluster

  • RDD Partitions

  • Example: Partitioning in Queries

  • Stages and Tasks

  • Job Execution Planning

  • Example: Catalyst Execution Plan

  • Example: RDD Execution Plan


14. Distributed Data Persistence



  • DataFrame and Dataset Persistence

  • Persistence Storage Levels

  • Viewing Persisted RDDs


15. Common Patterns in Apache Spark Data Processing



  • Common Apache Spark Use Cases

  • Iterative Algorithms in Apache Spark

  • Machine Learning

  • Example: k-means


16. Apache Spark Streaming: Introduction to DStreams



  • Apache Spark Streaming Overview

  • Example: Streaming Request Count

  • DStreams

  • Developing Streaming Applications


17. Apache Spark Streaming: Processing Multiple Batches



  • Multi-Batch Operations

  • Time Slicing

  • State Operations

  • Sliding Window Operations

  • Preview: Structured Streaming


18. Apache Spark Streaming: Data Sources



  • Streaming Data Source Overview

  • Apache Flume and Apache Kafka Data Sources

  • Example: Using a Kafka Direct Data Source

Date and Time

Not decided yet.

About the Trainer

4.75 Avg Rating

4 Reviews

5 Students

9 Courses

15+ years of experience in Big Data Hadoop.
Times Internet: Implemented Apache Spark with Cassandra on a Hadoop RAC server for collecting multiple log files.
Amar Ujala: Hadoop cluster planning and sizing, with data migration from SQL Server to Cassandra.
TCS: Three corporate batches on Hadoop administration, data warehousing, Cassandra, and MongoDB (Cloudera, Hortonworks).
HCL Infosystems: Hadoop cluster implementation and migration from DB2.
HCL Technologies: Hadoop, Spark-Scala, Flume, Cassandra, NoSQL.

Connect With Automation Training4U
