Understanding Big Data and Hadoop

Learning Objectives: In this module, you will understand what Big Data is, the limitations of traditional solutions to Big Data problems, and how Hadoop solves those problems. You will also learn about the Hadoop ecosystem, Hadoop architecture, HDFS, the anatomy of file reads and writes, and how MapReduce works.

Topics:
Introduction to Big Data & Big Data Challenges
Limitations & Solutions of Big Data Architecture
Hadoop & Its Features
Hadoop Ecosystem
Hadoop 2.x Core Components
Hadoop Storage: HDFS (Hadoop Distributed File System)
Hadoop Processing: MapReduce Framework
Different Hadoop Distributions

Hadoop Architecture and HDFS

Learning Objectives: In this module, you will learn the Hadoop cluster architecture, the important configuration files of a Hadoop cluster, data loading techniques using Sqoop & Flume, and how to set up single-node and multi-node Hadoop clusters.

Topics:
Hadoop 2.x Cluster Architecture
Federation and High Availability Architecture
Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Single-Node & Multi-Node Cluster Setup
Basic Hadoop Administration

Hadoop MapReduce Framework

Learning Objectives: In this module, you will gain a comprehensive understanding of the Hadoop MapReduce framework and how MapReduce processes data stored in HDFS. You will also learn advanced MapReduce concepts such as input splits, combiners, and partitioners.

Topics:
Traditional Way vs. MapReduce Way
Why MapReduce
YARN Components
YARN Architecture
YARN MapReduce Application Execution Flow
YARN Workflow
Anatomy of a MapReduce Program
Input Splits and the Relation Between Input Splits and HDFS Blocks
MapReduce: Combiner & Partitioner
Demo on Healthcare Dataset
Demo on Weather Dataset

Apache Hive

Learning Objectives: This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts, and Hive UDFs.
Topics:
Introduction to Apache Hive
Hive vs. Pig
Hive Architecture and Components
Hive Metastore
Limitations of Hive
Comparison with Traditional Databases
Hive Data Types and Data Models
Hive Partitioning
Hive Bucketing
Hive Tables (Managed Tables and External Tables)
Importing Data
Querying Data & Managing Outputs
Hive Scripts & Hive UDFs
Retail Use Case in Hive
Hive Demo on Healthcare Dataset

Processing Distributed Data with Apache Spark

Learning Objectives: In this module, you will learn what Apache Spark is, along with SparkContext and the Spark ecosystem. You will learn how to work with Resilient Distributed Datasets (RDDs) in Apache Spark. You will run an application on a Spark cluster and compare the performance of MapReduce and Spark.

Topics:
What Is Spark
Spark Ecosystem
Spark Components
What Is Scala
Why Scala
SparkContext
Spark RDD