Hadoop Course:
Hardware Requirements: Systems must have at least 2 GB of RAM.
Software Requirements: I will provide all software, including the operating system.
Contents:
VirtualBox/VMware
- Basics & Installations
Linux
- Basics
Hadoop
- What is Hadoop?
- Why Hadoop, and the flow of Hadoop
- Scaling
- Distributed Framework
- Hadoop vs. RDBMS
- Brief history of Hadoop
Hadoop installation in pseudo-distributed mode
Hadoop installation in cluster mode
- Adding and removing nodes (without downtime)
- Decommissioning nodes
- Block size
- Hadoop Processes (NN, SNN, JT, DN, TT)
Common errors when running a Hadoop cluster, and their solutions
HDFS - Hadoop Distributed File System
- HDFS Design and Architecture
- HDFS Concepts
- Interacting with HDFS using the command line
- Dataflow
- Introduction to Blocks
- Data Replication
- Admin Commands
- Hadoop archives
Hadoop Processes
- NameNode and its functionality
- Secondary NameNode and its functionality
- JobTracker and its functionality
- TaskTracker and its functionality
- DataNode and its functionality
- ResourceManager and its functionality
- NodeManager and its functionality
MapReduce
- Developing a MapReduce Application
- Phases in the MapReduce Framework
- MapReduce Input and Output Formats
- Advanced Concepts
- Combiner
- HAR
- Partitioner, sorting, shuffling
- Different phases of MapReduce programs
- Data localization
- Examples of processing unstructured data
- Image processing using MapReduce
Joining datasets in MapReduce jobs
- Map-side join
- Reduce-side join
Hadoop Programming Languages:
Pig
- Introduction (Basics)
- Installation and Configuration
- Data types in Pig
- Interacting with HDFS using Pig
- MapReduce programs through Pig
- Pig commands
- Execution mechanisms (Grunt shell, scripts, ...)
- Loading, filtering, grouping, joins, ...
- Sample programs in Pig with real-world examples
Hive
- Basics (Introduction)
- Installation and Configurations
- Data types and operators
- HiveQL (HQL) commands
- Interacting with HDFS using Hive
- MapReduce programs through Hive
- Joins, grouping, filtering, ...
- Sample programs in Hive with real-world examples
- Join vs Map Join
Impala
- Basics
- Commands
Sqoop
- Introduction to Sqoop
- Installations & Configurations
- Sqoop commands
- Connecting to a relational database using Sqoop and importing lakhs of records into
Hadoop (in a single minute)
Flume
- Basics (Introduction)
- Installation and Configurations
NoSQL Database Concepts
- HBase
- Basics & Installations
- Commands
- Interacting HBase with HDFS
- MongoDB
- Basics & Installations
- Queries for processing data
Oozie Introduction
ZooKeeper Introduction
Apache Spark
- Introduction
- Installations and configurations
- RDD, SparkContext (sc), ...
- Scala Introduction
- Interacting Spark with HDFS
- Programs in Spark through Scala
Specialties:
ETL Tool (Data Warehousing/BI Tools):
PDI (Pentaho Data Integration)
- Introduction
- Creating RDBMS database
- Establishing a connection between PDI and an RDBMS database
- Creating data in Hadoop
- Establishing a connection between PDI and Hadoop data
- Moving data from Hadoop to RDBMS and vice versa
- Summarization
Highlights:
- Working with Apache & Cloudera Hadoop
- Hands-on practicals on a Hadoop cluster
- Real life use cases
- Covers both the older and the latest versions of Hadoop