"BigData Hadoop Essentials" is no longer available

No Reviews Yet

Koramangala 1st Block, Bangalore

Course ID: 15285

About the Course

Big Data means a data explosion. To handle it, the industry today needs professionals with a mixed profile: they must know how to set up the environment as well as how to work with it and troubleshoot issues. After taking feedback from industry, our expert panel designed a course that prepares highly skilled, employable professionals.

Our training approach is simple: we pick scenarios derived from tasks commonly required in the workplace, and we ensure sound fundamentals while building critical skills.

Topics Covered

Understanding Big Data
Understanding Big Data
    - 3V (Volume-Variety-Velocity) characteristics
    - Structured and Unstructured Data
    - Application and use cases of Big Data
Limitations of traditional large-scale systems
How a distributed way of computing is superior (cost and scale)
Opportunities and challenges with Big Data
HDFS (The Hadoop Distributed File System)
HDFS Overview and Architecture
    - Deployment Architecture
    - Name Node, Data Node and Checkpoint Node (aka Secondary Name Node)
    - Safe mode
    - Configuration files
    - HDFS Data Flows (Read vs Write)

How does HDFS address fault tolerance?
    - CRC checksum
    - Data replication
    - Rack awareness and Block placement policy
    - Small files problem

HDFS Interfaces
    - Command Line Interface
        - File System
        - Administrative
    - Web Interface

MapReduce - 1 (Theoretical Concepts)
MapReduce overview
    - Functional Programming paradigms
    - How to think in a MapReduce way?

MapReduce Architecture
    - Legacy MR vs Next Generation MapReduce (aka YARN/MRv2)
    - Slots vs Containers
    - Schedulers
    - Shuffling, Sorting
    - Hadoop Data Types
    - Input and Output Formats
    - Input Splits
    - Partitioning (Hash Partitioner vs Custom Partitioner)
    - Configuration files
    - Distributed Cache

MR Algorithm and Data Flow
    - Word Count

Alternatives to MR
    - BSP (Bulk Synchronous Parallel)
    - Ad hoc querying
    - Graph Computing Engines
MapReduce - 2 (Practice)
Developing, debugging and deploying MR programs
    - Standalone mode (in Eclipse)
    - Pseudo-distributed mode (as in the Big Data VM)
    - Fully distributed mode (as in Production)

    - Old and the new MR API
    - Java Client API
    - Hadoop data types and custom Writables/WritableComparables
    - Different input and output formats
    - Saving Binary Data using SequenceFiles and Avro Files

Hadoop Streaming (developing and debugging non-Java MR programs in Ruby and Python)

Optimization techniques
    - Speculative execution
    - Combiners
    - JVM Reuse
    - Compression

MR algorithms (Non-graph)
    - Sorting
    - Term Frequency – Inverse Document Frequency
    - Student Database
    - Max Temperature
    - Different ways of joining data
    - Word Co-Occurrence
MR algorithms (Graph)
    - PageRank
    - Inverted Index
Higher Level Abstractions for MR (Pig)
Introduction and Architecture
Different Modes of executing Pig constructs
Data Types
Dynamic invokers
Pig streaming
Pig Latin language constructs (LOAD, STORE, DUMP, SPLIT, etc.)
User Defined Functions
Use Cases
Higher Level Abstractions for MR (Hive)
Introduction and Architecture
Different Modes of executing Hive queries
Metastore Implementations
HiveQL (DDL & DML Operations)
External vs Managed Tables
Partitions & Buckets
User Defined Functions
Transformations using non-Java languages
Use Cases

Comparison of Pig and Hive
NoSQL Databases - 1 (Theoretical Concepts)
NoSQL Concepts
    - Review of RDBMS
    - Need for NoSQL
    - Brewer's CAP Theorem
    - ACID vs BASE
    - Schema on Read vs. Schema on Write
    - Different levels of consistency
    - Bloom filters

Different types of NoSQL databases
    - Key Value
    - Columnar
    - Document
    - Graph

Columnar database concepts

NoSQL Databases - 2 (Practice)
HBase Architecture
    - Master and the Region Server
    - Catalog tables (ROOT and META)
    - Major and Minor compaction
    - Configuration files
    - HBase vs Cassandra

Interfaces to HBase (for DDL and DML operations)
    - Java API
      - Client API
      - Filters

Who should attend

Developers, architects, and administrators with a passion to learn and enhance their skills to become market ready.

Anyone with an understanding of XML and basic Linux/Unix/Windows/Mac OS.

What you need to bring

Notepad and pen.

Key Takeaways

Understand Big Data challenges and the frameworks that address them.

In-depth understanding of Hadoop and the Hadoop ecosystem.
Hands-on, practical understanding of recent Hadoop versions and commercial distributions such as Cloudera and Hortonworks.

Date and Time

Not decided yet.

About the Trainer

Masters in Computer Application

Certified consultant and mentor with 15 years of experience in Big Data Hadoop, MongoDB, Cassandra, and SOA.
