Big Data Hadoop Infrastructure & Services: Basic Concepts

Sayan Goswami

14/04/2017

Hadoop Cluster & Processes

What is Hadoop Cluster?

A Hadoop cluster is a collection of one or more Linux machines (boxes). A cluster has a single master machine, and all the remaining machines are called slaves.

Hadoop can be started in three modes:

Single (standalone) Mode
Pseudo-distributed Mode
Fully distributed Mode (usually at least 3 Linux machines)

Single Mode runs Hadoop as a single Java process on one machine, with no separate daemons, and uses the local file system instead of HDFS. It is generally used for testing and debugging (mainly black-box testing).

Fully distributed Mode is the production mode of a Hadoop cluster. It is normally set up with at least 3 machines.

For R&D (Research & Development) purposes, Apache Hadoop also offers a very useful third mode: Pseudo-distributed Mode. It provides all the core functionality of a fully distributed cluster on a single machine, which acts as both master and slave at the same time: every Hadoop daemon runs as its own Java process on that one machine. We will configure our own Hadoop cluster in this mode.
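The three modes differ mainly in where the Hadoop daemons run. The Python sketch below is only an illustration, not Hadoop configuration: the function `daemon_layout` and the host names `master`, `slave1`, `slave2` are hypothetical, it assumes the five Hadoop 1.x daemons described in the next section, and it places the Secondary NameNode on the master for simplicity (in real clusters it often runs on its own machine). It does not enforce the 3-machine recommendation.

```python
# Hypothetical helper: shows which Hadoop 1.x daemons run on which host
# in each start-up mode. Host names are placeholders.

MASTER_DAEMONS = ["NameNode", "SecondaryNameNode", "JobTracker"]
SLAVE_DAEMONS = ["DataNode", "TaskTracker"]


def daemon_layout(mode, slaves=None):
    """Return a dict mapping host name -> list of daemons for the given mode."""
    if mode == "standalone":
        # No daemons at all: everything runs inside one Java process
        # and uses the local file system instead of HDFS.
        return {"localhost": []}
    if mode == "pseudo-distributed":
        # One machine is master and slave at the same time:
        # every daemon runs as a separate process on localhost.
        return {"localhost": MASTER_DAEMONS + SLAVE_DAEMONS}
    if mode == "fully-distributed":
        if not slaves:
            raise ValueError("fully distributed mode needs at least one slave host")
        # Master daemons on one host, slave daemons on every slave host.
        layout = {"master": list(MASTER_DAEMONS)}
        for host in slaves:
            layout[host] = list(SLAVE_DAEMONS)
        return layout
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("standalone", "pseudo-distributed"):
        print(mode, daemon_layout(mode))
    print("fully-distributed", daemon_layout("fully-distributed", ["slave1", "slave2"]))
```

In pseudo-distributed mode all five daemons share one host, which is why it exercises the same code paths as a real cluster while needing only one machine.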

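Hadoop Processes & Layers:

Hadoop 1.x (classic MapReduce) runs 5 different processes (daemons), each with its own function (in Hadoop 2.x and later, YARN's ResourceManager and NodeManager replace the JobTracker and TaskTracker):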

NameNode.
Secondary NameNode.
DataNode.
JobTracker.
TaskTracker.

Hadoop divides its storage and analytics work between 2 layers:

HDFS (storage) Layer: NameNode, Secondary NameNode, DataNode

Application / MapReduce (processing) Layer: JobTracker, TaskTracker

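Note: The NameNode is called the single point of failure of a Hadoop cluster. Why? Because there is only one NameNode, and it alone holds the file system metadata. If it fails, clients can no longer locate any data blocks, so HDFS and every job that reads from it stop working, even though the DataNodes still hold the data. The JobTracker is also a single instance, but its failure only affects MapReduce jobs, whereas losing the NameNode makes the stored data unreachable. Simply put, without data an application has no value. The Secondary NameNode does not take over when the NameNode fails; Hadoop 2.x added NameNode High Availability (a standby NameNode) to address this.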
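To make the single-point-of-failure argument concrete, here is a toy Python model. It is a sketch under stated assumptions, not Hadoop's API: the class `NameNode`, the functions `apply_edits` and `secondary_checkpoint`, and the paths and block IDs are all hypothetical. It assumes, as HDFS does, that the NameNode keeps its namespace as an fsimage snapshot plus an edit log, and it shows that the Secondary NameNode only merges the two; once the single NameNode is down, lookups fail regardless of the checkpoint.

```python
# Toy model (hypothetical names, not Hadoop's API): the NameNode keeps the
# namespace as an fsimage snapshot plus an edit log of later changes.


def apply_edits(fsimage, edits):
    """Replay logged operations on top of a copy of the fsimage."""
    merged = dict(fsimage)
    for op, path, blocks in edits:
        if op == "create":
            merged[path] = blocks
        elif op == "delete":
            merged.pop(path, None)
    return merged


class NameNode:
    def __init__(self):
        self.fsimage = {}   # path -> list of block IDs (as of last checkpoint)
        self.edits = []     # operations logged since the last checkpoint
        self.alive = True

    def create(self, path, blocks):
        self.edits.append(("create", path, list(blocks)))

    def delete(self, path):
        self.edits.append(("delete", path, None))

    def get_blocks(self, path):
        # Every HDFS read starts with this lookup; with one NameNode
        # there is nowhere else to ask.
        if not self.alive:
            raise ConnectionError("NameNode is down: HDFS metadata unavailable")
        return apply_edits(self.fsimage, self.edits)[path]


def secondary_checkpoint(nn):
    """Secondary NameNode housekeeping: merge the edits into a new fsimage.
    This is a checkpoint, not a hot standby."""
    nn.fsimage = apply_edits(nn.fsimage, nn.edits)
    nn.edits = []
    return dict(nn.fsimage)


if __name__ == "__main__":
    nn = NameNode()
    nn.create("/logs/a.txt", ["blk_1", "blk_2"])
    nn.create("/tmp/x", ["blk_3"])
    nn.delete("/tmp/x")
    print(secondary_checkpoint(nn))   # {'/logs/a.txt': ['blk_1', 'blk_2']}
    nn.alive = False
    try:
        nn.get_blocks("/logs/a.txt")
    except ConnectionError as err:
        print(err)                    # lookup fails even though a checkpoint exists
```

The checkpoint keeps the edit log short and speeds up a NameNode restart, but it does not keep the cluster serving data while the NameNode is down.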

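Hadoop process layout in a cluster: the master runs the NameNode and the JobTracker (the Secondary NameNode often runs on a separate machine), and each slave runs a DataNode and a TaskTracker.

Brief functions of the Hadoop processes: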

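NameNode (NN): The NN manages the HDFS (Hadoop Distributed File System) namespace. It stores all HDFS metadata: the directory tree, file permissions, and which blocks make up each file. It learns which DataNodes hold each block from the block reports the DataNodes send; the file data itself is not stored on the NN.
Secondary NameNode (SNN): The SNN is a housekeeper for the NN. It periodically takes the NN's file system image (fsimage) and its log of changes (edit log), merges them into a fresh checkpoint, and sends the result back to the NN so that the edit log does not grow without bound. It is not a backup NameNode and cannot take over if the NN fails.
DataNode (DN): The DN stores the actual HDFS data blocks on its local disks and serves read and write requests for them. It also sends regular heartbeats and periodic block reports to the NN.
JobTracker (JT): The JT accepts MapReduce jobs from clients, splits each job into map and reduce tasks, and distributes those tasks in parallel across the available TaskTrackers (TT). Using the block locations known to the NN, it tries to run each map task on a node that already holds that task's input data.
TaskTracker (TT): The TT executes the individual tasks assigned to it by the JT and reports its status back to the JT through regular heartbeats.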
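The interplay between DataNode block reports and JobTracker scheduling can be sketched in a few lines of Python. This is a simplified, greedy illustration under assumptions: `build_block_map`, `schedule_map_tasks`, the host names and the block IDs are hypothetical, TaskTrackers are assumed to run on the same hosts as DataNodes, and the real JobTracker hands out tasks as TaskTrackers send heartbeats rather than planning everything up front.

```python
# Toy sketch (hypothetical names, not Hadoop's API): DataNodes report the
# blocks they hold, and a JobTracker-like scheduler prefers data-local hosts.
from collections import defaultdict


def build_block_map(block_reports):
    """Turn {datanode: [block IDs]} into {block ID: set of datanodes},
    the kind of block-location map the NameNode builds from block reports."""
    locations = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block in blocks:
            locations[block].add(datanode)
    return locations


def schedule_map_tasks(input_blocks, block_map, free_slots):
    """Assign one map task per input block.

    free_slots: {host: number of free map slots on that host's TaskTracker}.
    Returns {block: (host, is_data_local)}.
    """
    slots = dict(free_slots)
    assignment = {}
    for block in input_blocks:
        # First choice: a free TaskTracker on a node that already stores the block.
        local = sorted(h for h in block_map.get(block, ()) if slots.get(h, 0) > 0)
        if local:
            host, is_local = local[0], True
        else:
            # Fallback: any free TaskTracker; the block is read over the network.
            remaining = sorted(h for h, n in slots.items() if n > 0)
            if not remaining:
                raise RuntimeError("no free map slots left")
            host, is_local = remaining[0], False
        slots[host] -= 1
        assignment[block] = (host, is_local)
    return assignment


if __name__ == "__main__":
    reports = {
        "node1": ["blk_1", "blk_2"],
        "node2": ["blk_2", "blk_3"],
        "node3": ["blk_1", "blk_3"],
    }
    block_map = build_block_map(reports)
    plan = schedule_map_tasks(["blk_1", "blk_2", "blk_3", "blk_4"], block_map,
                              {"node1": 1, "node2": 1, "node3": 1, "node4": 1})
    for block, (host, local) in plan.items():
        print(block, "->", host, "(data-local)" if local else "(remote)")
```

Running the computation where the data already lives ("moving computation to the data") is the main reason the JT cares about block locations at all: it avoids copying large blocks across the network.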

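Data in HDFS cannot be edited or modified in place. The rule is "Write Once, Read Many Times": once a file has been written it can be read any number of times, and new data can be appended to its end, but existing data cannot be changed.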
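The write-once rule can be illustrated with a tiny Python model. The class `WriteOnceFile` and its method `write_at` are hypothetical and are not part of any HDFS client; the sketch only assumes the behaviour stated above: reads are unlimited, appends are allowed, and in-place modification is refused.

```python
# Toy model of HDFS write semantics (hypothetical class, not an HDFS client):
# existing bytes can be read many times and new bytes appended,
# but existing bytes are never overwritten.


class WriteOnceFile:
    def __init__(self, initial=b""):
        self._data = bytes(initial)

    def read(self):
        # Reading never changes the file, so it can be repeated freely.
        return self._data

    def append(self, more):
        # New data always goes after the current end of the file.
        self._data += bytes(more)

    def write_at(self, offset, new_bytes):
        # Random (in-place) writes are rejected, mirroring HDFS.
        raise PermissionError("HDFS files cannot be modified in place; only append is allowed")


if __name__ == "__main__":
    f = WriteOnceFile(b"line1\n")
    f.append(b"line2\n")
    print(f.read())
    try:
        f.write_at(0, b"X")
    except PermissionError as err:
        print(err)
```

Dropping random writes is a deliberate trade-off: it keeps block replicas consistent cheaply and suits workloads that write large files once and scan them many times.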
