CheckPointing Process - Hadoop

Silvia Priya
29 Mar

The checkpointing process is one of the important concepts in Hadoop. The NameNode stores the file system's metadata on its hard disk.

We all know that metadata is the heart of a distributed file system: if it is lost, we cannot access any files inside the file system.

The metadata is physically stored on the machine in the form of two files:

1. FSIMAGE - A snapshot of the file system at a point in time.

2. EDITS FILE - Contains every transaction (creation, deletion, moving, renaming, copying, etc. of files) in the file system.
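
To make the relationship between the two files concrete, here is a minimal sketch in Python. It is a toy model only: the real fsimage and edits files are binary and hold far more detail, and the `replay` function, the operation names and the paths are all made up for illustration. The idea it shows is that the current state of the file system is the snapshot plus every transaction recorded after it.

```python
# Toy model only: a "snapshot" is a set of paths, an "edit" is a tuple.
def replay(fsimage, edits):
    """Return the namespace obtained by applying each edit to a copy of the snapshot."""
    namespace = set(fsimage)  # copy, so the snapshot itself is left untouched
    for op, *paths in edits:
        if op == "create":
            namespace.add(paths[0])
        elif op == "delete":
            namespace.discard(paths[0])
        elif op == "rename":
            src, dst = paths
            namespace.discard(src)
            namespace.add(dst)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


fsimage = {"/data/a.txt", "/data/b.txt"}  # snapshot taken at some earlier time
edits = [
    ("create", "/data/c.txt"),
    ("delete", "/data/a.txt"),
    ("rename", "/data/b.txt", "/data/b_old.txt"),
]

print(sorted(replay(fsimage, edits)))
# prints ['/data/b_old.txt', '/data/c.txt']
```

Notice that the snapshot alone is stale; only the snapshot together with the edits gives the latest picture.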

With High Availability (HA) in Hadoop V2, a backup of the NameNode's (NN's) metadata is kept on another machine, the Standby NameNode. Since the metadata is accessed very frequently by different clients reading different files, the NameNode also keeps it in RAM instead of reading it from the hard disk every time, so that it can be accessed faster.

But stop... what happens if the machine goes down? :( We will lose everything in the RAM. Hence taking a backup of the data held in RAM is a sensible precaution.

FSIMAGE0 -- Represents the fsimage file at a particular time.

FSIMAGE1 -- Represents a copy of the FSIMAGE0 file, taken as a backup.

Let's imagine the backup is taken every 6 hours. If something goes wrong in the cluster and the machine goes down before the next backup, i.e. within those 6 hours, we end up losing the latest fsimage file along with every change made since the last backup.
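
A quick back-of-the-envelope sketch shows how large that loss window can get. The function name and the steady rate of 1,000 transactions per hour are assumptions chosen purely for illustration; a real cluster's rate will vary.

```python
def transactions_lost(failure_hour, backup_interval_hours, txns_per_hour):
    """Transactions made since the last completed backup, assuming a steady rate."""
    hours_since_backup = failure_hour % backup_interval_hours
    return hours_since_backup * txns_per_hour


# Backup every 6 hours, machine fails 5 hours after the last backup.
print(transactions_lost(failure_hour=5, backup_interval_hours=6, txns_per_hour=1000))
# prints 5000
```

With a plain periodic copy, everything in that window is gone, which is why a smarter mechanism is needed.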

 

So, to overcome this problem, a dedicated system is added to the cluster to safeguard the metadata in an efficient way, and that process is called the CheckPointing Process.

Have a look at the picture and let's understand the process step by step.

STEP 1

A copy of the metadata (the Fsimage and Edits files) is taken from the NameNode and placed on the Secondary NameNode (SNN).

STEP 2

Once the copy is on the SNN, the Edits file, which captures every single transaction happening in the file system, is merged with the Fsimage file (the snapshot of the file system). The merged result gives the updated, latest state of the file system.

STEP 3

The latest merged Fsimage is moved to the NameNode's (NN's) metadata location.

STEP 4

While the merge is in progress, files may still be deleted, created or copied; in other words, some transactions may have happened. Those details are stored in a new file called Edits.new, because the original Edits file is already in use for the copy to the SNN (remember the deadlock principle).

STEP 5

Now the Edits.new file becomes the latest Edits file, and the merged Fsimage becomes the original Fsimage file. This process is repeated at a specific interval.
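
The five steps above can be sketched as a small Python simulation. This is only an illustrative model under simplifying assumptions, not how Hadoop is implemented: `ToyNameNode`, `apply_edits` and the paths are hypothetical names, the fsimage is a plain set of paths, and an edit is an `(operation, path)` pair.

```python
def apply_edits(image, edits):
    """Merge a list of (op, path) edits into a copy of the image (a set of paths)."""
    merged = set(image)
    for op, path in edits:
        if op == "create":
            merged.add(path)
        elif op == "delete":
            merged.discard(path)
    return merged


class ToyNameNode:
    """Hypothetical stand-in for the NameNode's on-disk metadata."""

    def __init__(self, fsimage):
        self.fsimage = set(fsimage)
        self.edits = []        # transactions since the last checkpoint
        self.edits_new = None  # exists only while a checkpoint is running

    def log(self, op, path):
        # Step 4: during a checkpoint, new transactions go to edits.new.
        target = self.edits if self.edits_new is None else self.edits_new
        target.append((op, path))

    def begin_checkpoint(self):
        # Step 1: hand copies of fsimage and edits to the Secondary NameNode.
        self.edits_new = []
        return set(self.fsimage), list(self.edits)

    def end_checkpoint(self, merged_fsimage):
        # Steps 3 and 5: install the merged image; edits.new becomes edits.
        self.fsimage = merged_fsimage
        self.edits = self.edits_new
        self.edits_new = None


nn = ToyNameNode({"/a"})
nn.log("create", "/b")

image_copy, edits_copy = nn.begin_checkpoint()  # step 1
nn.log("delete", "/a")                          # step 4: arrives mid-checkpoint
merged = apply_edits(image_copy, edits_copy)    # step 2: merge done on the SNN
nn.end_checkpoint(merged)                       # steps 3 and 5

print(sorted(nn.fsimage), nn.edits)
# prints ['/a', '/b'] [('delete', '/a')]
```

The transaction that arrived during the merge is not lost: it sits in the new Edits file and will be folded into the Fsimage at the next checkpoint.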

So separate manual backups are no longer needed to safeguard the NN's metadata in failover scenarios.

We will see more details and programs in the upcoming lessons.

Thank you!!
