Checkpointing Process - Hadoop

Silvia P.

26/11/2020

CHECKPOINTING

Checkpointing is one of the vital concepts in Hadoop. The NameNode (NN) stores the file system metadata on its hard disk.

We all know that metadata is the heart of a distributed file system: if it is lost, none of the files inside the file system can be accessed.

The metadata is physically stored on the machine in the form of two files:

1. FSIMAGE - A snapshot of the file system at a point in time.

2. EDITS FILE - A log of every transaction (creation, deletion, moving, renaming, copying, etc. of files) applied to the file system since that snapshot.
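To make the relationship between the two files concrete, here is a minimal Python sketch under simplifying assumptions: the namespace is modeled as a plain dictionary from path to kind, and the edits file as a list of (operation, arguments) tuples. The names `apply_edit` and `load_namespace` are hypothetical; real fsimage and edits files are binary and carry far more detail (permissions, block lists, timestamps). The point it illustrates is that the current state equals the fsimage snapshot with every later transaction replayed on top of it, which is also what the merge in Step 2 below produces.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model of HDFS metadata (not Hadoop's real format).
Namespace = Dict[str, str]           # path -> "file" or "dir"
Edit = Tuple[str, Tuple[str, ...]]   # (operation, arguments)


def apply_edit(ns: Namespace, edit: Edit) -> None:
    """Apply one logged transaction to the in-memory namespace."""
    op, args = edit
    if op == "mkdir":
        ns[args[0]] = "dir"
    elif op == "create":
        ns[args[0]] = "file"
    elif op == "delete":
        ns.pop(args[0], None)
    elif op == "rename":
        # Simplification: renaming a directory would also move its children.
        src, dst = args
        ns[dst] = ns.pop(src)
    else:
        raise ValueError(f"unknown operation: {op}")


def load_namespace(fsimage: Namespace, edits: List[Edit]) -> Namespace:
    """Current state = snapshot (fsimage) + replay of every later edit."""
    ns = dict(fsimage)  # copy, so the snapshot itself is left unchanged
    for edit in edits:
        apply_edit(ns, edit)
    return ns


fsimage = {"/": "dir", "/data": "dir", "/data/a.txt": "file"}
edits = [
    ("create", ("/data/b.txt",)),
    ("rename", ("/data/a.txt", "/data/a_old.txt")),
    ("delete", ("/data/b.txt",)),
]
print(load_namespace(fsimage, edits))
# {'/': 'dir', '/data': 'dir', '/data/a_old.txt': 'file'}
```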

In Hadoop 2 with HA (High Availability) enabled, a copy of the NN's metadata is kept on another machine called the Standby NameNode; in a cluster without HA, this job is done by the Secondary NameNode (SNN) described below. Because clients access the metadata very frequently in order to read files, it is better to keep it in RAM rather than on the hard disk, so that it can be accessed faster.

But wait: what happens if the machine goes down? Everything held in RAM is lost. Hence taking a backup of the data stored in RAM is necessary.

FSIMAGE0 -- Represents the fsimage file at a particular time.
FSIMAGE1 -- Represents a copy of the FSIMAGE0 file, taken as a backup.
Imagine the backup is taken every 6 hours. If something goes wrong in the cluster and the machine goes down before the next backup is taken, i.e. within those 6 hours, every change made since the last backup is lost, and we end up without the latest fsimage file.
To overcome this problem, a separate machine is added to the cluster exclusively to safeguard the metadata efficiently, and the process it carries out is called the Checkpointing Process.
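The size of this risk can be sketched in a few lines of Python, assuming backups are taken every 6 hours starting at hour 0 (the interval from the example above; the function names are hypothetical). With snapshot-only backups, everything changed between the last backup and the crash is lost.

```python
BACKUP_INTERVAL_HOURS = 6.0  # example interval from the text (assumption)


def last_backup_time(failure_hour: float, interval: float = BACKUP_INTERVAL_HOURS) -> float:
    """Hour of the most recent periodic backup (backups at 0, 6, 12, ...)."""
    return (failure_hour // interval) * interval


def hours_of_changes_lost(failure_hour: float, interval: float = BACKUP_INTERVAL_HOURS) -> float:
    """With snapshot-only backups, every change since the last backup is lost."""
    return failure_hour - last_backup_time(failure_hour, interval)


print(hours_of_changes_lost(5.5))   # crash just before the first 6-hour backup
print(hours_of_changes_lost(13.0))  # last backup at hour 12, so one hour lost
```

In the worst case almost a full interval of changes disappears, which is why the edits file, rather than the snapshot alone, is what makes recovery possible.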

Please have a look at the picture and let's understand the process step by step.

STEP 1

A copy of the metadata (the fsimage and edits files) is taken from the NameNode and placed on the Secondary NameNode (SNN).

STEP 2

Once the copy is on the SNN, the edits file, which captures every transaction in the file system, is merged with the fsimage file (the snapshot of the file system). The result is an updated fsimage that reflects the latest state of the file system.

STEP 3

The merged fsimage is then moved back to the NN's metadata location.

STEP 4

While the merge is running, files may still be created, deleted or copied on the cluster. These transactions are recorded in a new file called Edits.new, because the original edits file is the one that was copied to the SNN for merging and must not change while the merge is in progress.
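The idea behind Edits.new can be sketched as a toy edit log with a "roll" operation. This is a hypothetical Python class, not Hadoop's API: `roll()` freezes the current edits and hands a copy to the SNN, transactions arriving afterwards go to the new file, and `finish_checkpoint()` performs the swap described in Step 5. The edits/Edits.new names follow this lesson; newer Hadoop releases keep numbered edit-log segments named by transaction ID instead, but the principle is the same.

```python
from typing import List, Optional


class EditLog:
    """Hypothetical, simplified NameNode edit log with a roll operation."""

    def __init__(self) -> None:
        self.edits: List[str] = []                  # the current "edits" file
        self.edits_new: Optional[List[str]] = None  # created by roll()

    def log(self, txn: str) -> None:
        # Once a checkpoint has started, new transactions go to Edits.new,
        # so the file being merged on the SNN is never modified.
        if self.edits_new is not None:
            self.edits_new.append(txn)
        else:
            self.edits.append(txn)

    def roll(self) -> List[str]:
        """Start Edits.new and return a copy of the frozen edits for the SNN."""
        if self.edits_new is not None:
            raise RuntimeError("a checkpoint is already in progress")
        self.edits_new = []
        return list(self.edits)

    def finish_checkpoint(self) -> None:
        """Step 5: Edits.new becomes the current edits file."""
        if self.edits_new is None:
            raise RuntimeError("no checkpoint in progress")
        self.edits, self.edits_new = self.edits_new, None


log = EditLog()
log.log("create /a")
log.log("create /b")
copy_for_snn = log.roll()  # the SNN merges this copy with the fsimage
log.log("delete /a")       # arrives during the merge, goes to Edits.new
log.finish_checkpoint()
print(copy_for_snn)  # ['create /a', 'create /b']
print(log.edits)     # ['delete /a']
```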

STEP 5

Now the Edits.new file becomes the current edits file, and the merged fsimage becomes the current fsimage file. This process repeats at a configured interval.
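In Hadoop 2 and later, this interval is controlled by two properties in `hdfs-default.xml`: `dfs.namenode.checkpoint.period` (default 3600 seconds) and `dfs.namenode.checkpoint.txns` (default 1,000,000 uncheckpointed transactions). A checkpoint is triggered by whichever limit is reached first, and the checkpointing node polls the transaction count every `dfs.namenode.checkpoint.check.period` seconds (default 60). The Python function below is a simplified, hypothetical model of that decision, not Hadoop's own code.

```python
CHECKPOINT_PERIOD_SECONDS = 3600  # default of dfs.namenode.checkpoint.period
CHECKPOINT_TXNS = 1_000_000       # default of dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last: float,
                      uncheckpointed_txns: int,
                      period: float = CHECKPOINT_PERIOD_SECONDS,
                      txns: int = CHECKPOINT_TXNS) -> bool:
    """A checkpoint is due when EITHER the time limit OR the transaction limit is hit."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns


print(should_checkpoint(600, 2_000))      # False: neither limit reached
print(should_checkpoint(3600, 2_000))     # True: one hour has passed
print(should_checkpoint(600, 1_000_000))  # True: too many uncheckpointed edits
```

The transaction limit keeps a busy cluster from building up a very long edits file between hourly checkpoints, which would otherwise make a NameNode restart slow.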

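So, with checkpointing in place, no separate backups are needed to save the NN's metadata: after a failure, the latest state can be rebuilt from the most recent checkpointed fsimage plus the short edits file.

We will see more details and programs in the upcoming lessons.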

Thank you!!
