How to view the contents of fsimage or edits file

Everyone who is familiar with HDFS knows that it stores its metadata in fsimage files and that the latest changes are stored in edits files. Periodically, the edits files are merged into the main fsimage file. Fsimage and edits are binary files, so we cannot view their contents directly. However, HDFS offers built-in utilities that can dump the contents of fsimage and edits files in a human-readable format.

The utilities we will check in this post are oiv for viewing fsimage files and oev for viewing edits files. Both are designed for offline use, so that the cluster does not have to be running in order to view the files.

First of all, we have to find the location of the fsimage file. If you’re on the Cloudera platform, go to HDFS -> Configuration and choose NameNode in the left pane. Then look for the parameter “NameNode data directories”. That’s where the fsimage is:

If you are on Apache Hadoop or another distribution, you can just look for dfs.namenode.name.dir in the hdfs-site.xml file.
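If you prefer the command line, hdfs getconf -confKey dfs.namenode.name.dir prints the configured value. That value can be a comma-separated list, and entries may carry a file:// prefix, so a small helper is handy for turning it into a plain path. The sketch below is only an illustration; first_name_dir is a hypothetical name, not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): print the first directory
# from a dfs.namenode.name.dir value.
# Assumes entries are separated by commas and may start with file://.
first_name_dir() {
  printf '%s\n' "$1" | cut -d',' -f1 | sed -e 's|^file://||'
}

# Usage on a node that has the Hadoop client configuration (not run here):
#   first_name_dir "$(hdfs getconf -confKey dfs.namenode.name.dir)"
```

The helper only handles the simple cases described above; values with spaces around the commas would need extra trimming.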

Now go to that directory on the active namenode and then “cd current”. Let’s see what we have inside:

There are several files here, not just one fsimage and one edits. For a more in-depth explanation of these files you can check here.
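Since the directory holds more than one fsimage, it helps to pick the newest one automatically. The sketch below assumes that fsimage files are named fsimage_ followed by a zero-padded transaction ID (as in the examples in this post) and that each may have a .md5 companion file. latest_fsimage is a hypothetical helper name.

```shell
# Hypothetical helper: print the newest fsimage file in a "current" directory.
# Assumes names like fsimage_<zero-padded txid>, so the shell's sorted glob
# expansion lists them from oldest to newest.
latest_fsimage() {
  latest=""
  for f in "$1"/fsimage_*; do
    case "$f" in
      *.md5) continue ;;   # skip the checksum companion files
    esac
    if [ -f "$f" ]; then latest="$f"; fi
  done
  if [ -n "$latest" ]; then printf '%s\n' "$latest"; else return 1; fi
}

# Usage (not run here):
#   latest_fsimage /path/to/name/dir/current
```

The function returns a non-zero status when no fsimage file is found, so it can be used in an if statement.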

Another way to get the latest fsimage file is by using the fetchImage directive:

hdfs dfsadmin -fetchImage <local path>

First, let’s take a look at the fsimage file.

Offline image viewer – oiv

The basic structure of oiv command is this:

hdfs oiv [options] -i <input fsimage file> -o <output file>

The handiest option is -p, which selects the processor and therefore the output type. The supported processors are Web, XML, Delimited and FileDistribution. I encountered some documentation that mentioned an ls processor, but it did not work on my HDFS version.

Web

This processor is the default. If you do not specify one, oiv starts a small web server that exposes the fsimage through the WebHDFS API. You can access it at http://:5978/webhdfs/v1/?op=liststatus

This is what I ran:

hdfs oiv -i fsimage_0000000000000004588 -o ~/fsimage.xml

This is what you see when you run it:

The response is in json format (http://:5978/webhdfs/v1/?op=liststatus):
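To script requests against the viewer, it can help to build the URL in one place. This is only a sketch: oiv_liststatus_url is a hypothetical helper, the host is whatever machine runs oiv (localhost in the usage line is an assumption), and 5978 is the port mentioned above.

```shell
# Hypothetical helper: build a LISTSTATUS URL for the oiv web server.
# $1 = host running oiv (assumption: you supply it)
# $2 = HDFS path to list (defaults to /)
# $3 = port (defaults to 5978, the port used in this post)
oiv_liststatus_url() {
  host="$1"
  path="${2:-/}"
  port="${3:-5978}"
  printf 'http://%s:%s/webhdfs/v1%s?op=LISTSTATUS\n' "$host" "$port" "$path"
}

# Usage while oiv is running (not run here):
#   curl -s "$(oiv_liststatus_url localhost /)"
```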

XML

The command is:

hdfs oiv -p XML -i fsimage_0000000000000004588 -o ~/fsimage.xml

And the result is a very long XML file. I copied only a portion of it here to give you an idea of what it looks like:
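Because the XML is so long, a quick summary is often more useful than reading it. The sketch below counts inodes by type. It assumes that each inode in the dump carries a <type> element with values such as FILE or DIRECTORY; check a sample of your own output first, since the layout can differ between Hadoop versions. count_inodes_by_type is a hypothetical helper name.

```shell
# Hypothetical helper: count <type>...</type> values in an oiv XML dump.
# Assumption: inode entries contain elements like <type>FILE</type>.
count_inodes_by_type() {
  grep -o '<type>[A-Z]*</type>' "$1" \
    | sed -e 's/<[^>]*>//g' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'   # print "TYPE count"
}

# Usage (not run here):
#   count_inodes_by_type ~/fsimage.xml
```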

FileDistribution

This option returns a report containing a histogram of file sizes along with some stats. The command is:

hdfs oiv -p FileDistribution -i fsimage_0000000000000004588 -o ~/fsimage.xml

And the result is:

Delimited

This produces a delimited text file where each line describes a single directory or file. Here is the command:

hdfs oiv -p Delimited -delimiter , -i fsimage_0000000000000004588 -o ~/fsimage.csv

How it looks when running it:

Again, the result is a very long file, so this is just a part of it to give you an idea of what it looks like:
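The delimited output is easy to post-process with standard tools. As a rough sketch, the function below sums the file sizes. It assumes the file size is the 7th column, which you should verify against your own output, and it skips any row whose 7th field is not a number (such as a header line). total_file_size is a hypothetical helper name, and paths that contain the delimiter would break this simple parsing.

```shell
# Hypothetical helper: sum the file-size column of an oiv Delimited dump.
# $1 = delimited file, $2 = delimiter used with -delimiter
# Assumption: the file size is in column 7.
total_file_size() {
  awk -F"$2" '$7 ~ /^[0-9]+$/ { total += $7 } END { printf "%.0f\n", total }' "$1"
}

# Usage (not run here):
#   total_file_size ~/fsimage.csv ,
```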

There is a project that claims to be able to output the fsimage file in JSON format as well. I did not test it, but you can check it out here.

Offline edits viewer – oev

If we want to see an edits file, we should use the oev command. The syntax is similar to oiv’s but it supports fewer output types.

hdfs oev [options] -i <input edits file> -o <output file>

XML

Dumps the contents of the edits file to an XML file. This is the default format. The command is:

hdfs oev -i edits_0000000000000000013-0000000000000004588 -o ~/edits.xml

Here is the result (just part of it since it’s very long):

You can see that there is a significant difference between the fsimage file and the edits file. While the fsimage file contains a snapshot of the filesystem (files and directories), the edits file contains transactions: changes such as “OP_MKDIR” for a mkdir command.
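To follow one kind of transaction through the dump, you can filter the XML by opcode. The sketch below prints the transaction IDs of all records with a given opcode. It assumes the pretty-printed layout in which each record has an <OPCODE> element followed by a <TXID> element, each on its own line; txids_for_opcode is a hypothetical helper name.

```shell
# Hypothetical helper: print the TXID of every record with a given opcode.
# $1 = XML file produced by oev, $2 = opcode such as OP_MKDIR
# Assumption: <OPCODE> appears before <TXID> in each record, one per line.
txids_for_opcode() {
  awk -v op="$2" '
    /<OPCODE>/ {
      cur = $0
      gsub(/.*<OPCODE>|<\/OPCODE>.*/, "", cur)   # keep only the opcode name
    }
    /<TXID>/ && cur == op {
      t = $0
      gsub(/.*<TXID>|<\/TXID>.*/, "", t)         # keep only the number
      print t
    }
  ' "$1"
}

# Usage (not run here):
#   txids_for_opcode ~/edits.xml OP_MKDIR
```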

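Stats

Prints statistics about the edits file. This output cannot be converted back into an edits file. The command is: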

hdfs oev -p stats -i edits_0000000000000000013-0000000000000004588 -o ~/edits.txt

Binary

The last, and very interesting, option is binary. It takes as input an XML file that was created using oev with the xml option and converts it back into a binary edits file. This is very handy when you have to manually fix or change the edits file, and it can also be used for backup. This is the command:

hdfs oev -p binary -i edits.xml -o edits

Final words:

These two commands can be very useful in certain circumstances, and they also give you an opportunity to see and learn what fsimage and edits files look like inside. I wish that oiv also had a binary option that could construct an fsimage file from XML. Merging these two similar commands into one command with a flag for fsimage/edits also looks like a good idea to me.