
What happens if an HDFS block is deleted directly from a DataNode?

Lately I wondered what would happen if I logged into one of the DataNodes and deleted an HDFS block directly from the filesystem, not via the HDFS interface.

If we have a replication factor of 3, then two other copies of this block are still available, so Hadoop can:

  1. Keep serving this block to requestors.
  2. Recover the missing block from its replicas.

But does Hadoop really behave this way?

According to the documentation, for example here, under-replicated blocks are automatically re-replicated by HDFS until they reach the desired replication factor.
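
This is easy to watch from the command line: the summary that fsck prints for the whole namespace includes the relevant counters (a sketch, assuming a shell with HDFS superuser rights):

# The "Under-replicated blocks" line should rise when a replica is lost
# and fall back once HDFS re-replicates it.
hdfs fsck / | grep -iE 'under-replicated|missing|corrupt'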

I wanted to see it happening, so I did a little test on Cloudera CDH 5.14.1.

First of all let’s create a demo file:

[[email protected] ~]$ echo "This is a demo file for HDFS" >> demo.txt
[[email protected] ~]$ ls -l
total 8
-rw-rw-r-- 1 hdfs hdfs 29 May 7 22:44 demo.txt
-rwxrwxrwx 1 root root 51 Apr 16 22:45 test.cfg
 
[[email protected] ~]$ hdfs dfs -put demo.txt /tmp/demo.txt
[[email protected] ~]$ hdfs dfs -ls /tmp
Found 5 items
drwxrwxrwx - hdfs supergroup 0 2017-05-07 22:45 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r-- 3 hdfs supergroup 29 2017-05-07 22:45 /tmp/demo.txt
drwxr-xr-x - yarn supergroup 0 2017-04-14 23:03 /tmp/hadoop-yarn
drwx-wx-wx - hive supergroup 0 2017-03-26 23:31 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2017-04-14 23:03 /tmp/logs
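
A quick sanity check on what the NameNode recorded for the new file (a sketch; %r and %b are standard format specifiers of hdfs dfs -stat):

# %r = replication factor, %b = size in bytes
hdfs dfs -stat "replication=%r size=%b" /tmp/demo.txt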

We will now find the DataNodes this block is present on. You can use this post for guidance on how to do it. Here is the screenshot showing that the block is present on cloudera2, cloudera3 and cloudera4:
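
If you prefer the command line over the web UI, fsck can print the block ID and the DataNodes holding each replica (a sketch):

# -files -blocks -locations lists every block of the file and where its replicas live
hdfs fsck /tmp/demo.txt -files -blocks -locations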

Now go to HDFS configuration and find the data directory:

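Alternatively, the configured value can be read with getconf (a sketch; this reads the client-side configuration, which on this cluster points to /dfs/dn):

# Print the configured DataNode data directories
hdfs getconf -confKey dfs.datanode.data.dir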

Now log in to cloudera2 and go to /dfs/dn/current/BP-897318959-192.168.1.102-1528154207418/current/finalized/subdir0 (the subdirectories may vary from cluster to cluster; just go to /dfs/dn and dig for your file). There were four subdirectories there, and I had to enter each one of them and run “grep demo *” to find the block I wanted (it was in subdir3). Then I just deleted it:

[[email protected] subdir3]# cd /dfs/dn/current/BP-897318959-192.168.1.102-1528154207418/current/finalized/subdir0/subdir3
[[email protected] subdir3]# grep demo *
blk_1073742596:This is a demo file for HDFS
[[email protected] subdir3]# ls
blk_1073742596 blk_1073742596_1772.meta
[[email protected] subdir3]# rm *
rm: remove regular file ‘blk_1073742596’? y
rm: remove regular file ‘blk_1073742596_1772.meta’? y
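
Grepping each subdirectory by hand works, but once the block ID is known, a find one-liner locates both the block file and its .meta checksum file directly (a sketch, using the data directory and block ID from above):

# Search the whole DataNode data directory for the block and its metadata file
find /dfs/dn -name 'blk_1073742596*'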

This was a rough deletion of a block: I did not use HDFS commands to delete the file, so the NameNode still thinks the file (and this replica) is present.

Then I connected to another DataNode, cloudera4, and tried to retrieve the file. It worked perfectly:

[[email protected] ~]# hdfs dfs -ls /tmp
Found 4 items
d--------- - hdfs supergroup 0 2018-06-07 01:15 /tmp/.cloudera_health_monitoring_canary_files
-rw-r--r-- 3 root supergroup 29 2018-06-07 00:51 /tmp/demo.txt
drwx-wx-wx - hive supergroup 0 2018-06-05 02:25 /tmp/hive
drwxrwxrwt - mapred hadoop 0 2018-06-05 02:20 /tmp/logs
[[email protected] ~]# hdfs dfs -copyToLocal /tmp/demo.txt .
[[email protected] ~]# cat demo.txt
This is a demo file for HDFS
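
To be extra sure the retrieved copy matches what HDFS serves, a simple diff against a streamed read works (a sketch; no output means the contents are identical):

# Compare the local copy with the file as HDFS streams it
diff <(hdfs dfs -cat /tmp/demo.txt) demo.txt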

I tried to do the same from cloudera2 (the node where I deleted the block files locally) and got this exception:

18/06/07 01:22:27 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, self=/192.168.1.213:48832, remote=/192.168.1.213:50010, for file /tmp/demo.txt, for pool BP-897318959-192.168.1.102-1528154207418 block 1073742596_1772
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:467)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:890)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:768)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:660)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:956)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:472)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:397)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:334)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:269)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:254)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:249)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
18/06/07 01:22:27 WARN hdfs.DFSClient: Failed to connect to /192.168.1.213:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, self=/192.168.1.213:48832, remote=/192.168.1.213:50010, for file /tmp/demo.txt, for pool BP-897318959-192.168.1.102-1528154207418 block 1073742596_1772
java.io.IOException: Got error for OP_READ_BLOCK, status=ERROR, self=/192.168.1.213:48832, remote=/192.168.1.213:50010, for file /tmp/demo.txt, for pool BP-897318959-192.168.1.102-1528154207418 block 1073742596_1772
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:467)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:890)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:768)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:377)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:660)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:956)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:121)
at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:472)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:397)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:334)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:269)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:254)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:249)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
18/06/07 01:22:27 INFO hdfs.DFSClient: Successfully connected to /192.168.1.226:50010 for BP-897318959-192.168.1.102-1528154207418:blk_1073742596_1772

At first the client attempts to read the block from the local DataNode and fails, but eventually it connects to another node and finds the block.
Despite the exception, it succeeded in retrieving the file: it was created in my local filesystem and contained the right data.

Recovering the lost block

I had effectively created an under-replicated block. There are several ways to handle such a situation, but somehow most of them did not work for me.

  • First, I tried running the fsck command:
[[email protected] ~]# hdfs fsck /tmp/demo.txt
Connecting to namenode via http://cloudera1.lan:50070/fsck?ugi=root&path=%2Ftmp%2Fdemo.txt
FSCK started by root (auth:SIMPLE) from /192.168.1.213 for path /tmp/demo.txt at Thu Jun 07 01:25:20 IDT 2018
.Status: HEALTHY
Total size: 29 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 29 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Jun 07 01:25:20 IDT 2018 in 3 milliseconds

The filesystem under path '/tmp/demo.txt' is HEALTHY

It reported the filesystem to be healthy and did not even detect any under-replicated blocks, but the block files were still missing from the local filesystem on cloudera2. There is an option in fsck to just delete corrupt files, but I wanted to recover the file, not delete it (knowing that there are good replicas of it on other nodes).
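
Another place a lost replica should eventually show up is the cluster-wide counter in the dfsadmin report (a sketch of the check):

# The report prints an "Under replicated blocks" counter for the whole cluster
hdfs dfsadmin -report | grep -i 'under replicated'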

  • Then I tried another approach:
    I ran the command “hdfs namenode -recover -force” and restarted the affected DataNode. It finished successfully, but the file was still not there…
  • I ran a rebalance on HDFS (see the command sketch after this list). While the rebalance was running, the block appeared in the local directory for a few seconds but then disappeared. HDFS was moving blocks around but for some reason decided not to leave any blocks on cloudera2. Strange.
  • Another strange thing happened when I shut down one of the nodes that held the remaining replicas. As soon as I shut it down, the block was replicated to cloudera2. I was happy and thought I had found a way to recover the block, but when I started the stopped node again, the block was gone from cloudera2. This is very strange behavior from Hadoop; it seems irrational. If you have already replicated the block to a node where it was missing, why delete it?
  • I then tried: hdfs dfs -setrep 3 /tmp/demo.txt
    This also finished without errors, but did not bring back the file.
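
The rebalance mentioned in the list above is equivalent to running the stock balancer (a sketch of the command line, with the default threshold made explicit):

# Rebalance block placement across DataNodes; -threshold is the allowed
# deviation from average disk utilization, in percent (10 is the default)
hdfs balancer -threshold 10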

The only thing that finally worked was restarting all HDFS processes; after that, the block was replicated correctly and stayed there. This is not a good solution, because you often cannot stop a busy cluster and fail many jobs just to recover a couple of blocks.
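
One less disruptive option that might be worth trying (I have not verified it here, and the command only exists in Apache Hadoop 2.7 and later, so treat its availability on CDH 5.14 as an assumption) is asking the affected DataNode to send an immediate full block report, so the NameNode learns about the missing replica without a restart:

# Hostname and port are assumptions: cloudera2.lan follows this cluster's naming,
# and 50020 is the default DataNode IPC port (dfs.datanode.ipc.address).
hdfs dfsadmin -triggerBlockReport cloudera2.lan:50020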

Even though the documentation says that under-replicated blocks should recover automatically, I did not see such behavior. I can think of two possible reasons for this:

  • It may be a bug in this version of Hadoop. I should run the same test on other platforms such as Hortonworks HDP.
  • I did not wait long enough for the replication to happen. I doubt this one, since I waited about 15 minutes, which seems like enough time for HDFS to detect a missing block (but see the check after this list).
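
To weigh the second possibility, it helps to check how often a DataNode reports its full block inventory to the NameNode, since the full block report is one of the main ways a replica deleted behind HDFS's back can be noticed (a sketch; the Apache default for this property is 21600000 ms, i.e. six hours):

# How often each DataNode sends a full block report to the NameNode
hdfs getconf -confKey dfs.blockreport.intervalMsec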

I ran the same test on HDP 2.6.5 and got the same results. The system does not detect any under-replicated blocks and does not recover the lost replica until HDFS is restarted.


So this looks like a problem/bug or a documentation inconsistency in Hadoop itself.
