Wednesday, May 18, 2016

Hadoop name node requires format on restart


One can set up a Hadoop cluster in pseudo-distributed mode by following the official Hadoop documentation. However, there is a problem you will run into when you try to restart the file system or the name node, typically after a reboot: you end up formatting the name node every time you restart it. Here is why.

In its default configuration, HDFS keeps its data under /tmp: hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, and the name node and data node directories live beneath it. Many systems clear /tmp on reboot, so after a restart the name node metadata is gone and you have to redo all the steps you have taken so far. To overcome this problem, specify a different directory for the name node.
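Before changing anything, it can help to confirm that the metadata really sits under /tmp. The check below is a minimal sketch that assumes hadoop.tmp.dir was left at its default, so the name node directory resolves to /tmp/hadoop-<login name>/dfs/name; the variable name default_name_dir is only illustrative.

```shell
# Assumption: hadoop.tmp.dir is still the default /tmp/hadoop-${user.name},
# so the name node directory is /tmp/hadoop-<login name>/dfs/name.
default_name_dir="/tmp/hadoop-$(id -un)/dfs/name"

# A formatted name node keeps its metadata in a "current" subdirectory.
if [ -d "$default_name_dir/current" ]; then
    echo "Name node metadata found in $default_name_dir (under /tmp)"
else
    echo "No name node metadata found in $default_name_dir"
fi
```

If the first message appears, the metadata is in a location that may be wiped on the next reboot.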

The name node settings are configured in hdfs-site.xml, under the /hadoop-dist/etc/hadoop directory. Edit this file as shown below to set up a new directory for the name node.

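<configuration>
    <!-- Replace <user> with your login name; the literal placeholder is not valid XML. -->
    <!-- Name node metadata directory (newer releases name this dfs.namenode.name.dir). -->
    <property>
        <name>dfs.name.dir</name>
        <value>file:///home/<user>/pseudo/dfs/name</value>
    </property>
    <!-- Data node block storage (newer releases name this dfs.datanode.data.dir). -->
    <property>
        <name>dfs.data.dir</name>
        <value>file:///home/<user>/pseudo/dfs/data</value>
    </property>
    <!-- A single-node cluster can hold only one replica of each block. -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>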
  

Stop the cluster, format the name node one last time, and restart. From then on, the name node should come back up after a restart without needing to be formatted again.
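The stop, format, and restart sequence might look like the sketch below. It assumes HADOOP_HOME points at the unpacked distribution (the default used here is only a guess) and that hdfs-site.xml points at $HOME/pseudo/dfs; DFS_ROOT is a hypothetical variable introduced for this example. The script names stop-dfs.sh and start-dfs.sh and the hdfs namenode -format command are the ones used in the official single-node setup guide. Note that formatting erases any existing HDFS metadata.

```shell
# Assumptions: adjust both paths to match your installation and hdfs-site.xml.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-dist}"
DFS_ROOT="${DFS_ROOT:-$HOME/pseudo/dfs}"

# Create the directories referenced by dfs.name.dir and dfs.data.dir.
mkdir -p "$DFS_ROOT/name" "$DFS_ROOT/data"

if [ -x "$HADOOP_HOME/bin/hdfs" ]; then
    # Stop HDFS, format the name node in its new location, then start HDFS again.
    "$HADOOP_HOME/sbin/stop-dfs.sh"
    "$HADOOP_HOME/bin/hdfs" namenode -format
    "$HADOOP_HOME/sbin/start-dfs.sh"
else
    echo "hdfs not found under $HADOOP_HOME; set HADOOP_HOME and rerun" >&2
fi
```

Because the new directories live under the home directory rather than /tmp, this format step should be needed only once.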
