Hadoop Single Node Cluster Installation in Ubuntu

This step-by-step tutorial on Hadoop Single Node Cluster installation will help you install, run, and verify a Hadoop installation on Ubuntu machines. The prerequisites required for installation are:

  • Sun Java 6, 7, or a later version
  • Ubuntu Linux 12.04 (LTS version)
  • Hadoop 1.0.3 (stable version)

 

Step by Step Procedure to Install Hadoop

Step: 1

First, we need to install Java. If your machine already has Java installed, check whether the Java version is suitable. To install Java, enter the command given below in the terminal.
sudo apt-get install openjdk-7-jdk
Once Java is installed, check the Java version using the following command.
java -version
[Screenshot: Hadoop Environment Setup]
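As a quick optional check, not part of the original tutorial, the command below prints the install directory of the JDK, which should match the JAVA_HOME path (assumed here to be /usr/lib/vm/java-7-openjdk-i386 style, i.e. whatever apt installed) used later in .bashrc and hadoop-env.sh.

$ readlink -f $(which javac) | sed "s:/bin/javac::"
# Prints the JDK directory, e.g. /usr/lib/jvm/java-7-openjdk-i386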

 

Step: 2

In this step, we will create a Hadoop group and user. To create them, use the commands given below (here the group is named bithadoop and the user is named viki; substitute your own names if you wish).

sudo addgroup bithadoop
sudo adduser --ingroup bithadoop viki

Now, switch to the new user using the following command.

su viki
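As an optional sanity check not included in the original steps, confirm that the new user was placed in the new group:

$ id viki
# The output should list bithadoop among the user's groups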

Step: 3

Now, we will configure SSH. For SSH Configuration, use the command given below.

ssh-keygen -t rsa -P ""

When the above command is executed, output similar to the following appears on the screen.

Generating public/private rsa key pair.
Enter file in which to save the key (/home/viki/.ssh/id_rsa): /home/viki/.ssh/id_rsa
Created directory '/home/viki/.ssh'.
Your identification has been saved in /home/viki/.ssh/id_rsa.
Your public key has been saved in /home/viki/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 viki@bithadoop

The command also prints the key's randomart image. This is just a visual representation of the newly generated key produced at the back end; you will see it during SSH key generation.

Copying the key to the authorized_keys file

We have to set up client-server communication within the Hadoop cluster. For this, we generate SSH keys and then place the public key in the authorized_keys file.

For a better understanding of SSH: the Secure Shell (SSH) is a UNIX-based command interface and protocol used to securely access a remote computer. Network administrators use SSH extensively to control web and other servers remotely.

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys 
$ ssh localhost

[Screenshot: Hadoop Environment Copy Keys]
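If ssh localhost still prompts for a password after the key has been copied, a common cause (not covered in the original steps) is overly permissive file modes on the .ssh directory; tightening them as below usually resolves it.

$ chmod 700 $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys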

Step: 4

Now, we will go for the Hadoop installation. First, we need to download the Hadoop software (see the Hadoop download link). Download the Hadoop 1.0.3 tar file from the Apache archive site and place it in /usr/local. Then navigate to that location.

$ cd /usr/local

Use the commands given below to extract and install Hadoop.

$ sudo tar xzf hadoop-1.0.3.tar.gz 
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R viki:bithadoop hadoop
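An optional check, not in the original tutorial, to confirm the extraction and ownership change worked:

$ ls -ld /usr/local/hadoop
# Should show the directory owned by viki:bithadoop
$ ls /usr/local/hadoop/bin
# Should list the hadoop, start-all.sh and stop-all.sh scripts, among others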

Modifying .bashrc file in Home directory:

Whenever we run the Hadoop scripts, they rely on the Hadoop home path being available in the environment, so we set it in the .bashrc file. Here we are not replacing .bashrc; we are only appending the Hadoop-related settings to it.

$ sudo nano $HOME/.bashrc 

Append the following Hadoop-related environment variables at the end of the file.

# Set HADOOP_HOME
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
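After saving .bashrc, reload it so the new variables take effect in the current shell; the checks below are an optional verification not present in the original text.

$ source $HOME/.bashrc
$ echo $HADOOP_HOME
# Should print /usr/local/hadoop
$ echo $JAVA_HOME
# Should print /usr/lib/jvm/java-7-openjdk-i386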

Modifying hadoop-env.sh in the conf directory:

Here, we add the Java home path in hadoop-env.sh. This configuration is used by the Hadoop scripts when setting up the Hadoop environment.

$ sudo nano /usr/local/hadoop/conf/hadoop-env.sh

# Enable the line and edit it as follows: give Java Path here
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
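The JDK directory name varies between machines (for example, 64-bit Ubuntu typically uses java-7-openjdk-amd64 instead of java-7-openjdk-i386), so it is worth listing the installed JVMs and adjusting the path accordingly; this check is an addition, not part of the original steps.

$ ls /usr/lib/jvm/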

Make directories for the Name node and Data node:

The name node directory stores the metadata of the files in HDFS, while the data node directory stores the actual data blocks residing in HDFS. We will point the name node and data node directory properties at these paths in hdfs-site.xml, so before editing that file we have to create the directories at the chosen location.

$ sudo mkdir -p /app/hadoop/tmp/dfs/name 
$ sudo mkdir -p /app/hadoop/tmp/dfs/data
$ sudo chown -R viki:bithadoop /app/hadoop/tmp
$ sudo chmod -R 755 /app/hadoop/tmp
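A quick optional verification, not in the original tutorial, that the directories exist with the expected owner and permissions:

$ ls -ld /app/hadoop/tmp/dfs/name /app/hadoop/tmp/dfs/data
# Both should be owned by viki:bithadoop with mode drwxr-xr-x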

Editing conf/core-site.xml:-

In this file, we define the base temporary directory path and the name node address. By placing these properties in core-site.xml, Hadoop knows where its working directories live and how to communicate with the name node.

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system</description>
</property>
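Note that the property snippets shown in this tutorial must be placed between the <configuration> tags of each file; a minimal sketch of the resulting core-site.xml (assuming a freshly extracted, otherwise empty file) is:

<?xml version="1.0"?>
<configuration>
<!-- the <property> blocks shown above go here -->
</configuration>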

Editing conf/mapred-site.xml:-

In this file, we define the Job tracker address.

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>

Editing conf/hdfs-site.xml:-

In this file, we place the replication factor property along with the name node and data node directory properties (these dfs.* settings belong in hdfs-site.xml, since they are read by the HDFS daemons). The replication factor controls how many copies of each block are kept on the data nodes; for a single node cluster, it is set to 1.

<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/app/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/app/hadoop/tmp/dfs/data</value>
</property>

Step: 5

Now, we will execute the commands to run Hadoop. These commands are run from the $HADOOP_HOME/bin directory. First, format the name node.

$ ./hadoop namenode -format

[Screenshot: Hadoop namenode setup]

To start all the Hadoop daemons, use the command given below.

$ ./start-all.sh

OUTPUT:

starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-ubuntu.out
$ jps

[Screenshot: Hadoop jps Environment]

By using jps (the JVM Process Status tool), we are able to see which Java processes are currently running on the machine. After running the jps command, it should show output like the above; only then can we confirm that the installation was done properly.
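As a rough guide, not shown in the original screenshots, on a healthy single node cluster jps should list the five Hadoop daemons plus jps itself, and the companion script stops them all when you are finished.

$ jps
# Expect: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker, Jps

# To stop all daemons (run from $HADOOP_HOME/bin):
$ ./stop-all.sh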