This step-by-step tutorial on Hadoop single-node cluster installation will help you install, run, and verify a Hadoop installation on an Ubuntu machine. The prerequisites required for installation are:
- Java 6 or 7 (Oracle or OpenJDK)
- Ubuntu Linux 12.04 (LTS version)
- Hadoop 1.0.3 (stable version)
Step-by-Step Procedure to Install Hadoop
First, install Java (here, the OpenJDK 7 JDK) using the command given below.
sudo apt-get install openjdk-7-jdk
In this step, we will create a Hadoop group and user. Use the commands given below; the group is named bithadoop and the user viki (substitute your own names if you prefer). Note that the adduser syntax is --ingroup <group> <user>.
sudo addgroup bithadoop
sudo adduser --ingroup bithadoop viki
Now, switch to the newly created user with the following command.
su viki (use the user name you created above)
Now, we will configure SSH so that the Hadoop daemons can log in to localhost without a password. Generate an RSA key pair with an empty passphrase using the command given below.
ssh-keygen -t rsa -P ""
On executing the above command, output similar to the following appears on the screen.
Generating public/private rsa key pair.
Enter file in which to save the key (/home/viki/.ssh/id_rsa): /home/viki/.ssh/id_rsa
Created directory '/home/viki/.ssh'.
Your identification has been saved in /home/viki/.ssh/id_rsa.
Your public key has been saved in /home/viki/.ssh/id_rsa.pub.
The key fingerprint is: 9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 viki@bithadoop
The key's randomart image shown during key generation is just a visual fingerprint of the generated key, produced automatically at the back end; it can safely be ignored.
Copying the key to the authorized_keys file
We have to set up client-server communication within the Hadoop cluster. For this, we place the generated public key into the authorized_keys file, which enables password-less logins.
For a better understanding of SSH: the Secure Shell (SSH) is a UNIX-based command interface and protocol used for securely accessing a remote computer. Network administrators use SSH extensively to control web and application servers remotely.
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
$ ssh localhost
Now, we will go for the Hadoop installation. First, download the Hadoop software: get the stable release tar file (here, hadoop-1.0.3) from the Apache archive site, then navigate to the download location.
$ cd /usr/local
Use command given below to extract and install hadoop.
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R viki:bithadoop hadoop
Modifying the .bashrc file in the home directory:
Whenever the Hadoop daemons run, they pick up the Hadoop home path from the environment set in the .bashrc file, so we have to mention the Hadoop home path there. We are not replacing .bashrc; we are only appending the Hadoop-related variables to it.
$ sudo nano $HOME/.bashrc
Add the Hadoop-related environment variables at the end of the file (adjust JAVA_HOME to your machine's JDK path).
# Set HADOOP_HOME
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (path shown is for OpenJDK 7 on 64-bit Ubuntu; adjust as needed)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
Modifying the hadoop-env.sh file in the conf directory:
Here, we add the Java home path to hadoop-env.sh. Hadoop's scripts read this file when setting up the Hadoop environment.
$ sudo nano /usr/local/hadoop/conf/hadoop-env.sh
# Uncomment the JAVA_HOME line and edit it to point at your JDK, for example:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Make directories for the Name node and Data node:
The Name node directory stores the metadata of files; the Data node directory stores the actual data residing in HDFS. We will point the Name node and Data node directory properties in hdfs-site.xml at these paths (not mapred-site.xml, which only configures MapReduce). Before mentioning them in the configuration, we have to create the directories at the chosen location.
$ sudo mkdir -p /app/hadoop/tmp/dfs/name
$ sudo mkdir -p /app/hadoop/tmp/dfs/data
$ sudo chown -R viki:bithadoop /app/hadoop/tmp
$ sudo chmod -R 755 /app/hadoop/tmp
Editing conf/core-site.xml :-
In this file, we define a temporary directory path and the Name node location. The properties placed in core-site.xml allow clients to communicate with the Name node. Add the following inside the <configuration> element (port 54310 is a common choice for a single-node setup):
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system</description>
</property>
Editing conf/mapred-site.xml :-
In this file, we define the Job tracker address (the mapred.job.tracker property). Note that the Name node and Data node directory properties belong in hdfs-site.xml, not here.
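A minimal conf/mapred-site.xml for a single-node setup might look like the sketch below; the host and port (localhost:54311) are a conventional choice for this kind of tutorial setup, not a value mandated by Hadoop.

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>
```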
Editing conf/hdfs-site.xml :-
We place the replication factor property in hdfs-site.xml, along with the Name node and Data node directory paths created earlier. The replication factor controls how many Data nodes hold a copy of each file; for a single-node cluster it is set to 1.
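A minimal conf/hdfs-site.xml for this setup might look as follows, pointing dfs.name.dir and dfs.data.dir (the Hadoop 1.x property names) at the directories created above:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication; 1 is enough on a single node.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/app/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/app/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
```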
Now, we will execute commands to run Hadoop. First, format the HDFS file system via the Name node (do this only once; it erases any existing HDFS data). From the Hadoop bin directory:
$ ./hadoop namenode -format
To start all the daemons in Hadoop, use the command given below (start-all.sh is a script in the bin directory, not an argument to the hadoop command).
$ ./start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-viki-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-viki-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-viki-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-viki-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-viki-tasktracker-ubuntu.out
By running jps (the JVM Process Status tool), we are able to see which Java processes are currently running on the machine:
$ jps
The output should list the five Hadoop daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker (plus the Jps process itself). Only when all five appear can we confirm that the installation was done properly.
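As a quick verification aid, the check can be scripted. The helper below is a sketch (check_daemons is a hypothetical function, not part of Hadoop) that scans a jps listing for the five Hadoop 1.x daemon names; on a live cluster you would call it as check_daemons "$(jps)".

```shell
# Sketch: verify a jps listing contains all five Hadoop 1.x daemons.
check_daemons() {
  local listing="$1"
  local missing=""
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w matches the daemon name as a whole word in a "PID Name" line,
    # so "NameNode" does not falsely match "SecondaryNameNode".
    echo "$listing" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -z "$missing" ]; then
    echo "all daemons running"
  else
    echo "missing:$missing"
  fi
}

# Example with a sample listing (PIDs will differ on your machine):
sample="1362 NameNode
1586 DataNode
1792 SecondaryNameNode
1886 JobTracker
2110 TaskTracker"
check_daemons "$sample"
```

If any daemon is missing, check the corresponding .out log file under /usr/local/hadoop/logs for the startup error.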