Using HDFS

Format the configured HDFS file system: open the namenode (HDFS server) and execute the following command.

$ hadoop namenode -format

Start the distributed file system with the start-dfs.sh script (found under $HADOOP_HOME/sbin in recent releases), which starts the namenode as well as the datanodes in the cluster.

$ start-dfs.sh


Listing Files in HDFS

Find the list of files in a directory, or the status of a file, with the ls command in the terminal. ls takes either a directory or a file name as an argument:

$ $HADOOP_HOME/bin/hadoop fs -ls <args>
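For example, ls can be pointed at a directory or at a single file inside it (the paths here are illustrative):

```shell
# List the contents of a directory
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
# Show the status of a single file
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input/intellipaat.txt
```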

Inserting Data into HDFS

Follow the steps below to insert the required file into the Hadoop file system.

Step 1: Create an input directory.

$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input

Step 2: Use the put command to transfer and store the data file from the local system to HDFS, using the following command in the terminal.

$ $HADOOP_HOME/bin/hadoop fs -put /home/intellipaat.txt /user/input

Step 3: Verify the file using the ls command.

$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
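The three steps above can be sketched as one small helper script. This is a sketch under the assumption that HADOOP_HOME points at a running installation; the helper name hdfs_put_file is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create the target directory, upload a local file to HDFS,
# then list the directory to confirm the upload.
# Assumes HADOOP_HOME points at a running Hadoop installation.

hdfs_put_file() {
  local src="$1" dest="$2"
  "$HADOOP_HOME/bin/hadoop" fs -mkdir -p "$dest"   # create directory (-p: no error if it exists)
  "$HADOOP_HOME/bin/hadoop" fs -put "$src" "$dest" # copy the local file into HDFS
  "$HADOOP_HOME/bin/hadoop" fs -ls "$dest"         # verify the upload
}

# Usage: hdfs_put_file /home/intellipaat.txt /user/input
```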

Retrieving Data from HDFS

For instance, suppose you have a file in HDFS called intellipaat. Retrieve it from the Hadoop file system by carrying out the following steps.

Step 1: View the data from HDFS using the cat command.

$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/intellipaat

Step 2: Get the file from HDFS to the local file system using the get command, as shown below.

$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/
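The two retrieval steps can likewise be combined into one helper. Again a sketch, assuming HADOOP_HOME points at a running installation; the name hdfs_get_file is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a file stored in HDFS, then copy it to the local
# file system. Assumes HADOOP_HOME points at a running installation.

hdfs_get_file() {
  local hdfs_path="$1" local_dir="$2"
  "$HADOOP_HOME/bin/hadoop" fs -cat "$hdfs_path"              # view the data
  "$HADOOP_HOME/bin/hadoop" fs -get "$hdfs_path" "$local_dir" # copy to the local file system
}

# Usage: hdfs_get_file /user/output/intellipaat /home/hadoop_tp/
```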

Shutting Down HDFS

Shut down HDFS with the following command.

$ stop-dfs.sh


Multi-Node Cluster

Installing Java

Check the installed Java version with the following command.

$ java -version

The following output is presented.

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

Creating User Account

Create a dedicated system user account on both the master and slave systems for the Hadoop installation.

# useradd hadoop
# passwd hadoop

Mapping the nodes

Edit the hosts file in the /etc/ folder on every node, and specify the IP address of each system followed by its hostname.

# vi /etc/hosts

Enter lines like the following in the /etc/hosts file, replacing each <IP-address> placeholder with the actual address of that node:

<IP-address> hadoop-master
<IP-address> hadoop-slave-1
<IP-address> hadoop-slave-2

Configuring Key Based Login

SSH should be set up on each node so that the machines can communicate with one another without a password prompt.

# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
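The per-node ssh-copy-id calls can also be written as a loop. A sketch, assuming the hostnames mapped in /etc/hosts and the default ssh-keygen key path; the helper name distribute_key is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy the hadoop user's public key to every node so that
# passwordless SSH works across the cluster.

distribute_key() {
  for host in "$@"; do
    # ssh-copy-id appends ~/.ssh/id_rsa.pub to the remote
    # ~/.ssh/authorized_keys for the hadoop user
    ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$host"
  done
}

# Usage: distribute_key hadoop-master hadoop-slave-1 hadoop-slave-2
```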