Using HDFS

To begin, format the configured HDFS file system. Open the NameNode (the HDFS server) and execute the following command:

$ hadoop namenode -format

Then start the distributed file system. The command below starts the NameNode as well as the DataNodes in the cluster:

$ start-dfs.sh
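You can check that the daemons came up with the jps tool that ships with the JDK. The exact process list depends on your configuration; on a typical single-node setup it looks roughly like this (process IDs will differ):

$ jps
4865 NameNode
5112 DataNode
5340 SecondaryNameNode
5501 Jps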

Listing Files in HDFS

To list the files in a directory or check the status of a file, use the ls command in the terminal. ls takes a directory or a file name as its argument:

$ $HADOOP_HOME/bin/hadoop fs -ls <args>
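For example, passing a directory such as /user (assuming it exists) lists its contents:

$ $HADOOP_HOME/bin/hadoop fs -ls /user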

Inserting Data into HDFS

Follow the steps below to insert a file into the Hadoop file system.

Step 1: Create an input directory.

$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input

Step 2: Use the put command to transfer the data file from the local system to HDFS, using the following command in the terminal:

$ $HADOOP_HOME/bin/hadoop fs -put /home/intellipaat.txt /user/input

Step 3: Verify the file using the ls command.

$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
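If the put succeeded, the listing shows the copied file. Illustrative output (permissions, sizes, and timestamps will vary):

Found 1 items
-rw-r--r--   1 hadoop supergroup       1024 2015-01-01 12:00 /user/input/intellipaat.txt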

Retrieving Data from HDFS

For instance, suppose you have a file in HDFS called intellipaat. You can retrieve it from the Hadoop file system by carrying out the following steps.

Step 1: View the data in HDFS using the cat command.

$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/intellipaat

Step 2: Get the file from HDFS to the local file system using the get command, as shown below.

$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/
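You can then confirm the copy with a plain ls on the local file system:

$ ls /home/hadoop_tp/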

Shutting Down HDFS

Shut down HDFS with the following command:

$ stop-dfs.sh
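Running jps again should show that the NameNode and DataNode processes are gone, leaving only jps itself:

$ jps
5723 Jps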

Multi-Node Cluster

Installing Java

Check the Java installation with the java -version command:

$ java -version

Output similar to the following is presented:

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 24.71-b01, mixed mode)
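If the java command is not found, install a JDK and make it available on the PATH. For example, assuming the JDK is unpacked at /usr/local/jdk1.7.0_71 (a path used here purely for illustration), append the following to ~/.bashrc:

export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin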

Creating a User Account

Create a system user account on both the master and slave systems to use for the Hadoop installation.

# useradd hadoop
# passwd hadoop
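Run these commands as root on every node. You can confirm the account was created with:

# id hadoop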

Mapping the Nodes

Edit the hosts file in the /etc/ folder on every node, specifying the IP address of each system followed by its host name.

# vi /etc/hosts

Enter the following lines in the /etc/hosts file.

192.168.1.109 hadoop-master
192.168.1.145 hadoop-slave-1
192.168.56.1 hadoop-slave-2
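A quick ping from any node verifies that the names resolve to the addresses above (your IP addresses may of course differ):

# ping -c 1 hadoop-master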

Configuring Key-Based Login

SSH should be set up on each node so that the nodes can communicate with one another without any prompt for a password.

# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
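To confirm that key-based login works, switch to the hadoop user and SSH to one of the slaves; no password prompt should appear:

$ ssh hadoop-slave-1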