To format the configured HDFS file system, go to the namenode (the HDFS server) and execute the following command:
$ hadoop namenode -format
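Note that on Hadoop 2.x and later the hadoop namenode form is deprecated; the equivalent command is:
$ hdfs namenode -format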
Start the distributed file system with the command below; it starts the namenode as well as the data nodes in the cluster.
$ start-dfs.sh
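Once start-dfs.sh completes, you can confirm the daemons are running with the JDK's jps utility (the exact process list depends on your configuration); on the namenode host the output should include NameNode, and on each data node it should include DataNode:
$ jps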
Listing Files in HDFS
To find the list of files in a directory or the status of a file, use the 'ls' command in the terminal. ls accepts a directory or a file name as an argument:
$ $HADOOP_HOME/bin/hadoop fs -ls <args>
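For example, a minimal invocation listing the /user directory (the path here is an assumption for illustration) prints one entry per line with its permissions, replication factor, owner, group, size, modification time, and path:
$ $HADOOP_HOME/bin/hadoop fs -ls /user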
Inserting Data into HDFS
Follow the steps below to insert the required file into the Hadoop file system.
Step 1: Create an input directory.
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input
Step 2: Use the put command to transfer and store the data file from the local system to HDFS, using the following command in the terminal.
$ $HADOOP_HOME/bin/hadoop fs -put /home/intellipaat.txt /user/input
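A built-in alternative is copyFromLocal, which behaves like put but is restricted to local source files:
$ $HADOOP_HOME/bin/hadoop fs -copyFromLocal /home/intellipaat.txt /user/input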
Step 3: Verify the file using the ls command.
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
Retrieving Data from HDFS
For instance, suppose you have a file in HDFS called intellipaat. Retrieve it from the Hadoop file system as follows:
Step 1: View the data from HDFS using the cat command.
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/intellipaat
Step 2: Get the file from HDFS to the local file system using the get command, as shown below.
$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/
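Similarly, copyToLocal is a built-in equivalent of get that is restricted to a local destination:
$ $HADOOP_HOME/bin/hadoop fs -copyToLocal /user/output/ /home/hadoop_tp/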
Shutting Down the HDFS
Shut down HDFS with the following command:
$ stop-dfs.sh
Multi-Node Cluster
Installing Java
Check the installed Java version with the java -version command:
$ java -version
Output similar to the following is displayed:
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b13)
Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)
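If java is not found, install a JDK and make sure JAVA_HOME is set and exported on every node, for example in ~/.bashrc (the installation path below is an assumption; adjust it to your system):
$ export JAVA_HOME=/usr/local/jdk1.7.0_71
$ export PATH=$PATH:$JAVA_HOME/bin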
Creating User Account
Create a dedicated system user account on both the master and slave systems for the Hadoop installation:
# useradd hadoop
# passwd hadoop
Mapping the Nodes
The hosts file in the /etc/ folder must be edited on every node so that it specifies the IP address of each system followed by its host name.
# vi /etc/hosts
Enter the following lines in the /etc/hosts file.
192.168.1.109 hadoop-master
192.168.1.145 hadoop-slave-1
192.168.56.1 hadoop-slave-2
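You can check that the mapping works from any node by pinging the host names instead of the raw IP addresses (host names as configured above):
# ping hadoop-master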
Configuring Key-Based Login
SSH must be set up on each node so that the nodes can communicate with one another without prompting for a password.
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
$ ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
$ ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
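To verify the key-based login, ssh from the hadoop user on the master to each node; if the setup succeeded, you get a shell without a password prompt (host names as configured above):
$ ssh hadoop-slave-1
$ exit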