Big Data

Impala

Impala is an open-source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Goals of…

Sqoop

Sqoop is an automated bulk data transfer tool that allows simple import and export of data from structured data stores, NoSQL…

Hadoop Hive

Apache Hive is an open-source data warehouse system built on top of Hadoop. You can use Hive for analyzing and querying large datasets…

Pig in Hadoop

Pig is a high-level programming language for analyzing huge datasets on Hadoop. Pig was developed by Yahoo! and is…

Installing Hadoop

Download Hadoop on the master server using the following procedure.

    # mkdir /opt/hadoop
    # cd /opt/hadoop/
    # wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.0.tar.gz
    # tar -xzf hadoop-1.2.0.tar.gz
    …

Using HDFS

To format the configured HDFS file system, open the namenode (HDFS server) and execute the following command.

    $ hadoop namenode -format

Start the distributed…