Blog

Kafka overview

Apache Kafka provides fault-tolerant, scalable messaging. Its core concepts are topics, producers, consumers, and brokers. Topics: Kafka maintains feeds of messages in categories called topics. Each topic has a user-defined category…

Impala

Impala is an open-source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Goals of…

Sqoop

Sqoop is an automated bulk data transfer tool that allows simple import and export of data from structured data stores such as NoSQL…

Hadoop Hive

Apache Hive is an open-source data warehouse system built on top of Hadoop. You can use Hive to analyze and query large datasets…

Pig in Hadoop

Pig is a high-level programming language for analyzing huge datasets on Hadoop. Pig was developed by Yahoo! and is…