Hadoop NoSQL

    Relational database management systems (RDBMS) are used on a large scale in commercial systems such as banking and flight reservations, and in other applications that work with structured data. SQL (Structured Query Language) is the query language for these applications.

    Relational databases stand out for the consistency of their data schemas. We can scale them, but not without limit.

    The need to analyze large volumes of data, from different sources and in different formats, gave rise to NoSQL (Not Only SQL) technology. NoSQL databases are not relational and are not based on fixed schemas (rules governing data or objects). All NoSQL implementations aim to handle large volumes of unstructured data at scale.

    NoSQL databases can grow horizontally and focus on performance. They replicate data across multiple network nodes and read, write, and process data at high speed using distributed parallel processing paradigms. We can use NoSQL for real-time data analysis, such as personalizing sites based on user behavior tracking, and for IoT (Internet of Things) workloads such as vehicle telematics or mobile device telemetry.

    NoSQL Types

    The three main types of NoSQL are:

    • Column Database (column-oriented)
    • Key-Value Database (key/value oriented)
    • Document Database (document-oriented)

    1. Column Database

    A column database is a NoSQL database that stores data in tables and manages them by columns instead of rows. It is also called a columnar database management system (CDBMS).

    It stores the columns as data files.

    One benefit is that column data compresses well and that operations such as minimum, maximum, sum, count, and average can run over a single column. Columns can be auto-indexed, and the database uses less disk space than a relational database system holding the same data.
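
    As a rough illustration of the column layout described above, the sketch below pivots a few rows into per-column lists, aggregates one column, and compresses another. The sample records, the `run_length_encode` helper, and the choice of run-length encoding are assumptions made for this example; they do not describe how any particular column database is implemented.

```python
# Row-oriented layout: one complete record per row (hypothetical sample data).
rows = [
    {"id": 1, "city": "Lisbon", "sales": 120},
    {"id": 2, "city": "Lisbon", "sales": 80},
    {"id": 3, "city": "Porto", "sales": 200},
]

# Column-oriented layout: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# Aggregates only need to read the one column they operate on.
sales = columns["sales"]
total = sum(sales)
lowest, highest = min(sales), max(sales)
average = total / len(sales)

# Repeated values sitting next to each other in a column compress well.
city_rle = run_length_encode(columns["city"])

print(total, lowest, highest, round(average, 2))
print(city_rle)
```

    Because each aggregate reads a single list, the other columns are never touched, which is the intuition behind the column-oriented design.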

    Apache HBase is a column-oriented NoSQL database, developed to run on top of Hadoop with HDFS.

    It is designed from the concepts of the original columnar database developed by Google, called “BigTable.” It is well suited to real-time lookups and to reading and accessing large volumes of data.

    2. Key-Value Database

    A key/value-oriented NoSQL database stores data in collections of key/value pairs. For example, a student ID number may be the key, and the student’s name may be the value.

    It works like a dictionary: it stores a value, such as an integer, a string, or a JSON or array structure, along with the key used to reference that value.
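
    The dictionary analogy can be sketched with a plain in-memory Python dict. The `put` and `get` helpers and the `student:<id>` key format are hypothetical names chosen for this example, not the API of any real key/value database. Values are serialized to JSON strings here only to show that the store treats them as opaque.

```python
import json

# In-memory stand-in for a key/value store: keys map to opaque serialized values.
store = {}


def put(key, value):
    """Serialize the value to JSON and save it under the key."""
    store[key] = json.dumps(value)


def get(key, default=None):
    """Return the deserialized value for the key, or the default if absent."""
    raw = store.get(key)
    return default if raw is None else json.loads(raw)


# The student ID is the key; the value can be a simple string...
put("student:1001", "Ana Silva")
# ...or a richer JSON structure.
put("student:1002", {"name": "Rui Costa", "courses": ["math", "physics"]})

print(get("student:1001"))
print(get("student:1002")["courses"])
print(get("student:9999"))
```

    Every read goes through the key; the store itself does not look inside the value.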

    Apache Cassandra is a powerful NoSQL database based on the key/value model. Facebook developed it and released it as open source in 2008. It is scalable and fault tolerant.

    It was developed to solve Big Data analytical problems in real time, involving petabytes of data, using MapReduce. Cassandra can run without Hadoop, but it becomes more powerful when connected to Hadoop and HDFS.

    3. Document Database

    Document-oriented NoSQL databases resemble key/value stores in which each value is a document.

    They organize documents into collections analogous to relational tables. We can search based on values, not just on keys.

    MongoDB is a document-oriented NoSQL database developed by MongoDB Inc., which distributes a free Community edition.

    MongoDB stores data as JSON-like documents without a fixed schema, meaning fields may differ from one document to another, and the data structure may change.
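
    To make the ideas of a collection, differing fields, and value-based search concrete, here is a minimal sketch that uses a plain Python list of dicts. It does not use MongoDB or any MongoDB driver; the `students` collection and the `find` helper are hypothetical stand-ins that only mimic the idea.

```python
# A "collection" of documents; each document may carry different fields.
students = [
    {"_id": 1, "name": "Ana", "city": "Lisbon"},
    {"_id": 2, "name": "Rui", "city": "Porto", "courses": ["math"]},
    {"_id": 3, "name": "Eva", "city": "Lisbon", "email": "eva@example.com"},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Search by a value inside the documents rather than by key.
in_lisbon = find(students, city="Lisbon")
print([doc["name"] for doc in in_lisbon])

# Documents that lack the field are simply not matched.
print(find(students, email="eva@example.com"))
```

    Adding a new field to one document requires no change to the others, which is what the absence of a fixed schema means in practice.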

    We can run it without Hadoop, but it becomes more powerful when connected to Hadoop and HDFS.

    Let’s look at HBase, Cassandra, and MongoDB in more detail.