Impala is an open-source, massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.

Goals of Impala

  • General purpose SQL query engine:
    • Must work for both transactional and analytical workloads
    • Supports queries that take anywhere from milliseconds to hours
  • Runs directly within Hadoop:
    • Reads widely used Hadoop file formats
    • Talks to widely used Hadoop storage managers
    • Runs on the same nodes that run Hadoop processes
  • High performance:
    • Runtime code generation
    • Uses C++ instead of Java
    • Completely new execution engine that is not built on MapReduce
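
To make the "runtime code generation" goal more concrete, here is a toy sketch in Python. It is only an analogy, not Impala's code: Impala generates native machine code with LLVM, whereas this example builds a Python function from a query's filter expression. The expression-tree layout and the names interpret, to_source and compile_predicate are hypothetical, chosen for illustration. The idea shown is that a generic interpreter re-inspects the expression tree for every row, while generated code is specialized once per query and then simply run.

```python
# Toy analogy for runtime code generation (NOT Impala's implementation).
# A filter such as "price > 100 AND qty < 5" is held as a nested tuple:
#   ("and", (">", ("col", "price"), ("const", 100)),
#           ("<", ("col", "qty"), ("const", 5)))


def interpret(expr, row):
    """Generic path: walk the expression tree again for every row."""
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    left = interpret(expr[1], row)
    right = interpret(expr[2], row)
    if op == ">":
        return left > right
    if op == "<":
        return left < right
    if op == "and":
        return left and right
    raise ValueError(f"unknown operator: {op}")


def to_source(expr):
    """Translate the expression tree into Python source text."""
    op = expr[0]
    if op == "col":
        return f"row[{expr[1]!r}]"
    if op == "const":
        return repr(expr[1])
    if op in (">", "<", "and"):
        return f"({to_source(expr[1])} {op} {to_source(expr[2])})"
    raise ValueError(f"unknown operator: {op}")


def compile_predicate(expr):
    """Specialized path: generate a function once, then reuse it per row."""
    # eval is acceptable here only because the source text is built by
    # to_source from a trusted tree; never eval untrusted input.
    return eval(f"lambda row: {to_source(expr)}")


if __name__ == "__main__":
    predicate = ("and",
                 (">", ("col", "price"), ("const", 100)),
                 ("<", ("col", "qty"), ("const", 5)))
    rows = [{"price": 150, "qty": 2}, {"price": 50, "qty": 1}]
    matches = compile_predicate(predicate)
    print(to_source(predicate))
    print([row for row in rows if matches(row)])
```

Both paths return the same answers; the difference is where the work happens. The interpreter pays for the tree walk and operator dispatch on every row, while the generated function pays once at query start. That trade-off is the motivation behind this goal, although the actual gains in Impala depend on the query and the data.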