

    Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.

    Goals of Impala

    • General purpose SQL query engine:
      •Must work for both transactional and analytical workloads
      •Must support queries that take anywhere from milliseconds to hours
    • Runs directly within Hadoop:
      •Reads widely used Hadoop file formats
      •Talks to widely used Hadoop storage managers
      •Runs on the same nodes that run Hadoop processes
    • High performance:
      •Runtime code generation
      •Uses C++ in place of Java
      •Completely new execution engine that is not built on MapReduce
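
To make the "runtime code generation" goal more concrete, here is a toy sketch in Python. It is not Impala's implementation; the predicate format and the names `PREDICATES`, `interpret`, and `generate` are hypothetical and exist only for illustration. The idea it shows is general: instead of re-interpreting a query's filter expression for every row, the engine emits code specialised to that one query, compiles it once, and then runs the compiled function over all rows.

```python
# Toy illustration of runtime code generation (an assumption-laden sketch,
# not how Impala itself is implemented).

# Hypothetical filter: (column, operator, constant) triples, AND-ed together.
PREDICATES = [("price", ">", 10), ("qty", "<=", 5)]

OPS = {
    ">": lambda a, b: a > b,
    "<=": lambda a, b: a <= b,
}


def interpret(row, predicates):
    """Generic path: walk the predicate list and dispatch on the operator for every row."""
    return all(OPS[op](row[col], const) for col, op, const in predicates)


def generate(predicates):
    """Codegen path: build source specialised to this query and compile it once."""
    for _, op, _ in predicates:
        if op not in OPS:  # only emit operators we know about
            raise ValueError(f"unsupported operator: {op}")
    condition = " and ".join(
        f"row[{col!r}] {op} {const!r}" for col, op, const in predicates
    )
    source = f"def matches(row):\n    return {condition}\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["matches"]


rows = [
    {"price": 12, "qty": 3},
    {"price": 8, "qty": 1},
    {"price": 20, "qty": 9},
]

matches = generate(PREDICATES)  # compiled once per query
interpreted = [r for r in rows if interpret(r, PREDICATES)]
generated = [r for r in rows if matches(r)]
```

Both paths select the same rows; the difference is that the generated function contains the comparisons inline, with no per-row loop over predicates or operator lookup. A native engine gains far more from this than the Python toy does, because specialised machine code also removes virtual calls and type dispatch.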