It is an open source platform massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.
Goals of Impala
- General purpose SQL query engine:
•Must work both for transactional and analytical workloads
•Support queries that get from milliseconds to hours timelimit.
- Runs directly within Hadoop:
•Reads Hadoop file formats which are broadly used
•Talks to Hadoop storage managers which are extensively used
•Runs on same nodes that run Hadoop processes
- High performance:
•Runtime code generation
•Usage of C++ in place of Java
•Completely new execution engine which is not build on MapReduce