Impala in Action: Querying and Mining Big Data

Ricky Saltzer, Istvan Szegedi, Paul De Schacht

Paperback (07 Apr 2015)

Publisher's Synopsis

Hadoop queries in Pig or Hive can be too slow for real-time data analysis. Impala, an ultra-speedy query engine from Cloudera, supercharges Hadoop by avoiding the typical MapReduce overhead and parallelizing queries so that they can run on multiple nodes. This is a big deal for big data, because with Impala, querying Hadoop takes seconds rather than minutes. Impala's dialect is close to standard SQL, and Impala seamlessly accesses HBase and HDFS (Hadoop Distributed File System), allowing considerable freedom in choice of data formats.

Impala in Action is a hands-on guide to querying Hadoop using Impala. It starts by comparing Impala to traditional databases and database services on Hadoop. Then it explains Impala's SQL dialect and the basics of data access. Next, it tackles data visualization tasks and provides techniques for securing Impala with Apache Sentry. The book also shows how to embed Impala queries in a Java client and how to connect to JDBC and ODBC clients. Advanced readers will appreciate the deep dive into Impala's architecture and the practical insights into the issues that complicated configurations and complex queries can cause.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

ISBN:	9781617291982
Publisher:	Manning Publications
Imprint:	Manning Publications
Pub date:	07 Apr 2015
DEWEY:	005.7565
DEWEY edition:	23
Language:	English
Number of pages:	250
Weight:	381g
Height:	235mm
Width:	187mm
Spine width:	22mm