Marcel Kornacker

The Next Wave of SQL-on-Hadoop: The Hadoop Data Warehouse
Track: Modern Big Data Systems

Location:

Abstract:
Apache Hadoop increasingly serves as a complementary technology to the enterprise data warehouse (EDW): it handles cost-efficient data loading and cleaning, while the EDW remains the platform for interactive analysis and reporting on relational data. However, recent advances in the Hadoop ecosystem have expanded the range of EDW-equivalent analytic capabilities available entirely in open source software, so Hadoop-based enterprise data hubs can now also serve as an EDW for native Big Data. As a result, costly processes for moving that data into the traditional EDW solely for analysis are no longer required.

In this session, attendees will hear how one user in the financial services sector, which has rolled out Impala to 45 production nodes to date, is using this approach (based on HDFS, Parquet, and Impala) to reduce processing time from hours to seconds. The same approach lets the user consolidate unstructured data from different sources, such as web applications, non-traditional external data sets, card transactions, and analytical reports, into a single view of all its data.