90 likes | 218 Vues
This presentation by Jason Dai, Principal Engineer at Intel SSG, explores the necessary enhancements for SQL support in Hive, specifically focusing on SQL-92 compliance and advanced capabilities for analytics. Key topics include complete SQL data type systems, full subquery support, advanced OLAP functions, and procedural extensions. The challenges of adapting the Hive parser to support complex SQL constructs and the development efforts through Project Panthera are addressed. This work aims to provide robust analytical capabilities on Hadoop and HBase, fostering rich data analysis and reporting.
E N D
SQL (92 and Beyond) Support for Hive Jason Dai Principal Engineer Intel SSG (Software and Services Group)
What SQL support is needed? • More SQL-92 support for analytics • Complete SQL data type system • Data types (e.g., Datetime, fixed precision numbers), type conversion rules & function (CAST), Datetime expressions and functions (e.g. extract, +/- interval), etc. • Full subquery support • Subqueryin WHERE clauses, correlated subquery, scalar subquery, etc. • New expressions (EXISTS, ALL, ANY, etc.) • Complete Set operators • DISTINCT UNION, INTERSECT, EXCEPT, etc. • Multiple-table SELECT statement • Update/delete? • On HBase only? • (Almost) SQL-92 compliance? • How about transaction?
What SQL support is needed (continued)? • Additional analytics support (beyond SQL-92) • Advanced OLAP functions for analysis & reporting • E.g., rank, rollup, cube, window function(SQL 2003), etc. • Advanced SQL syntax • E.g. WITH clause (SQL-99) • Procedural extensions • E.g., Begin, End, If…Then...Else, Loop/Exit/Continue, etc.
Let’s Get Our Hands Dirty AST (Abstract Syntax Tree) • (Almost) SQL-compliant Hive parser • A lot of work: SQL much more complex than HiveQL • HiveQL grammar file: ~61KB with 2487 lines • SQL (with PL/SQL extensions) grammar file: ~524KB with 8583 lines • Also complex: many existing Hive grammar rules need to be changed • To support more complex SQL constructs (e.g., subquery) • UDF/UDAF/UDTF • For some operators (e.g., rank) Parser Execution Semantic Analyzer (Optimizer) Execution Plan Query
Let’s Get Our Hands Dirty AST (Abstract Syntax Tree) • Analysis, transformation & optimization • SQL data type system • Subquery support (incl. subqueryunnestting) • Multiple-table SELECT • Set operations • Advanced OLAP functions • … Parser Execution Semantic Analyzer (Optimizer) Execution Plan Query
How to Leverage Existing Works? • Project Panthera: • Our open source efforts to enable better analytics capabilities on Hadoop/HBase • https://github.com/intel-hadoop/project-panthera SQL-AST Analyzer & Translator Hive Semantic Analyzer SubqueryUnnesting Multi-Table SELECT (Open Source) SQL Parser* Hadoop MR SQL-AST Hive-AST MINUS Support INTERSECT Support SQL Driver … Query *https://github.com/porcelli/plsql-parser … • A SQL engine for Hive MapReduce • Goal: full analytical SQL support for OLAP • Subquery in WHERE clause • Correlated subquery • Multiple-table SELECT statement • … HiveQL Hive Parser Hive-AST
How to Leverage Existing Works? • NextR Hive UDFs • https://github.com/nexr/hive-udf • UDFs for Oracle dbextensions (rank, decode, nvl, etc.) • SQL windowing functions for Hive • https://github.com/hbutani/SQLWindowing