
Using Data from a Hadoop Cluster in Oracle Database




Presentation Transcript


  1. Using Data from a Hadoop Cluster in Oracle Database • Melli Annamalai

  2. If you need to reset your VM • source /home/oracle/movie/moviework/reset/reset_conn.sh

  3. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

  4. Oracle Big Data Connectors, Licensed Together • Oracle SQL Connector for HDFS • Oracle Loader for Hadoop • Oracle R Connector for Hadoop • Oracle Data Integrator Application Adapters for Hadoop • Announced at OOW 2013: Oracle XQuery for Hadoop

  5. Program Agenda • Oracle SQL Connector for HDFS • Brief Overview • Hands-on Exercises • Oracle Loader for Hadoop • Brief Overview • Hands-on Exercises • (Optional exercise): Use both connectors together

  6. Oracle Big Data Connectors: Connecting Hadoop to Oracle Database • Acquire – Organize – Analyze [diagram: Hadoop → Oracle Big Data Connectors / Oracle Data Integrator → Oracle Database, with Oracle Advanced Analytics and Oracle Spatial & Graph inside the database]

  7. Loading and Accessing Data from Hadoop • Oracle SQL Connector for HDFS • Oracle Loader for Hadoop [diagram: MapReduce jobs (map, shuffle/sort, reduce stages) processing input files and log files on Hadoop, with results flowing into Oracle Database]

  8. Oracle SQL Connector for HDFS

  9. Oracle SQL Connector for HDFS • Use Oracle SQL to load or access data on HDFS • Access or load into the database in parallel using the external table mechanism • Load into the database using SQL • Option to access and analyze data in place on HDFS • Access Hive (internal and external) tables and HDFS files • Automatic load balancing to maximize performance [diagram: SQL query against an OSCH external table in Oracle Database, with parallel OSCH HDFS client processes reading from Hadoop]

  10. Installing Oracle SQL Connector for HDFS [diagram: OSCH installed with a Hadoop client and Hive client on the Oracle Database system; the external table in the database points at the Hadoop cluster]

  11. Part 1a: Reading Hive Tables with Oracle SQL Connector for HDFS • The OSCH command-line utility automatically generates the external table (movie_fact_ext_tab_hive, from the Hive table movieapp_log_stage) and its locator files (d1.loc, d2.loc) • Reads Hive metadata to generate the columns • Uses XML configuration files to drive the process • Each locator file lists the HDFS files to read, for example:

  <uri_list>
    <uri_list_item size="2527870" compressionCodec="">hdfs://localhost.localdomain:8020/user/oracle/data1</uri_list_item>
    <uri_list_item size="101000" compressionCodec="">hdfs://localhost.localdomain:8020/user/oracle/data2</uri_list_item>
  </uri_list>
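The XML configuration that drives this step can be sketched as follows. The property names come from the Oracle SQL Connector for HDFS documentation; the specific values (connection URL, directory object, and table names) are illustrative assumptions, not taken from the lab files:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Name of the external table OSCH will create in the database -->
  <property>
    <name>oracle.hadoop.exttab.tableName</name>
    <value>MOVIE_FACT_EXT_TAB_HIVE</value>
  </property>
  <!-- Source is a Hive table; OSCH reads its metadata to derive columns -->
  <property>
    <name>oracle.hadoop.exttab.sourceType</name>
    <value>hive</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.hive.tableName</name>
    <value>movieapp_log_stage</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.hive.databaseName</name>
    <value>default</value>
  </property>
  <!-- Database directory object where the .loc locator files are written -->
  <property>
    <name>oracle.hadoop.exttab.defaultDirectory</name>
    <value>MOVIEWORK_DIR</value>
  </property>
  <!-- JDBC connection to the target database -->
  <property>
    <name>oracle.hadoop.connection.url</name>
    <value>jdbc:oracle:thin:@localhost:1521:orcl</value>
  </property>
  <property>
    <name>oracle.hadoop.connection.user</name>
    <value>MOVIEWORK</value>
  </property>
</configuration>
```

The lab's actual moviefact_hive.xml (examined in a later step with `more moviefact_hive.xml`) is the authoritative version of these settings.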

  12. Data Files on Hadoop

  13. Data files on Hadoop • prompt> hadoop fs -ls /user/oracle/moviework/data • prompt> hadoop fs -cat /user/oracle/moviework/data/part-00002

  14. Part 1a

  15. Part 1a: Reading Hive Tables with Oracle SQL Connector for HDFS • cd /home/oracle/movie/moviework/osch • This directory contains the scripts genloc_moviefact_hive.sh and moviefact_hive.xml • Run the OSCH command-line utility to create the external table • sh genloc_moviefact_hive.sh • (the password is: welcome1)

  16. Part 1a • Contents of genloc_moviefact_hive.sh hadoop jar $OSCH_HOME/jlib/orahdfs.jar \ oracle.hadoop.exttab.ExternalTable \ -conf /home/oracle/movie/moviework/osch/moviefact_hive.xml \ -createTable
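What -createTable produces is an ordinary Oracle external table. A rough sketch of its shape, assuming the hdfs_stream preprocessor arrangement described in the OSCH documentation; the column names, types, delimiter, and locator file names here are illustrative, not the lab's actual DDL:

```sql
CREATE TABLE movie_fact_ext_tab_hive
( custid   NUMBER,
  movieid  NUMBER,
  activity NUMBER )
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY moviework_dir
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    -- hdfs_stream ships with OSCH; it streams the HDFS content
    -- named in each locator file to the access driver
    PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
    FIELDS TERMINATED BY '\t' )
  -- locator files generated by the OSCH utility
  LOCATION ('d1.loc', 'd2.loc') )
PARALLEL;
```

Because it is just an external table, every SQL feature (joins, aggregates, parallel query) works against it unchanged, which is what the queries in the next steps rely on.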

  17. Part 1 moviefact_hive.xml • Examine the Hadoop configuration properties • more moviefact_hive.xml

  18. Part 1a • Query the table • sqlplus moviework/oracle SQL> select count(*) from movie_fact_ext_tab_hive; SQL> select custid from movie_fact_ext_tab_hive where rownum < 10; SQL> select custid, title from movie_fact_ext_tab_hive p, movie q where p.movieid = q.movieid and rownum < 10;

  19. Part 1b

  20. Part 1b: Reading HDFS Files with Oracle SQL Connector for HDFS • The OSCH command-line utility automatically generates the external table and its locator files (d1.loc, d2.loc) • All external table columns have type VARCHAR2 • Uses XML configuration files to drive the process • Each locator file lists the HDFS files to read, for example:

  <uri_list>
    <uri_list_item size="2527870" compressionCodec="">hdfs://localhost.localdomain:8020/user/oracle/data1</uri_list_item>
    <uri_list_item size="101000" compressionCodec="">hdfs://localhost.localdomain:8020/user/oracle/data2</uri_list_item>
  </uri_list>
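For delimited text files the configuration names HDFS paths and a field delimiter instead of a Hive table, and OSCH generates VARCHAR2 columns because there is no metadata to infer types from. A sketch, again with property names from the OSCH documentation and illustrative values (connection and directory properties omitted):

```xml
<configuration>
  <property>
    <name>oracle.hadoop.exttab.tableName</name>
    <value>MOVIE_FACT_EXT_TAB_TEXT</value>
  </property>
  <!-- Delimited text in HDFS; every generated column is VARCHAR2 -->
  <property>
    <name>oracle.hadoop.exttab.sourceType</name>
    <value>text</value>
  </property>
  <!-- Which HDFS files to expose (glob patterns allowed) -->
  <property>
    <name>oracle.hadoop.exttab.dataPaths</name>
    <value>/user/oracle/moviework/data/part*</value>
  </property>
  <!-- With no Hive metadata, the column count must be stated -->
  <property>
    <name>oracle.hadoop.exttab.columnCount</name>
    <value>5</value>
  </property>
  <!-- Field delimiter; \u0009 is a tab -->
  <property>
    <name>oracle.hadoop.exttab.fieldTerminator</name>
    <value>\u0009</value>
  </property>
</configuration>
```

The lab's actual settings are in moviefact_text.xml, examined in a later step.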

  21. Part 1b: Reading HDFS Files with Oracle SQL Connector for HDFS • cd /home/oracle/movie/moviework/osch • This directory contains the scripts genloc_moviefact_text.sh and moviefact_text.xml • Run the OSCH command-line utility to create the external table • sh genloc_moviefact_text.sh • (the password is: welcome1)

  22. Part 1b • The script genloc_moviefact_text.sh hadoop jar $OSCH_HOME/jlib/orahdfs.jar \ oracle.hadoop.exttab.ExternalTable \ -conf /home/oracle/movie/moviework/osch/moviefact_text.xml \ -createTable

  23. Part 1b moviefact_text.xml • Examine the Hadoop configuration properties • more moviefact_text.xml

  24. Part 2: Oracle Loader for Hadoop

  25. Oracle Loader for Hadoop • Connect to the database from reducer nodes, load into database partitions in parallel • Partition, sort, and convert into Oracle data types on Hadoop • Features: • Offloads data pre-processing from the database server to Hadoop • Works with a range of input data formats • Automatic balancing in case of skew in input data • Online and offline modes [diagram: MapReduce job (map, shuffle/sort, reduce stages) performing the load]
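An OLH job configuration pairs an input format (how to parse the source data) with an output format (how to deliver it to the database). A minimal sketch assuming the online, OCI direct-path mode; the class and property names follow the Oracle Loader for Hadoop documentation, while the paths and values are illustrative:

```xml
<configuration>
  <!-- How to parse the input: delimited text files in HDFS -->
  <property>
    <name>mapreduce.inputformat.class</name>
    <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
  </property>
  <property>
    <name>mapred.input.dir</name>
    <value>/user/oracle/moviedemo/session</value>
  </property>
  <!-- Online mode: reducers load via OCI direct path -->
  <property>
    <name>mapreduce.outputformat.class</name>
    <value>oracle.hadoop.loader.lib.output.OCIOutputFormat</value>
  </property>
  <!-- Mapping of input fields to target table columns -->
  <property>
    <name>oracle.hadoop.loader.loaderMapFile</name>
    <value>file:///home/oracle/movie/moviework/olh/loaderMap_moviesession.xml</value>
  </property>
  <!-- JDBC connection to the target database -->
  <property>
    <name>oracle.hadoop.connection.url</name>
    <value>jdbc:oracle:thin:@localhost:1521:orcl</value>
  </property>
</configuration>
```

Swapping the output format class is what switches OLH between online loading and the offline Data Pump mode mentioned later.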

  26. Installing Oracle Loader for Hadoop [diagram: OLH and a Hive client installed on the Hadoop cluster; the target table lives in the Oracle Database system]

  27. Part 2: Loading Data with Oracle Loader for Hadoop • Load in parallel into the database table (movie_sessions_tab) using OCI direct path • A loader map file maps input data to columns in the database table • Uses XML configuration files to drive the process [diagram: input files on Hadoop loaded in parallel into movie_sessions_tab]
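The loader map file mentioned above has a small XML vocabulary of its own. A sketch of its general shape, following the loader map format in the OLH documentation; the schema, table, column names, and date format here are illustrative, and the lab's real file is loaderMap_moviesession.xml:

```xml
<LOADER_MAP>
  <SCHEMA>MOVIEDEMO</SCHEMA>
  <TABLE>MOVIE_SESSIONS_TAB</TABLE>
  <!-- Each COLUMN maps a positional input field (F0, F1, ...) to a target column -->
  <COLUMN field="F0">CUST_ID</COLUMN>
  <COLUMN field="F1">MOVIE_ID</COLUMN>
  <!-- A format attribute tells OLH how to parse dates during type conversion -->
  <COLUMN field="F2" format="yyyy-MM-dd:HH:mm:ss">SESSION_START</COLUMN>
</LOADER_MAP>
```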

  28. Part 2: Loading Data with Oracle Loader for Hadoop • Examine the data files on HDFS • hadoop fs -ls /user/oracle/moviedemo/session

  29. Part 2: Oracle Loader for Hadoop • cd /home/oracle/movie/moviework/olh • This directory contains all the necessary scripts moviesession.sql, moviesession.xml, loaderMap_moviesession.xml, runolh_session.sh

  30. Part 2 • Create the table the data will be loaded into • sqlplus moviedemo/welcome1 SQL> @moviesession.sql

  31. Part 2 • Submit the Oracle Loader for Hadoop MapReduce job • sh runolh_session.sh • The script runs: hadoop jar ${OLH_HOME}/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf /home/oracle/movie/moviework/olh/moviesession.xml

  32. Part 2: moviesession.xml • Examine the Hadoop configuration properties • more moviesession.xml

  33. Part 2: loaderMap_moviesession.xml • Examine the loader map file • more loaderMap_moviesession.xml

  34. Versions • Certified versions: • Oracle Database 10.2.0.5, 11.2.0.2, 12c and higher • Hadoop distributions: • CDH3, CDH4 (versions of Cloudera’s Distribution Including Apache Hadoop) • Apache Hadoop 1.0.x, 1.1.1 • Should work with Hadoop distros based on certified Apache Hadoop versions

  35. Oracle Loader for Hadoop and Oracle SQL Connector for HDFS, Together • Offline load: data is pre-processed by Oracle Loader for Hadoop and written in Oracle Data Pump format to HDFS • The Oracle Data Pump files in HDFS are then queried (and loaded if necessary) with Oracle SQL Connector for HDFS [diagram: OLH MapReduce job writing Data Pump files to HDFS; SQL query against an OSCH external table reading them into Oracle Database]
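The combination on this slide amounts to changing two settings: OLH writes Data Pump files instead of loading online, and OSCH reads those files as a datapump source. A sketch of the two relevant configuration fragments, with property and class names from the connector documentation and illustrative paths:

```xml
<!-- OLH job fragment: offline mode, write Oracle Data Pump files into HDFS -->
<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.DataPumpOutputFormat</value>
</property>
<property>
  <name>mapred.output.dir</name>
  <value>/user/oracle/moviework/olh_output</value>
</property>

<!-- OSCH fragment: expose those Data Pump files as an external table -->
<property>
  <name>oracle.hadoop.exttab.sourceType</name>
  <value>datapump</value>
</property>
<property>
  <name>oracle.hadoop.exttab.dataPaths</name>
  <value>/user/oracle/moviework/olh_output/oraloader-*.dat</value>
</property>
```

Because the Data Pump files already carry Oracle data types, the expensive type conversion happens once on Hadoop, and the subsequent OSCH query or load does no parsing.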
