
Oracle’s Big Data solutions

Oracle’s Big Data solutions. Jean-Philippe Breysse Oracle Suisse.


Presentation Transcript


  1. Oracle’s Big Data solutions Jean-Philippe Breysse Oracle Suisse

  2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.

  3. USE CASE 3: LOG ANALYSIS OF SERVERS • Short description: daily log analysis • Issues: find correlations that reveal what leads to failures; log files are stored as flat files
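As a minimal sketch of the kind of analysis this use case describes, the Python below counts which events show up shortly before a failure on the same server. The three-field log format, the event names, and the `events_before_failures` helper are all hypothetical; the slides do not specify a log layout or a method.

```python
from collections import Counter, deque

# Hypothetical flat-file log: "<timestamp> <server> <event>" per line.
SAMPLE_LOG = """\
2012-03-01T10:00:00 srv1 DISK_WARN
2012-03-01T10:01:00 srv1 FAILURE
2012-03-01T10:02:00 srv2 LOGIN
2012-03-01T10:03:00 srv2 DISK_WARN
2012-03-01T10:04:00 srv2 FAILURE
2012-03-01T10:05:00 srv1 LOGIN
"""

def events_before_failures(lines, window=2):
    """Count event types seen in the last `window` events before each FAILURE."""
    recent = {}          # server -> most recent non-failure events
    counts = Counter()   # event type -> number of failures it preceded
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue     # skip malformed or empty lines
        _, server, event = parts
        history = recent.setdefault(server, deque(maxlen=window))
        if event == "FAILURE":
            counts.update(set(history))  # count each event type once per failure
            history.clear()
        else:
            history.append(event)
    return counts

result = events_before_failures(SAMPLE_LOG.splitlines())
print(result.most_common())
```

Co-occurrence counts like these are only a first step; on real daily logs the same per-server grouping would typically be run as a distributed job rather than in a single process.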

  4. Oracle Technology mapped to Analytics Landscape [Diagram. Stages: Acquire, Organize, Analyze, Decide. Data categories: Structured, Semi-structured, Unstructured; Master & Reference, Transactions, Files, Machine Generated, Text, Image, Video, Audio. Products shown: Oracle 12g, Oracle TimesTen, Oracle NoSQL, Oracle Hadoop HDFS, Oracle Data Integrator, Oracle GoldenGate, Oracle Hadoop MapReduce, Oracle R Enterprise & Oracle Data Mining, Oracle Essbase, Endeca MDEX, Oracle BI Enterprise, Oracle Real Time Decisions, Oracle Endeca Information Discovery]

  5. Agenda • Big Data • Solution Spectrum • Inside the Big Data Appliance • Big Data Applications Software • Big Data Analytics • Conclusions

  6. Big Data: Why Everyone Should Care

  7. Tapping into Diverse Data Sets • Information Architectures Today: Decisions based on database data (Transactions) • Big Data: Decisions based on all your data (Video and Images, Documents, Social Data, Machine-Generated Data)

  8. A bit of history • Hadoop: developed initially by Doug Cutting (Nutch, an open source web search engine) and Yahoo; inspired by Google’s papers on GFS and MapReduce (2003-2004), this work resulted in Apache Hadoop (2006) • Amazon Dynamo (2007): distributed systems technologies • Cassandra: developed at Facebook (2008) to power its Inbox Search feature; a column-oriented distributed database based initially on Dynamo and Bigtable (built by Google) • Voldemort: a distributed data store designed as a key-value store, used by LinkedIn for high-scalability storage (NoSQL key-value) • Cloudera: contributes to Hadoop and related Apache projects and provides a commercial distribution of Hadoop

  9. So What is Big Data Anyway? • It’s a matter of perspective. Big Data is both: • LARGE AND VARIABLE DATASETS that are difficult for traditional database tools to manage easily, including datasets that once seemed unimportant or too problematic to deal with. Big Data datasets include: extremely large files of unstructured or semi-structured data; large and highly distributed datasets that are otherwise difficult to manage as a single unit of information • A NEW SET OF TECHNOLOGIES that can economically capture, store, manage, and extract value from Big Data datasets, thus facilitating better, more informed business decisions • Structured Data vs. Unstructured Data: Relational databases work best with structured data, that is, data with an underlying structure (schema) and a size that fits easily within database columns and rows. Unstructured data is highly variable, lacks fixed structure, and is often too large for RDBMS systems to handle easily. Source: IDC Digital Universe Study, Extracting Value from Chaos, June 2011 (sponsored by EMC)

  10. Drive Value from Big Data: Building a Big Data Platform

  11. Divided Solution Spectrum [Diagram. Data Variety axis: Unstructured, Schema-less, Schema. Stages: Acquire, Organize, Analyze. Flexible, Specialized, Developer-Centric: Distributed File Systems, NoSQL, MapReduce Solutions, Transaction (Key-Value) Stores. Trusted, Secure, Administered: DBMS (OLTP), ETL, DBMS (DW), Advanced Analytics, SQL]

  12. Hadoop to Oracle: Bridging the Gap [Diagram. Data Variety axis: Unstructured, Schema-less, Schema. Stages: Acquire, Organize, Analyze. Components: HDFS, Hadoop MapReduce, Cassandra, Oracle Loader for Hadoop, RDBMS (OLTP), ETL, RDBMS (DW), Advanced Analytics]

  13. Oracle Integrated Software Solution [Diagram. Data Variety axis: Unstructured, Schema-less, Schema. Stages: Acquire, Organize, Analyze. Components: HDFS, Hadoop, Oracle NoSQL DB, Oracle (OLTP), Oracle Loader for Hadoop, Oracle Data Integrator, Oracle (DW), OBI EE, Oracle Analytics (Data Mining, R, Spatial, Graph, MapReduce)]

  14. Inside the Big Data Appliance: Overview

  15. Oracle Engineered Solutions • Big Data Appliance: Hadoop, NoSQL Database, Oracle Loader for Hadoop, Oracle Data Integrator • Oracle Exadata: OLTP & DW, Data Mining & Oracle R, Semantics, Spatial • Exalytics: Speed of Thought Analytics [Diagram. Axes: Data Variety (Unstructured to Schema), Information Density. Stages: Acquire, Organize, Analyze. Components: HDFS, Hadoop, Oracle NoSQL DB, Oracle Database (OLTP), Oracle Loader for Hadoop, Oracle Data Integrator, Oracle Database (DW), In-DB Analytics (“R”, Mining, Text, Graph, Spatial), Oracle BI EE]

  16. Big Data Appliance Usage Model [Diagram: Oracle Big Data Appliance, Oracle Exadata, and Oracle Exalytics linked by InfiniBand; stages labeled Stream, Acquire, Organize, Analyze & Visualize]

  17. Oracle Big Data Appliance Hardware • 18 Sun X4270 M2 servers • 48 GB memory per node = 864 GB memory • 12 Intel cores per node = 216 cores • 24 TB storage per node = 432 TB storage • 40 Gb/s InfiniBand • 10 Gb/s Ethernet
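The rack totals on this slide are the per-node figures multiplied by the 18 nodes. The short Python check below reproduces that arithmetic; the constant and function names are illustrative only.

```python
# Per-node figures as stated on the hardware slide.
NODES = 18
MEMORY_GB_PER_NODE = 48
CORES_PER_NODE = 12
STORAGE_TB_PER_NODE = 24

def rack_totals(nodes=NODES):
    """Multiply each per-node figure by the number of nodes in the rack."""
    return {
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "storage_tb": nodes * STORAGE_TB_PER_NODE,
    }

totals = rack_totals()
print(totals)  # {'memory_gb': 864, 'cores': 216, 'storage_tb': 432}
```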

  18. Scale Out to Infinity • Scale out by connecting racks to each other using InfiniBand • Expand up to eight racks without additional switches • Scale beyond eight racks by adding an additional switch

  19. Oracle Big Data Appliance Software • Oracle Enterprise Linux 5.6 • Oracle HotSpot Java VM • Cloudera’s Distribution including Apache Hadoop • Cloudera Manager • Open Source Distribution of R • Oracle NoSQL Database Community Edition

  20. Big Data Application Software: Acquire New Information

  21. Key-Value Store Workloads • Large dynamic schema based data repositories • Data capture • Web applications (click-through capture) • Online retail • Sensor/statistics/network capture (factory automation for example) • Backup services for mobile devices • Data services • Scalable authentication • Real-time communication (MMS, SMS, routing) • Personalization • Social Networks

  22. Oracle NoSQL DB: A distributed, scalable key-value database • Simple data model: key-value pair with major+sub-key paradigm; read/insert/update/delete operations • Scalability: dynamic data partitioning and distribution; optimized data access via intelligent driver • High availability: one or more replicas; disaster recovery through location of replicas; resilient to partition master failures; no single point of failure • Transparent load balancing: reads from master or replicas; driver is network topology and latency aware • Elastic (planned for Release 2): online addition/removal of Storage Nodes; automatic data redistribution [Diagram: applications with NoSQL DB drivers accessing Storage Nodes in Data Center A and Data Center B]
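To illustrate the major+sub-key idea, here is a toy in-memory store in Python. It is a sketch under stated assumptions, not the Oracle NoSQL Database API: the `ToyKVStore` class, its method names, and the hash-based partitioning on the major key are all hypothetical. The point it shows is that records sharing a major key land in the same partition and can be fetched together.

```python
import hashlib
from collections import defaultdict

class ToyKVStore:
    """Hypothetical key-value store; keys are (major, minor) tuples of strings."""

    def __init__(self, n_partitions=4):
        self.n_partitions = n_partitions
        # Each partition maps major key -> {minor key -> value}.
        self.partitions = [defaultdict(dict) for _ in range(n_partitions)]

    def _partition_for(self, major):
        # Assumption: only the major key decides the partition.
        digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.n_partitions

    def put(self, major, minor, value):
        self.partitions[self._partition_for(major)][major][minor] = value

    def get(self, major, minor):
        records = self.partitions[self._partition_for(major)].get(major, {})
        return records.get(minor)

    def multi_get(self, major):
        """Return every record stored under one major key."""
        return dict(self.partitions[self._partition_for(major)].get(major, {}))

    def delete(self, major, minor):
        records = self.partitions[self._partition_for(major)].get(major, {})
        return records.pop(minor, None) is not None

store = ToyKVStore()
store.put(("user", "42"), ("profile",), "Ada")
store.put(("user", "42"), ("email",), "ada@example.com")
print(store.multi_get(("user", "42")))
```

Because the sub-key never influences placement in this sketch, a lookup of all records for one major key touches a single partition.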

  23. Big Data Application Software: Organizing Data for Analysis

  24. Oracle Loader for Hadoop Features • Load data into a partitioned or non-partitioned table • Single level, composite or interval partitioned table • Support for scalar datatypes of Oracle Database • Load into Oracle Database 11g Release 2 • Runs as a Hadoop job and supports standard options • Pre-partitions and sorts data on Hadoop • Online and offline load modes

  25. Oracle Loader for Hadoop [Diagram labels: Input1, Input2, MAP, Shuffle/Sort, Reduce, Oracle Loader for Hadoop]

  26. Oracle Loader for Hadoop: Online Option • Read target table metadata from the database • Perform partitioning, sorting, and data conversion • Connect to the database from reducer nodes and load into database partitions in parallel [Diagram labels: MAP, Shuffle/Sort, Reduce, Oracle Loader for Hadoop]

  27. Oracle Loader for Hadoop: Offline Option • Read target table metadata from the database • Perform partitioning, sorting, and data conversion • Write from reducer nodes to Oracle Data Pump files • Import into the database in parallel using the external table mechanism [Diagram labels: MAP, Shuffle/Sort, Reduce, Oracle Loader for Hadoop]
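Both load modes share the partitioning and sorting step. The plain-Python sketch below imitates it without Hadoop: records are tagged with a target partition (map), grouped by partition (shuffle), and sorted within each group (reduce). The range boundaries, the `id` column, and the function names are assumptions for illustration; in the real tool the partitioning comes from the target table metadata, and this is not the loader's API.

```python
from collections import defaultdict

def partition_for(record, boundaries):
    """Range partitioning: index of the first upper bound above the key."""
    for index, upper in enumerate(boundaries):
        if record["id"] < upper:
            return index
    return len(boundaries)  # last partition holds everything else

def prepare_load(records, boundaries):
    """Group records by target partition, then sort each group by key."""
    shuffled = defaultdict(list)
    for record in records:                      # map + shuffle
        shuffled[partition_for(record, boundaries)].append(record)
    return {                                    # reduce: sort per partition
        partition: sorted(rows, key=lambda row: row["id"])
        for partition, rows in sorted(shuffled.items())
    }

records = [{"id": n} for n in (15, 3, 27, 8, 21)]
prepared = prepare_load(records, boundaries=[10, 20])
for partition, rows in prepared.items():
    print(partition, [row["id"] for row in rows])
```

The final step is where the modes differ: each sorted group would either be loaded over a database connection (online) or written to a file for later import (offline).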

  28. Selection Output Option for Use Case

  29. Automate Usage of Oracle Loader for Hadoop with Oracle Data Integrator (ODI) • ODI has knowledge modules to: generate data transformation code to run on Hive/Hadoop; invoke Oracle Loader for Hadoop • Use the drag-and-drop interface in ODI to include invocation of Oracle Loader for Hadoop in any ODI packaged flow

  30. Big Data Analytics: Real Time Analytics Platform

  31. R Statistical Programming Language • Open source language and environment • Used for statistical computing and graphics • Strength in easily producing publication-quality plots • Highly extensible with open source community R packages

  32. Drive Value from Big Data: Conclusions

  33. Big Data Appliance: Big Data for the Enterprise • Optimized and complete: everything you need to store and integrate your lower information density data • Integrated with Oracle Exadata: analyze all your data • Easy to deploy: risk-free, quick installation and setup • Single vendor support: full Oracle support for the entire system and software set

  34. Oracle Integrated Solution Stack for Big Data [Diagram. Stages: ACQUIRE, ORGANIZE, ANALYZE, DECIDE. Components: HDFS, Oracle NoSQL Database, Enterprise Applications, Hadoop (MapReduce), Oracle Loader for Hadoop, Oracle Data Integrator, Data Warehouse, In-Database Analytics, Oracle Analytic Applications]

  35. Oracle: Big Data for the Enterprise • The most comprehensive solution • Includes everything needed to acquire, organize and analyze all your data • Optimized for Extreme Analytics • Deepest analytics portfolio with access to all data • Engineered to Work Together • Eliminate deployment risk and support risk • Enterprise Ready • Deliver extreme performance and scalability

  36. Questions
