
Why Spark on Hadoop Matters

MC Srivas, CTO and Founder, MapR Technologies. Apache Spark Summit, July 1, 2014.

Presentation Transcript


  1. Why Spark on Hadoop Matters. MC Srivas, CTO and Founder, MapR Technologies. Apache Spark Summit, July 1, 2014

  2. MapR Overview: Top Ranked • 500+ Customers • Cloud Leaders • Exponential Growth: 3X bookings, Q1 ‘13 – Q1 ‘14; 80% of accounts expand 3X; 90% software licenses; < 1% lifetime churn; > $1B in incremental revenue generated by 1 customer

  3. Rapidly Evolving Landscape: the Apache Hadoop and OSS ecosystem on the MapR Data Platform (* = 2014 timeline)
     Categories: SQL; Batch; NoSQL & Search; Streaming; Data Integration & Access; Security; Workflow & Data Governance; Provisioning; ML & Graph; Management
     Components: Tez*, Spark, Drill*, Savannah*, Cascading, GraphX, Shark, Accumulo*, Hue, Storm*, Juju, Pig, MLlib, Impala, Solr, HttpFS, Spark Streaming, MR v1 & v2, Mahout, Hive, HBase, Flume, Knox*, Falcon*, Whirr, ZooKeeper, YARN, Sqoop, Sentry*, Oozie
     Layers: Execution Engines; Data Governance and Operations; MapR Data Platform

  4. The Complete Spark Stack on Hadoop (the same Apache Hadoop and OSS ecosystem diagram as slide 3)

  5. A Winning Combination

  6. Spark Advantages: Ease of Development: easier APIs; Python, Scala, Java • In-Memory Performance: RDDs; DAGs • Combine Workflows: unify processing; Shark, ML, Streaming, GraphX

  7. Hadoop Advantages: Unlimited Scale: multiple data sources, multiple applications, multiple users • Enterprise Platform: reliability, multi-tenancy, security • Wide Range of Applications: files, databases, semi-structured data

  8. The Combination of Spark on Hadoop: Operational Applications Augmented by In-Memory Performance • Unlimited Scale • Ease of Development • In-Memory Performance • Enterprise Platform • Combine Workflows • Wide Range of Applications

  9. Case Studies

  10. Industry-Leading Ad-Targeting Platform • High-performance analytics over MapR M7 NoSQL • Load from an M7 table into an RDD to augment scoring in real time • Results fed back to M7 for other applications
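The load, score, and write-back loop on slide 10 can be pictured with a small plain-Python sketch. It illustrates only the data flow and does not use Spark: a dict of rows stands in for the M7 table, an in-memory list stands in for the RDD, and `score_rows`, `write_back`, `click_through_score`, and the `clicks`/`views`/`score` columns are hypothetical names, not part of any MapR or Spark API. A production job would perform the load and the write-back through Spark against the table itself.

```python
from typing import Callable, Dict

# Hypothetical stand-in for an M7 table: row key -> {column: value}.
Table = Dict[str, Dict[str, float]]


def score_rows(table: Table, score: Callable[[Dict[str, float]], float]) -> Dict[str, float]:
    """Snapshot every row into memory (the 'load into RDD' step) and score it."""
    snapshot = [(key, dict(cols)) for key, cols in table.items()]
    return {key: score(cols) for key, cols in snapshot}


def write_back(table: Table, scores: Dict[str, float], column: str = "score") -> None:
    """Feed each score back into its row so other applications can read it."""
    for key, value in scores.items():
        table[key][column] = value


def click_through_score(cols: Dict[str, float]) -> float:
    """Hypothetical scoring rule: clicks per view, 0.0 when there are no views."""
    views = cols.get("views", 0.0)
    return cols.get("clicks", 0.0) / views if views else 0.0


if __name__ == "__main__":
    ads: Table = {
        "user1": {"clicks": 2.0, "views": 10.0},
        "user2": {"clicks": 0.0, "views": 0.0},
    }
    write_back(ads, score_rows(ads, click_through_score))
    print(ads)
```

Scoring works on a snapshot rather than the live rows, so the write-back step is the only place the table changes, mirroring the slide's separation between in-memory scoring and feeding results back to M7.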

  11. Leading Pharma Company: NextGen Genomics • Existing process takes several weeks to align chemical compounds with genes • ADAM on Spark allows realignment in a few hours • Geneticists can minimize engineering dependency

  12. Cisco: Security Intelligence Operations • Sensor data lands in M7 • Spark Streaming on M7 for a first check on known threats • Data next processed on GraphX and Mahout • Results queried using SQL via Shark and Impala
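The first stage on slide 12, a streaming check against known threats, can be pictured as micro-batches of events filtered against a blocklist. The plain-Python sketch below assumes events are `(source_ip, payload)` tuples and that known threats are a set of IP addresses; `micro_batches`, `first_check`, and the sample data are hypothetical, and nothing here is Spark Streaming's API. Events that pass the check are the ones that would flow on to the GraphX and Mahout stage.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# Hypothetical event shape: (source_ip, payload).
Event = Tuple[str, str]


def micro_batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group an event stream into fixed-size batches, as a micro-batch engine would."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def first_check(batch: List[Event], known_threats: Set[str]) -> Tuple[List[Event], List[Event]]:
    """Split one batch into (flagged, passed) by matching source IPs against known threats."""
    flagged = [e for e in batch if e[0] in known_threats]
    passed = [e for e in batch if e[0] not in known_threats]
    return flagged, passed


if __name__ == "__main__":
    stream = [("10.0.0.1", "login"), ("203.0.113.9", "scan"), ("10.0.0.2", "login")]
    threats = {"203.0.113.9"}
    for batch in micro_batches(stream, batch_size=2):
        flagged, passed = first_check(batch, threats)
        print("flagged:", flagged, "passed:", passed)
```

A set lookup keeps the per-event check cheap, which suits a first-pass filter whose job is only to catch known threats before the heavier graph and machine-learning analysis.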

  13. Insurance Giant: Addressing Health Care Regulations • Patient information in M7 combined with clinical records to compute readmission probability • Process uses Spark with transactional data in M7 • Insurance options decided in real time on online portals

  14. In Summary

  15. Spark on Hadoop gains traction for real-time applications

  16. Pick the Right Tool for the Job

  17. MapR is Unbiased Open Source (à la Linux) • Open source distribution is about providing choice • Linux distributions include MySQL, PostgreSQL and SQLite • Linux distributions include Apache httpd, nginx and Lighttpd

  18. Thank you. Engage with us! maprtech • @mapr • mapr-technologies • MapR • maprtech • srivas@mapr.com