Apache Hadoop on the Open Cloud

David Dobbins, Nirmal Ranganathan


Presentation Transcript


  1. Apache Hadoop on the Open Cloud. David Dobbins, Nirmal Ranganathan

  2. Who is using Apache Hadoop • Traditionally = Developers • Increasingly = Business Users / Data Scientists • Why does this matter?

  3. Configuring and managing a Hadoop cluster is hard

  4. Resources / Expertise

  5. Multiple Performance and Design Variables

  6. The Cloud solves some of these

  7. Advantages of using the cloud: Flexible, Fast, Easy

  8. You still require expertise

  9. Let's check out another option

  10. Hadoop in the Cloud Use Cases

  11. Development / POC Clusters

  12. Dynamic Clusters

  13. Growth Clusters

  14. Your data is already in the Cloud

  15. Demo: Run an actual job

  16. Swift Filesystem for Hadoop: HADOOP-8545. The challenges of running MapReduce jobs against Swift: • Identity management • Block size • Object store vs. file paths • Direct API into Swift from HDFS • New filesystem URL, swift:// • Read from, write to local & remote Swift clusters • Keep long-lived data in Swift; upload while Hadoop cluster is off-line

  17. MapReduce to Swift (via “HDFS”) [Diagram labels: Application X, MapReduce, HDFS (each shown twice); Proxy; Swift]

  18. Hadoop + OpenStack

  19. Cloud Big Data Platform • Hortonworks Data Platform • HDP 1.1 • HDP 1.3 • Pig, Hive, HCatalog • Coming soon: HDP 2.0

  20. Cloud Big Data Platform • Secure by default • Comes pre-optimized • Web UI, CLI, REST API

  21. Built on OpenStack

  22. Why an Open Platform matters [Diagram labels: Sandbox on Rackspace Cloud, RAX Resell, Sandbox VM]

  23. Cool stuff

  24. @caffiend @rnirmal http://www.rackspace.com/big-data
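
Slide 16 mentions a new filesystem URL, swift://, and the mismatch between object stores and file paths. As a rough illustration only, the sketch below splits such a URL into its parts. It assumes the `swift://container.service/path` layout; the slides do not spell out the format. The helper `parse_swift_url`, the container name `data`, and the service name `rackspace` are hypothetical and not taken from the presentation.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class SwiftLocation(NamedTuple):
    """Parts of a swift:// URL (hypothetical helper type)."""
    container: str    # Swift container holding the objects
    service: str      # name of the configured Swift service/endpoint
    object_name: str  # object key; "/" is only a naming convention in an object store


def parse_swift_url(url: str) -> SwiftLocation:
    """Split an assumed swift://container.service/path URL into its parts."""
    parts = urlparse(url)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url!r}")
    # The authority is assumed to be "<container>.<service>".
    container, dot, service = parts.netloc.partition(".")
    if not container or not dot or not service:
        raise ValueError(f"expected swift://container.service/path, got {url!r}")
    # The file-style path maps onto a flat object name without the leading slash.
    return SwiftLocation(container, service, parts.path.lstrip("/"))


if __name__ == "__main__":
    print(parse_swift_url("swift://data.rackspace/logs/2013/part-00000"))
```

The point of the sketch is that what looks like a directory path to a MapReduce job is, on the Swift side, just a container plus a flat object name.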
