An introduction to Databricks

An introduction to Databricks: what is it, how does it work, and what can it do?

semtechs

Presentation Transcript


  1. Databricks
     • What is Databricks?
     • Cloud services used
     • Functionality
     • Languages
     • Spark Usage
     • 3rd Party Apps
     • Architecture
     • Books
     www.semtech-solutions.co.nz info@semtech-solutions.co.nz

  2. Databricks – What is it?
     • A cloud-based Apache Spark cluster service
     • Offers scalable Spark clusters based on AWS
     • Developed by the same people who created Spark
     • Multiple cluster management
     • Job scheduling and library import
     • Offers access to all Spark modules

  3. Databricks – Cloud Services
     • Currently uses Amazon AWS
     • Uses EC2 and has access to S3 buckets
     • Uses a minimum of 2 EC2 instances
     • Attempts to optimise EC2 usage
     • Plans to extend to other cloud providers

  4. Databricks – Functionality
     • Architecture based on Notebooks and folders
     • Has a cluster manager for
        • Defined (min 54 GB) clusters
        • Spot clusters
        • On-demand clusters
     • Has a job manager and scheduler
     • Has user management
     • Has full Spark functionality
     • Has strong data visualisation capability
     • Can export reports and dashboards

  5. Databricks – Languages
     • Can have Notebooks in
        • Scala
        • Python
        • SQL
     • SQL can be executed in non-SQL Notebooks
     • Markdown comments can be placed in Notebooks
     • Notebooks can be shared by multiple sessions
     • Libraries can be imported and called in Notebooks

  6. Databricks – Spark Usage
     • Latest Spark version available
        • e.g. DB 1.3.4 uses Spark 1.3.1 as of June 2015
     • All Spark modules available
        • SQL, GraphX, MLlib, Streaming
     • Strong integration between modules and visualisation
     • Extensive use of tables to import data
     • Tables available via SQL
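
The "tables available via SQL" point can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not code from the presentation: the helper names `build_count_query` and `count_by_column`, the table name, and the sample data are all hypothetical. It uses the `SparkSession` API from Spark 2.0 and later, whereas the Spark 1.3.1 release named in the slide exposed the same idea through `SQLContext`.

```python
# Sketch only: helper names, table name, and columns are hypothetical.
# Assumes Spark 2.0+ (SparkSession); Spark 1.3.x used SQLContext instead.


def build_count_query(table, group_col):
    """Return a SQL string that counts rows per distinct value of group_col."""
    return (
        "SELECT {col}, COUNT(*) AS n FROM {tbl} "
        "GROUP BY {col} ORDER BY n DESC"
    ).format(col=group_col, tbl=table)


def count_by_column(rows, columns, table, group_col):
    """Register rows as a temporary table, then query that table with SQL."""
    # Imported lazily so this module still loads where PySpark is absent.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Build a DataFrame from plain Python tuples and expose it under a
    # table name that SQL statements can refer to.
    df = spark.createDataFrame(rows, columns)
    df.createOrReplaceTempView(table)

    # The same table is now reachable from SQL inside a Python program.
    return spark.sql(build_count_query(table, group_col)).collect()
```

On a machine with PySpark installed, calling `count_by_column([("GET", 200), ("GET", 404), ("POST", 200)], ["method", "status"], "requests", "method")` would register the three rows as a table named `requests` and return one row per HTTP method with its count. Keeping the query text in its own function means it can be inspected without a running cluster.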

  7. Databricks – 3rd Party Apps
     • Currently available, with more to come
        • Pentaho
        • Qlik
        • Tableau
        • TIBCO Jaspersoft
        • PanTera
        • Zoomdata

  8. Databricks – Architecture

  9. Available Books
     • See our Hadoop book from Apress / Springer
        • “Big Data Made Easy”
     • Look out for our Apache Spark based book from Packt in 2015

  10. Contact Us
     • Feel free to contact us at
        • www.semtech-solutions.co.nz
        • info@semtech-solutions.co.nz
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You can just pay for those hours that you need to solve your problems
