An introduction to Apache Spark MLlib

An introduction to Apache Spark MLlib: what is it, how does it work, and what can it do?

semtechs
Presentation Transcript


  1. Apache Spark MLlib • What is Apache Spark? • What is MLlib? • Functionality • Dependencies • Books • Ecosystem www.semtech-solutions.co.nz info@semtech-solutions.co.nz

  2. Spark – What is it? • An alternative to MapReduce for certain applications • A low-latency cluster computing system • For very large data sets • May be up to 100 times faster than MapReduce when data fits in memory • Used with Hadoop / HDFS • Uses in-memory cluster computing • Memory access is faster than disk access • Has APIs for Scala / Java / Python
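The map and reduce model that this slide compares Spark against can be sketched on a single machine. The example below is plain Python, not Spark code: the function names and the sample partitions are illustrative assumptions, and the data is simply held in memory as lists rather than distributed across a cluster.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count the words within one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine the partial counts from two partitions."""
    return left + right


def word_count(partitions):
    """Run the map step on every partition, then reduce to one result."""
    partials = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample data: two in-memory partitions of text lines.
partitions = [
    ["spark is fast", "spark uses memory"],
    ["map reduce uses disk"],
]
totals = word_count(partitions)
print(totals.most_common(2))
```

In a real cluster each partition would live on a different worker. Keeping the partitions in memory between steps, rather than writing intermediate results to disk, is the part of the design the slide credits for the speed difference.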

  3. Spark MLlib – What is it? • Spark Machine Learning Library • Provided with the Spark install • Code in Scala / Java / Python • Contains libraries • spark.mllib • spark.ml (since V1.2) • Provides common functionality • classification, regression, clustering • collaborative filtering, dimensionality reduction

  4. Spark MLlib – Functionality • Basic Stats • Classification and regression • Collaborative Filtering • Clustering • Dimensionality reduction • Feature extraction and transformation • Optimization
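As an illustration of one item on this list, clustering, here is a minimal sketch of the k-means idea in plain Python. It is not MLlib's API or implementation: the function names, the naive choice of the first k points as starting centroids, and the sample points are all assumptions made for this example.

```python
import math


def closest(point, centroids):
    """Return the index of the centroid nearest to the point."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k, iterations=10):
    """Tiny k-means loop: assign points, then move each centroid."""
    # Assumption: use the first k points as the starting centroids.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave a centroid in place if its cluster is empty
                centroids[i] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0),
          (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids = kmeans(points, k=2)
print(centroids)
```

A library such as MLlib runs the same kind of assign-and-update loop over data partitioned across a cluster, which is where in-memory reuse of the data set between iterations pays off.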

  5. Spark MLlib – Dependencies • NumPy for Python • Breeze (linear algebra) • netlib-java • jblas • gfortran runtime library

  6. Spark Ecosystem

  7. Contact Us • Feel free to contact us at • www.semtech-solutions.co.nz • info@semtech-solutions.co.nz • We offer IT project consultancy • We are happy to hear about your problems • You pay only for the hours you need to solve your problems