Dive into the world of Apache Spark, a low-latency cluster computing system designed for very large data sets that, for some workloads, may run up to 100 times faster than MapReduce. Discover MLlib, Spark's machine learning library, which provides common functionality such as classification, regression, clustering, collaborative filtering, and dimensionality reduction. Learn about its dependencies, its ecosystem, and the books available on the subject. Spark offers APIs in Scala, Java, and Python, and much of its speed comes from in-memory computing, because memory access is faster than disk access.
Apache Spark MLlib
• What is Apache Spark?
• What is MLlib?
• Functionality
• Dependencies
• Books
• Ecosystem
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Spark – What is it?
• An alternative to MapReduce for certain applications
• A low-latency cluster computing system
• For very large data sets
• May be up to 100 times faster than MapReduce
• Used with Hadoop / HDFS
• Uses in-memory cluster computing
  • Memory access is faster than disk access
• Has APIs in Scala / Java / Python
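To make the in-memory point concrete, here is a minimal pure-Python sketch. It is not Spark code. It assumes a toy iterative job that makes several passes over the same small dataset stored as JSON on disk. The names `DiskSource`, `one_pass`, `iterate_from_disk`, and `iterate_in_memory` are hypothetical. The first function re-reads the input at the start of every pass, roughly as a chain of MapReduce jobs reads its input from disk at the start of each job. The second loads the data once and reuses the in-memory copy, which is the idea behind keeping a dataset in memory in Spark.

```python
import json
import os
import tempfile

# Pure-Python illustration, not Spark: a toy iterative job that makes several
# passes over the same dataset, either re-reading it from disk on each pass
# or loading it into memory a single time.

def write_dataset(path, values):
    """Store the toy dataset as JSON on disk."""
    with open(path, "w") as f:
        json.dump(values, f)

class DiskSource:
    """Wraps a file on disk and counts how often it is read."""

    def __init__(self, path):
        self.path = path
        self.reads = 0

    def load(self):
        self.reads += 1
        with open(self.path) as f:
            return json.load(f)

def one_pass(data, i):
    # Hypothetical stand-in for one iteration of an iterative algorithm.
    return sum(x * i for x in data)

def iterate_from_disk(source, iterations):
    # MapReduce-style: every pass starts by reading the input from disk.
    return sum(one_pass(source.load(), i) for i in range(iterations))

def iterate_in_memory(source, iterations):
    # Spark-style: read once, keep the data in memory, reuse it on each pass.
    data = source.load()
    return sum(one_pass(data, i) for i in range(iterations))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    write_dataset(path, [1, 2, 3, 4])
    disk_src, mem_src = DiskSource(path), DiskSource(path)
    total_disk = iterate_from_disk(disk_src, iterations=5)
    total_mem = iterate_in_memory(mem_src, iterations=5)

print(total_disk, total_mem, disk_src.reads, mem_src.reads)
```

Both runs return the same result, but the first reads the file five times and the second reads it only once. The gap grows with the number of iterations, which is why iterative workloads such as machine learning benefit most. In PySpark, the corresponding step is calling `cache()` or `persist()` on an RDD or DataFrame that several actions will reuse.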
Spark MLlib – What is it?
• Spark's machine learning library
• Included with the Spark installation
• Code in Scala / Java / Python
• Contains two packages
  • spark.mllib
  • spark.ml (added in V1.2)
• Provides common functionality
  • classification, regression, clustering
  • collaborative filtering, dimensionality reduction
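This sketch illustrates one item on that list, clustering, using a toy k-means (Lloyd's algorithm) in plain Python. It is not MLlib's implementation. MLlib ships its own distributed k-means, for example `KMeans` in `pyspark.ml.clustering`. The functions `kmeans` and `predict`, the sample points, and the fixed initial centres are illustrative assumptions.

```python
import math

def kmeans(points, initial, max_iter=20):
    """Toy Lloyd's k-means: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in initial]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its member points.
        updated = []
        for c, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(c)  # an empty cluster keeps its old centroid
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    return centroids

def predict(centroids, point):
    """Return the index of the centroid nearest to `point`."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(points, initial=[(0, 0), (10, 10)])
print(centers)
print(predict(centers, (0.5, 0.5)), predict(centers, (9, 9)))
```

With two well-separated groups and these starting centres, the loop settles after its second pass with the centres at the means of the groups, (1/3, 1/3) and (31/3, 31/3). MLlib's version distributes the work across the cluster and, by default, chooses its initial centres with the k-means|| method rather than taking them as input.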