
Team: #19 Presenter: Xiaozhe Wang Yue Gu



Presentation Transcript


  1. Ricardo: Integrating R and Hadoop Team: #19 Presenters: Xiaozhe Wang, Yue Gu

  2. Agenda • Background • Introduction to R • Disadvantages of Current Strategies • Introduction to Ricardo • Overview of Ricardo’s Architecture • Evaluation • Reference

  3. Background Data Mining Examples • E.g.: • Amazon: personalized product recommendations • Netflix: recommends movies to each customer based on that customer's taste

  4. Introduction to R R’s Functionality for Data Mining • Principal and independent component analysis • k-means clustering • SVM classification • Generalized-linear models • Latent-factor models • Bayesian models • Time-series analysis

  5. Introduction to R R: Simplified Method for Data Mining • k-means algorithm • k-means in R
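
The k-means algorithm named on this slide can be sketched as follows. This is a minimal illustration of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points), written in plain Python rather than R. The function names `kmeans`, `squared_distance`, and `mean_point` are hypothetical and are not taken from the presentation.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (a list of tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid that points[i] was assigned to.
    """
    rng = random.Random(seed)
    # Assumption: initialize with k input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(mean_point(members))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, assignment = kmeans(data, k=2)
    print(centers, assignment)
```

In R itself, the built-in `kmeans(x, centers)` function from the `stats` package performs this clustering in a single call.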

  6. Disadvantages of Current Strategies Scalability for Data Mining • Exploit vertical scalability • Limited • Expensive • Sample the dataset • Loses important features • Loses accuracy • Use a large-scale data management system (DMS) • Less functionality

  7. Introduction to Ricardo Ricardo: R and Hadoop

  8. Architecture Overview of Ricardo’s Architecture

  9. Evaluation Performance and Scalability • Objective: simulate a real recommender system • Dataset: original Netflix competition dataset • Jaql requires about twice as much time as raw Hadoop • Jaql offers a higher level of abstraction

  10. Conclusion • Ricardo: a scalable platform

  11. Reference S. Das, Y. Sismanis, K. S. Beyer, R. Gemulla, P. J. Haas, and J. McPherson. Ricardo: integrating R and Hadoop. In SIGMOD, 2010. http://www.mpi-inf.mpg.de/~rgemulla/publications/das10ricardo.pdf

  12. Questions?

  13. Thanks!!!!
