
Explorations into Internet Distributed Computing

Explorations into Internet Distributed Computing. Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu. Project Overview. Design and implement a simple internet distributed computing framework


Presentation Transcript


  1. Explorations into Internet Distributed Computing - Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu

  2. Project Overview • Design and implement a simple internet distributed computing framework • Compare application development for this environment with development for a traditional parallel computing environment

  3. Grapevine: An Internet Distributed Computing Framework - Kunal Agrawal, Kevin Chu

  4. What is Internet Distributed Computing?

  5. Motivation • Supercomputers are very expensive • Large numbers of personal computers and workstations around the world are naturally networked via the internet • Huge amounts of computational resources are wasted because many computers spend most of their time idle • Growing interest in grid computing technologies

  6. Other Distributed Computing Efforts

  7. Internet Distributed Computing Issues • Node reliability • Network quality • Scalability • Security • Cross-platform portability of object code • Computing paradigm shift

  8. Overview Of Grapevine

  9. [Diagram: Client Application, Grapevine Server, and three Grapevine Volunteers]

  10. Grapevine Features • Written in Java • Parametrized Tasks • Inter-task communication • Result Reporting • Status Reporting

  11. Unaddressed Issues • Node reliability • Load balancing • Unintrusive operation • Interruption semantics • Deadlock

  12. Meta Classifier - Ang Huey Ting, Li Guoliang

  13. Classifier • Function(instance) = {True, False} • Machine learning approach • Build a model on the training set • Use the model to classify new instances • Publicly available packages: WEKA (in Java), MLC++

  14. Meta Classifier • Assembly of classifiers • Gives better performance • Two ways of generating an assembly of classifiers: different training data sets, or different algorithms • Voting combines the individual predictions
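
The voting step can be sketched in a few lines of Java. This is only an illustration, not code from the project: the class name `MajorityVote`, the simple unweighted majority rule, and the choice to resolve ties to False are all assumptions, since the slides do not say which voting scheme was used.

```java
import java.util.Arrays;
import java.util.List;

class MajorityVote {
    /**
     * Combines the True/False predictions of several classifiers.
     * Returns true only when strictly more than half of them say true;
     * a tie resolves to false (an assumption made for this sketch).
     */
    static boolean vote(List<Boolean> predictions) {
        int yes = 0;
        for (boolean p : predictions) {
            if (p) {
                yes++;
            }
        }
        return 2 * yes > predictions.size();
    }

    public static void main(String[] args) {
        // Three classifiers, two of which predict True.
        System.out.println(vote(Arrays.asList(true, false, true))); // prints "true"
    }
}
```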

  15. Building Meta Classifier • Different training datasets (bagging): randomly generated ‘bags’; selection with replacement; creates different ‘flavors’ of the training set • Different algorithms: e.g. Naïve Bayes, neural net, SVM; different algorithms work well on different training sets
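
Selection with replacement is easy to show concretely. The sketch below is hypothetical (the names `Bagging`, `drawBag`, and `drawBags` are not from the slides) and assumes the common convention that each bag has the same size as the original training set, which the slides do not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class Bagging {
    /** Draws one bag: as many items as the training set holds, sampled uniformly with replacement. */
    static <T> List<T> drawBag(List<T> trainingSet, Random rng) {
        List<T> bag = new ArrayList<>(trainingSet.size());
        for (int i = 0; i < trainingSet.size(); i++) {
            // The same item may be picked several times; others may be left out.
            bag.add(trainingSet.get(rng.nextInt(trainingSet.size())));
        }
        return bag;
    }

    /** Draws numBags independent bags, i.e. different 'flavors' of the training set. */
    static <T> List<List<T>> drawBags(List<T> trainingSet, int numBags, long seed) {
        Random rng = new Random(seed);
        List<List<T>> bags = new ArrayList<>();
        for (int b = 0; b < numBags; b++) {
            bags.add(drawBag(trainingSet, rng));
        }
        return bags;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("d1", "d2", "d3", "d4", "d5");
        for (List<String> bag : drawBags(docs, 3, 42L)) {
            System.out.println(bag);
        }
    }
}
```

Because sampling is with replacement, each bag typically repeats some documents and omits others, which is what makes the resulting classifiers differ.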

  16. Why Parallelise? • Computationally intensive: one classifier = 0.5 hr; meta classifier (assembly of 10 classifiers) = 10 × 0.5 = 5 hr • Distributed environment: Grapevine • Build classifiers in parallel, independently • Little communication required
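
The arithmetic above extends to a rough best-case estimate for a parallel build. The sketch below is an idealization, not a measurement from the project: it assumes every classifier takes the same time, volunteers are always available, and communication and scheduling cost nothing. The name `BuildTimeEstimate` is hypothetical.

```java
class BuildTimeEstimate {
    /**
     * Idealized wall-clock hours to build all classifiers.
     * Tasks run in ceil(classifiers / volunteers) rounds; overhead is ignored.
     */
    static double idealHours(int classifiers, int volunteers, double hoursPerClassifier) {
        int rounds = (classifiers + volunteers - 1) / volunteers; // integer ceiling
        return rounds * hoursPerClassifier;
    }

    public static void main(String[] args) {
        System.out.println(idealHours(10, 1, 0.5));  // sequential case from the slide: 5.0
        System.out.println(idealHours(10, 10, 0.5)); // one volunteer per classifier: 0.5
        System.out.println(idealHours(10, 4, 0.5));  // four volunteers, three rounds: 1.5
    }
}
```

Real runs on volunteer machines would be slower than this bound, since nodes differ in speed and may drop out.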

  17. Distributed Meta Classifiers • WEKA: machine learning package • University of Waikato, New Zealand • http://www.cs.waikato.ac.nz/~ml/weka/ • Implemented in Java • Includes most popular machine learning tools

  18. Distributed Meta-Classifiers on Grapevine: Distributed Bagging • Generate different bags • Define the bag and algorithm for each task • Submit tasks to Grapevine • Nodes build classifiers • Receive results • Perform voting
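
The six steps above can be walked through end to end in a small sketch. Everything here is an assumption made for illustration: the slides do not show Grapevine's API, so a local `ExecutorService` thread pool stands in for the server and volunteers, and a toy one-feature threshold learner stands in for a WEKA classifier. The names `DistributedBaggingSketch`, `Example`, `train`, and `classify` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoublePredicate;

class DistributedBaggingSketch {
    /** A labeled one-feature instance (stand-in for a document feature vector). */
    static final class Example {
        final double x;
        final boolean label;
        Example(double x, boolean label) { this.x = x; this.label = label; }
    }

    /** Toy learner: threshold halfway between the mean positive and mean negative value.
     *  Assumes positives tend to have the larger feature value. */
    static DoublePredicate train(List<Example> bag) {
        double posSum = 0, negSum = 0;
        int pos = 0, neg = 0;
        for (Example e : bag) {
            if (e.label) { posSum += e.x; pos++; } else { negSum += e.x; neg++; }
        }
        if (pos == 0) return x -> false; // bag held no positives
        if (neg == 0) return x -> true;  // bag held no negatives
        double threshold = (posSum / pos + negSum / neg) / 2;
        return x -> x > threshold;
    }

    static boolean classify(List<Example> trainingSet, int numBags, double query, long seed)
            throws Exception {
        Random rng = new Random(seed);
        // Local thread pool: NOT Grapevine, only a stand-in for its volunteers.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<DoublePredicate>> pending = new ArrayList<>();
            for (int b = 0; b < numBags; b++) {
                // Generate a bag by sampling with replacement.
                List<Example> bag = new ArrayList<>();
                for (int i = 0; i < trainingSet.size(); i++) {
                    bag.add(trainingSet.get(rng.nextInt(trainingSet.size())));
                }
                // Define the task (bag plus algorithm) and submit it.
                Callable<DoublePredicate> task = () -> train(bag);
                pending.add(pool.submit(task));
            }
            // Workers build the classifiers; collect the results as they finish.
            int yes = 0;
            for (Future<DoublePredicate> result : pending) {
                if (result.get().test(query)) yes++;
            }
            // Majority vote over the returned classifiers (ties resolve to false).
            return 2 * yes > numBags;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Example> data = Arrays.asList(
                new Example(1, false), new Example(2, false), new Example(3, false),
                new Example(7, true), new Example(8, true), new Example(9, true));
        System.out.println(classify(data, 5, 8.5, 42L));
    }
}
```

The tasks share no state and return only a trained model, which matches the slide's point that little communication is required.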

  19. Preliminary Study • Bagging on Quick Propagation in OpenMP • Implemented in C

  20. Trial Domain • Benchmark corpus Reuters-21578 for text categorization • 9000+ training documents • 3000+ test documents • 90+ categories • Perform feature selection • Preprocess documents into feature vectors

  21. Summary • Successful internet distributed computing requires addressing many issues outside of traditional computer science • Distributed computing is not for everyone
