This document provides an in-depth exploration of Internet Distributed Computing (IDC), focusing on the design and implementation of a simple distributed computing framework named Grapevine. The study compares the development of applications in this environment with traditional parallel computing. It addresses crucial issues such as node reliability, network quality, scalability, and security. Additionally, it discusses the construction of meta-classifiers using machine learning approaches and highlights the potential efficiency gains through parallel processing in distributed systems like Grapevine.
Explorations into Internet Distributed Computing Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu
Project Overview • Design and implement a simple internet distributed computing framework • Compare application development for this environment with a traditional parallel computing environment
Grapevine An Internet Distributed Computing Framework - Kunal Agrawal, Kevin Chu
Motivation • Supercomputers are very expensive • Large numbers of personal computers and workstations around the world are naturally networked via the internet • Huge amounts of computational resources are wasted because many computers spend most of their time idle • Growing interest in grid computing technologies
Internet Distributed Computing Issues • Node reliability • Network quality • Scalability • Security • Cross-platform portability of object code • Computing paradigm shift
[Architecture diagram: a Client Application submits work to the Grapevine Server, which distributes tasks to multiple Grapevine Volunteer nodes]
Grapevine Features • Written in Java • Parameterized Tasks • Inter-task Communication • Result Reporting • Status Reporting
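The slides do not show Grapevine's actual API, so the following is only a rough Java sketch of what a parameterized task in a framework like this might look like; GrapevineTask, setParameters, and run are invented names, not Grapevine's real interface.

import java.io.Serializable;

// Hypothetical sketch of a parameterized Grapevine-style task.
// A volunteer node would deserialize the task, invoke run(), and
// report the returned result back to the server.
public interface GrapevineTask extends Serializable {
    // Parameters allow one task class to be reused across many work units.
    void setParameters(Serializable params);

    // Executes one work unit on the volunteer; the return value is the
    // result reported back to the Grapevine server.
    Serializable run();
}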
Unaddressed Issues • Node reliability • Load Balancing • Non-intrusive Operation • Interruption Semantics • Deadlock
Meta Classifier - Ang Huey Ting, Li Guoliang
Classifier • A classifier maps an instance to a label: f(instance) ∈ {True, False} • Machine Learning Approach • Build a model on the training set • Use the model to classify new instances • Publicly available packages: WEKA (in Java), MLC++
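As a concrete illustration of this build-then-classify approach, here is a minimal WEKA sketch (the file names are placeholders): load a training set, build a model on it, and use the model to classify a new instance.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class ClassifierDemo {
    public static void main(String[] args) throws Exception {
        // Load the training set (ARFF is WEKA's native file format).
        Instances train = new Instances(new BufferedReader(new FileReader("train.arff")));
        train.setClassIndex(train.numAttributes() - 1); // last attribute = class label

        // Build a model on the training set (here a C4.5 decision tree).
        J48 model = new J48();
        model.buildClassifier(train);

        // Use the model to classify a new (test) instance.
        Instances test = new Instances(new BufferedReader(new FileReader("test.arff")));
        test.setClassIndex(test.numAttributes() - 1);
        double label = model.classifyInstance(test.instance(0));
        System.out.println("Predicted: " + test.classAttribute().value((int) label));
    }
}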
Meta Classifier • An assembly (ensemble) of classifiers • Typically gives better performance than a single classifier • Two ways of generating an assembly of classifiers • Different training data sets • Different algorithms • Predictions are combined by voting
Building a Meta Classifier • Different Training Datasets - Bagging • Randomly generated ‘bags’ • Sampling with replacement • Creates different ‘flavors’ of the training set (sketched below) • Different Algorithms • E.g. Naïve Bayesian, Neural Net, SVM • Different algorithms work well on different training sets
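A minimal sketch of bag generation using WEKA's Instances class: each bag is the same size as the original training set but is drawn from it by sampling with replacement, so each bag is a different ‘flavor’ of the data. (WEKA's built-in Instances.resample(Random) does the same thing in one call.)

import java.util.Random;
import weka.core.Instances;

public class BagMaker {
    // Build one bag by sampling the training set with replacement.
    public static Instances makeBag(Instances train, long seed) {
        Random rand = new Random(seed);
        int n = train.numInstances();
        Instances bag = new Instances(train, n); // empty copy of the header
        for (int i = 0; i < n; i++) {
            bag.add(train.instance(rand.nextInt(n))); // with replacement
        }
        return bag;
    }
}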
Why Parallelise? • Computationally intensive: one classifier = 0.5 hr, so a meta classifier (an assembly of 10 classifiers) = 10 × 0.5 = 5 hr sequentially • Distributed Environment - Grapevine • Build the classifiers in parallel independently, so wall-clock time stays close to that of a single classifier • Little communication required
Distributed Meta Classifiers • WEKA - machine learning package • University of Waikato, New Zealand • http://www.cs.waikato.ac.nz/~ml/weka/ • Implemented in Java • Includes most of the popular machine learning algorithms
Distributed Meta-Classifiers on Grapevine - Distributed Bagging • Generate the different bags • Define the bag and algorithm for each task • Submit the tasks to Grapevine • Nodes build the classifiers • Receive the results • Perform voting (a sketch of this workflow follows)
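A sketch of this workflow in Java. The Grapevine client API is not documented in the slides, so the submission step is shown only as commented pseudocode with invented names (client, submit, collectResults, BuildClassifierTask); the voting step uses real WEKA types.

import weka.classifiers.Classifier;
import weka.core.Instance;

public class DistributedBagging {
    // Hypothetical Grapevine side (names invented, not Grapevine's API):
    //   for each bag i: client.submit(new BuildClassifierTask(bags[i], algorithm));
    //   Classifier[] models = client.collectResults();

    // Majority vote over the classifiers returned by the volunteer nodes.
    // Assumes the instance's dataset has its class index set.
    public static double vote(Classifier[] models, Instance x) throws Exception {
        int[] counts = new int[x.dataset().classAttribute().numValues()];
        for (Classifier m : models) {
            counts[(int) m.classifyInstance(x)]++; // each model casts one vote
        }
        int best = 0;
        for (int c = 1; c < counts.length; c++) {
            if (counts[c] > counts[best]) best = c;
        }
        return best; // index of the winning class value
    }
}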
Preliminary Study • Bagging on Quick Propagation (Quickprop) in OpenMP • Implemented in C
Trial Domain • Benchmark corpus Reuters-21578 for Text Categorization • 9000+ training documents • 3000+ test documents • 90+ categories • Perform feature selection • Preprocess documents into feature vectors (one common approach sketched below)
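The slides do not say which tooling the project used for this preprocessing; one standard way to do it in WEKA is the StringToWordVector filter, which turns a string attribute holding the document text into one numeric, word-based attribute per term.

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TextPreprocess {
    // Convert raw-text documents into word-based feature vectors.
    public static Instances toFeatureVectors(Instances rawDocs) throws Exception {
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(rawDocs);           // learn the dictionary from the data
        return Filter.useFilter(rawDocs, filter); // one attribute per word
    }
}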
Summary • Successful internet distributed computing requires addressing many issues outside of traditional computer science • Distributed computing is not for everyone