
Designing and Operating a Large-Scale Distributed Application on PlanetLab

This project focuses on designing, building, and operating a large-scale distributed application, addressing crucial issues such as scalability, reliability, efficient resource use, and ease of operation. Working in groups of up to three students, participants use the PlanetLab platform to deploy a large-scale crawler for data collection, following a master-worker design and addressing performance considerations such as non-blocking I/O. Students progress through several phases, from familiarization with PlanetLab to developing a monitoring service and executing large-scale crawls, with the necessary libraries and support provided.



Presentation Transcript


  1. Project
  Design, build, and operate a large-scale distributed application.
  • Issues to worry about: scalability, reliability, efficient use of resources, ease of operation, reuse
  • Large-scale deployment platform (PlanetLab)
  • Limited handholding
  • Groups of up to 3 students
  • TO DO: start thinking about your group.

  2. Gnutella network topology crawl
  • Main idea: recursively crawl the entire network, collecting topology information (e.g., each node's neighbors)
  • Support provided: libraries, a bootstrap node
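The recursive crawl described on this slide amounts to a breadth-first traversal of the overlay: ask the bootstrap node for its neighbors, then ask each newly discovered peer in turn until no new peers appear. A minimal sketch, where the `get_neighbors` callback and the toy topology are stand-ins for the real Gnutella ping/pong exchange:

```python
from collections import deque

def crawl(bootstrap, get_neighbors):
    """Breadth-first crawl of an overlay: start from a bootstrap node and
    recursively ask each discovered peer for its neighbor list."""
    seen = {bootstrap}
    frontier = deque([bootstrap])
    edges = []
    while frontier:
        peer = frontier.popleft()
        for nbr in get_neighbors(peer):  # in the real crawler: a network exchange
            edges.append((peer, nbr))
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return seen, edges

# Toy topology standing in for the real network:
topo = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
nodes, edges = crawl("A", lambda p: topo.get(p, []))
print(sorted(nodes))  # ['A', 'B', 'C', 'D']
```

The `seen` set is what keeps the recursion from looping forever on the cyclic overlay graph.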

  3. Project steps
  • P1. Warm-up: familiarize yourself with PlanetLab, set up the environment, develop a monitoring service
  • P2. Start crawling in a controlled environment: centralized / single-node crawler
  • P3. Large-scale crawler
  • Master-worker design
  • Deployed on PlanetLab (using at least 100 nodes)
  • Options:
  • Single node vs. distributed
  • Blocking vs. non-blocking I/O
  • Volume of data gathered
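The master-worker design for P3 can be sketched with a shared task queue: the master hands out peer addresses, and workers crawl them concurrently and report neighbor lists back. A minimal single-process sketch, assuming threads and a `fake_neighbors` stub in place of real PlanetLab workers and socket I/O:

```python
import queue
import threading

def fake_neighbors(peer):
    # Stand-in for a real network exchange; the actual crawler would
    # open a connection to the peer here.
    return [peer * 2]

def worker(tasks, results):
    # Each worker pulls a peer from the shared queue, "crawls" it,
    # and reports the neighbors it found back to the master.
    while True:
        peer = tasks.get()
        if peer is None:          # sentinel: master says shut down
            tasks.task_done()
            return
        results.put((peer, fake_neighbors(peer)))
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for t in threads:
    t.start()
for peer in ["a", "b", "c"]:      # master enqueues work
    tasks.put(peer)
for _ in threads:                 # one sentinel per worker
    tasks.put(None)
tasks.join()
found = dict(results.get() for _ in range(3))
print(found)  # {'a': ['aa'], 'b': ['bb'], 'c': ['cc']} (key order may vary)
```

In the distributed version the queue becomes a master process and the workers become crawler instances on separate PlanetLab nodes; the non-blocking I/O option applies inside each worker, letting one worker keep many peer connections in flight at once.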

  4. Alternatives: your own project
  Goal: design, build, and operate a large-scale distributed application. Some ideas:
  • Crawl and analyze data from other P2P or social networks, e.g., Twitter, Skype, YouTube
  • Hard: closed protocols (Skype)
  • Cool: no (or few) independent measurements
  • Explore Amazon service performance, e.g., S3
  • Performance: latency, throughput, consistency
  • Multiple vantage points (migration?)
  • Hard: limited budget (!), black box
  • Cool: real, well-engineered service, huge scale
  • Others: location services, network health monitoring
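For the S3-style measurement idea, the core loop at each vantage point is simple: time each request several times and summarize the distribution. A hedged sketch, where the no-op `op` argument is a placeholder for a real S3 GET and the function names are illustrative:

```python
import statistics
import time

def measure_latency(op, trials=5):
    """Time an operation several times from this vantage point and
    return the median latency in milliseconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        op()  # placeholder for, e.g., an S3 GET request
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# A real study would run this on many PlanetLab nodes and compare
# the per-vantage-point distributions.
print(f"median latency: {measure_latency(lambda: None):.3f} ms")
```

The median is used rather than the mean so that a single slow outlier (a retransmission, a cold cache) does not dominate the reported number.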
