This project focuses on the design, build, and operation of a large-scale distributed application, addressing issues such as scalability, reliability, efficient resource use, and ease of operation. Working in groups of up to three students, participants use the PlanetLab platform to deploy a large-scale crawler for data collection, built on a master-worker design, with performance options such as non-blocking I/O to consider. The project proceeds in phases: familiarization with PlanetLab and development of a monitoring service, a crawl in a controlled environment, and finally large-scale crawls. Libraries and a bootstrap node are provided as support.
Project
Design, build, and operate a large-scale distributed application.
• Issues to worry about: scalability, reliability, efficient use of resources, easy to operate, reuse
• Large-scale deployment platform (PlanetLab)
• Limited handholding
• Groups of up to 3 students
• TO DO: Start thinking about your group.
Gnutella Network Topology Crawl
• Topology information (e.g., neighboring nodes)
• Main idea: recursively crawl the entire network
• Support provided: libraries, bootstrap node
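The crawl idea can be sketched as a breadth-first traversal that starts at the bootstrap node and follows each node's reported neighbors. This is a minimal sketch, not the course's provided library: `crawl`, `fake_get_neighbors`, and `FAKE_NETWORK` are hypothetical names, and the in-memory dictionary stands in for real Gnutella peers. It uses an explicit queue rather than literal recursion, which avoids deep call stacks on large networks.

```python
from collections import deque


def crawl(bootstrap, get_neighbors):
    """Breadth-first crawl starting from a bootstrap node.

    get_neighbors is a hypothetical callable (node -> iterable of neighbors);
    in the project, the provided libraries would play this role.
    Returns a dict mapping each discovered node to its reported neighbors.
    """
    topology = {}
    seen = {bootstrap}          # nodes already queued, to avoid revisiting
    queue = deque([bootstrap])
    while queue:
        node = queue.popleft()
        try:
            neighbors = list(get_neighbors(node))
        except OSError:
            neighbors = []      # unreachable node: record it with no neighbors
        topology[node] = neighbors
        for neighbor in neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return topology


# Hypothetical toy overlay; "n5" is advertised by "n4" but never responds.
FAKE_NETWORK = {
    "n1": ["n2", "n3"],
    "n2": ["n1", "n4"],
    "n3": ["n1"],
    "n4": ["n2", "n5"],
}


def fake_get_neighbors(node):
    """Stand-in for a real neighbor query against a live peer."""
    if node not in FAKE_NETWORK:
        raise ConnectionError(f"{node} did not respond")
    return FAKE_NETWORK[node]


if __name__ == "__main__":
    for node, neighbors in sorted(crawl("n1", fake_get_neighbors).items()):
        print(node, neighbors)
```

The `seen` set is what keeps the crawl from looping forever on a cyclic overlay; unreachable nodes are kept in the result so the crawl still records that they were advertised.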
Project steps
• P1. Warmup: familiarize yourself with PlanetLab, set up the environment, develop a monitoring service
• P2. Start crawling in a controlled environment: centralized / single-node crawler
• P3. Large-scale crawler
  • Master-worker design
  • Deployed on PlanetLab (using at least 100 nodes)
• Options:
  • Single node vs. distributed
  • Blocking vs. non-blocking I/O
  • Volume of data gathered
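The master-worker design and the non-blocking I/O option can be illustrated together in a single-process sketch. Everything here is an assumption for illustration: `master`, `worker`, `query_peer`, and `FAKE_OVERLAY` are hypothetical names, the workers are `asyncio` tasks rather than PlanetLab nodes, and the network query is simulated with a short sleep. In the real project, the master would hand work to processes on remote nodes.

```python
import asyncio

# Hypothetical in-memory overlay standing in for real peers.
FAKE_OVERLAY = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b"],
}


async def query_peer(peer):
    """Hypothetical non-blocking query; a real crawler would do network I/O."""
    await asyncio.sleep(0.01)   # simulated latency; the event loop stays free
    return FAKE_OVERLAY.get(peer, [])


async def worker(tasks, results):
    """Worker: take a peer from the master's queue, query it, report back."""
    while True:
        peer = await tasks.get()
        try:
            neighbors = await query_peer(peer)
        except OSError:
            neighbors = []      # treat a failed query as "no neighbors"
        await results.put((peer, neighbors))
        tasks.task_done()


async def master(bootstrap, num_workers=3):
    """Master: hands out peers to crawl and merges the workers' results."""
    tasks, results = asyncio.Queue(), asyncio.Queue()
    workers = [asyncio.create_task(worker(tasks, results))
               for _ in range(num_workers)]
    topology, seen = {}, {bootstrap}
    await tasks.put(bootstrap)
    pending = 1                 # peers handed out but not yet reported
    while pending:
        peer, neighbors = await results.get()
        pending -= 1
        topology[peer] = neighbors
        for neighbor in neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                await tasks.put(neighbor)
                pending += 1
    for w in workers:
        w.cancel()              # crawl finished: stop the idle workers
    await asyncio.gather(*workers, return_exceptions=True)
    return topology


def run_crawl(bootstrap):
    return asyncio.run(master(bootstrap))


if __name__ == "__main__":
    print(run_crawl("a"))
```

Only the master tracks which peers have been seen, so workers stay stateless and can be added or restarted freely. With blocking I/O, each worker would instead sit idle for the full duration of every query, which is the trade-off the blocking vs. non-blocking option asks you to weigh.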
Alternatives: Your own project
Goal: Design, build, and operate a large-scale distributed application.
Some ideas:
• Crawl and analyze data from other P2P or social networks
  • e.g., Twitter, Skype, YouTube
  • Hard: closed protocols (Skype)
  • Cool: no (or few) independent measurements
• Explore Amazon service performance, e.g., S3
  • Performance: latency, throughput, consistency
  • Multiple vantage points (migration?)
  • Hard: limited budget (!), black box
  • Cool: real, well-engineered service, huge scale
• Others:
  • Location services
  • Network health monitoring