
CS 34701: Large-Scale Networked Systems



Presentation Transcript


  1. CS 34701: Large-Scale Networked Systems Professor: Ian Foster TA: Adriana Iamnitchi http://dsl.cs.uchicago.edu/Courses/cs347-2002/

  2. CS 34701 Course Goals • Primary • Gain deep understanding of fundamental issues that affect design of large-scale networked systems • Map primary contemporary research themes • Gain experience in network research • Secondary • By studying a set of outstanding papers, build knowledge of how to present research • Learn how to read papers & evaluate ideas

  3. How the Class Works • Research papers • Prior to each class, we all read and evaluate two research papers • During each class, we discuss those papers • Project • One-page project description by 2nd week • Five-page project summary by 5th week • 10-20 page final paper by 9th week • Project presentations: 9th and 10th weeks.

  4. Paper Review & Discussion • Everyone reads two papers per class and submits an evaluation (see below) • We discuss (not present) papers in class • A team of 2-3 leads each discussion • The leading team submits a discussion plan before class, then submits a “master critique” and summarizes the discussion at the beginning of the following class • Look over schedule between now & Friday, when we will allocate discussants

  5. Evaluations • You must submit evaluations of papers • Email them by 6pm the day before • Answer a set of standard questions • State the main contribution of the paper • Critique the main contribution • What are the three strongest and/or most interesting ideas in the paper? • Three most striking weaknesses in the paper? • Three questions to ask the authors? • Detail an interesting extension to the work not mentioned in the future work section. • Optional comments on the paper that you’d like to see discussed in class.

  6. What I’ll Assume You Know • Basic Internet architecture • IP, TCP, DNS, HTTP • Basic principles of distributed computing • Asynchrony (cannot distinguish between communication failures and latency) • Partial global state knowledge (cannot know everything correctly) • Failures happen. In very large systems, even rare failures happen often • If there are things that don’t make sense, ask!

  7. Large-Scale Networked Systems • Internet-connected networks with a large number of components, spanning multiple DNS domains (usually WAN) • Designed to solve specific problems: • Content distribution • Cycle sharing • File sharing • Sensor data fusion • Distributed data analysis • …

  8. Example: Gnutella • Peer-to-peer file sharing system • File sharing: goal is to enable publication and access to files • P2P: no central servers; all clients also act as servers and are equivalent (more or less) • Issues • Scaling to very large numbers of nodes • Properties: bootstrapping, reliability, cost, anonymity, security, freeloading, …

  9. Gnutella Protocol Overview • P2P file sharing application on top of an overlay network: • Nodes maintain open TCP connections. • Messages are broadcast (flooded) or back-propagated. • Protocol:

  10. Gnutella search mechanism • Steps: • Node 2 initiates search for file A (Figure: overlay network of nodes 1-7)

  11. Gnutella search mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors (Figure: nodes 1-7; query A on links)

  12. Gnutella search mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message (Figure: nodes 1-7; query A on links)

  13. Gnutella search mechanism • Steps: • Node 2 initiates search for A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message (Figure: nodes 1-7; query A on links; replies A:5 and A:7)

  14. Gnutella search mechanism • Steps: • Node 2 initiates search for A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated (Figure: nodes 1-7; replies A:5 and A:7)

  15. Gnutella search mechanism • Steps: • Node 2 initiates search for A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated • Node 2 gets replies (Figure: nodes 1-7; replies A:5 and A:7)

  16. Gnutella search mechanism • Steps: • Node 2 initiates search for A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated • Node 2 gets replies • File download (Figure: nodes 1-7; label "download A")
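
The search steps on slides 10-16 can be sketched as a small simulation. This is a minimal illustration, not the Gnutella wire protocol: the function name `flood_search`, the `EXAMPLE_NEIGHBORS` topology, and the choice of nodes 5 and 7 as holders of file A are assumptions made for the example (the slides only show nodes 1-7 and replies labelled A:5 and A:7; the actual links are not recoverable from the transcript).

```python
from collections import deque

# Hypothetical overlay links between nodes 1-7 (not taken from the slides).
EXAMPLE_NEIGHBORS = {
    1: [2, 7], 2: [1, 3, 4], 3: [2, 5], 4: [2, 6],
    5: [3], 6: [4, 7], 7: [1, 6],
}

def flood_search(neighbors, holders, origin, ttl=7):
    """Flood a query from `origin`; return {holder: reply path back to origin}."""
    came_from = {origin: None}          # neighbor each node first heard the query from
    frontier = deque([(origin, ttl)])   # (node, hops the query may still travel)
    replies = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders and node != origin:
            # Back-propagate: the reply retraces the hops the query took.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = came_from[hop]
            replies[node] = path
        if remaining == 0:
            continue                    # TTL exhausted, do not forward further
        for peer in neighbors[node]:
            if peer not in came_from:   # nodes drop queries they have already seen
                came_from[peer] = node
                frontier.append((peer, remaining - 1))
    return replies

if __name__ == "__main__":
    # Node 2 searches for file A, assumed to be held by nodes 5 and 7.
    print(flood_search(EXAMPLE_NEIGHBORS, {5, 7}, origin=2))
```

Each reply path lists the nodes from the holder back to the origin, so the last element is always the searching node. The download itself happens outside the overlay and is not modelled here.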

  17. Tools for network exploration • Eavesdropper: modified node inserted into the network to log traffic. • Crawler: connects to all active nodes and uses the membership protocol to discover graph topology. • Parallel crawling. • Graph analysis tools • high-volume offline computations.
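
A crawler of the kind described on slide 17 is essentially a breadth-first traversal of the overlay. The sketch below assumes a hypothetical `get_neighbors(node)` callable standing in for the membership-protocol exchange with a live node; the real crawler's interface is not given in the slides, and the parallel crawling they mention is not shown.

```python
from collections import deque

def crawl(seed, get_neighbors):
    """Breadth-first crawl from `seed`; returns (nodes seen, undirected links)."""
    seen = {seed}
    queue = deque([seed])
    links = set()
    while queue:
        node = queue.popleft()
        # get_neighbors is a hypothetical stand-in for asking a node
        # which peers it is currently connected to.
        for peer in get_neighbors(node):
            links.add(frozenset((node, peer)))  # store each link once
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen, links

if __name__ == "__main__":
    toy_overlay = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
    nodes, links = crawl("a", lambda n: toy_overlay[n])
    print(len(nodes), "nodes,", len(links), "links")
```

The returned link set is the input one would hand to offline graph analysis, for example to compute the average number of links per node.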

  18. Network growth • High user interest: • Users tolerate high latency, low quality results. • Better resources: • DSL and cable modem nodes grew from 24% to 41% over 6 months. • Open architecture / open-source environment: • Competing implementations, • Lower overhead network traffic, improved resource utilization, better structure, • Recently, two-level structure.

  19. Growth invariants • Graph connectivity: 3.4 links per node on average. • Path length distribution: node-to-node distance maintains similar distributions. • Avg. node-to-node distance grew 25% while the network grew 50 times over 6 months. • Random graph theory predicts about 75% increase.

  20. Is Gnutella a power-law network? • Power-law networks: the number of nodes $N$ with exactly $L$ links is proportional to $L^{-k}$, i.e. $N \sim L^{-k}$ • Examples: • The Internet, • In/out links to/from HTML pages, • Citations network, • US power grid, • Social networks. • (Figure: November 2000 snapshot) • Implication: High tolerance to random node failure but low reliability when facing an ‘intelligent’ adversary
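
One rough way to check a crawled topology against $N \sim L^{-k}$ is to fit a straight line to $\log N$ versus $\log L$ and read $k$ off the slope. The sketch below is an illustration only: the function name and the synthetic data are assumptions, and a log-log least-squares fit is a crude estimator that is not claimed to be the method used for the slides.

```python
import math

def fit_power_law_exponent(degree_counts):
    """Estimate k in N ~ L**-k from {links L: number of nodes N with L links}."""
    points = [(math.log(links), math.log(count))
              for links, count in degree_counts.items()
              if links > 0 and count > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx   # slope of the fitted line is -k

if __name__ == "__main__":
    # Synthetic counts that follow N = 1000 * L**-2 exactly (not measured data).
    synthetic = {links: 1000 * links ** -2.0 for links in range(1, 20)}
    print(round(fit_power_law_exponent(synthetic), 3))
```

On real crawl data the points would scatter around the line, and a bimodal distribution would show up as a clear departure from it.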

  21. Is Gnutella a power-law network? • Later, larger networks display a bimodal distribution. • Implications: • High tolerance to random node failures preserved • Increased reliability when facing an attack. • (Figure: May 2001 snapshot)

  22. Traffic analysis • 6-8 kbps per link over any connection. • Traffic structure changed over time.

  23. Total generated traffic 1 Gbps (or 330 TB/month)! • Note that this estimate excludes actual file transfers • Q: Does it matter? • Compare to 15,000 TB/month estimated in US Internet backbone (Dec. 2000). Reasoning: • QUERY and PING messages are flooded. They form more than 90% of generated traffic • predominant TTL=7 • >95% of nodes are less than 7 hops away • measured traffic at each link about 6 to 8 kbps • network with 50k nodes and 170k links
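
The headline figures on slide 23 follow from the per-link measurement and the link count. The arithmetic below is a sketch under two assumptions that the slide does not state: the low end of the measured range (6 kbps per link) and a 30-day month. With those choices the numbers come out at about 1 Gbps and about 330 TB/month.

```python
LINKS = 170_000                      # from the slide: 170k links
BITS_PER_SECOND_PER_LINK = 6_000     # assumption: low end of the 6-8 kbps range
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month

def total_traffic_bps(links, bps_per_link):
    """Aggregate overlay traffic in bits per second."""
    return links * bps_per_link

def monthly_terabytes(bps, seconds=SECONDS_PER_MONTH):
    """Convert a sustained bit rate to terabytes (10**12 bytes) per month."""
    return bps / 8 * seconds / 1e12

if __name__ == "__main__":
    bps = total_traffic_bps(LINKS, BITS_PER_SECOND_PER_LINK)
    print(f"{bps / 1e9:.2f} Gbps, {monthly_terabytes(bps):.0f} TB/month")
    # Share of the 15,000 TB/month backbone estimate quoted on the slide.
    print(f"{monthly_terabytes(bps) / 15_000:.1%} of the backbone estimate")
```

Using the high end of the range (8 kbps) instead would raise both figures by a third.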

  24. Topology mismatch The overlay network topology doesn’t match the underlying Internet infrastructure topology! • 40% of all nodes are in the 10 largest Autonomous Systems (AS). • Only 2-4% of all TCP connections link nodes within the same AS. • Largely ‘random wiring’. • Entropy experiment gives similar results.
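
The two percentages on slide 24 are simple ratios over a crawled link list and a node-to-AS mapping. The sketch below shows how such ratios could be computed; the function names and the toy data are hypothetical, and how nodes were mapped to Autonomous Systems in the original study is not described in the slides.

```python
from collections import Counter

def intra_as_fraction(links, as_of):
    """Fraction of overlay links whose two endpoints sit in the same AS."""
    same = sum(1 for a, b in links if as_of[a] == as_of[b])
    return same / len(links)

def share_in_largest_ases(as_of, top=10):
    """Fraction of nodes that belong to the `top` most populated ASes."""
    sizes = Counter(as_of.values())
    return sum(count for _, count in sizes.most_common(top)) / len(as_of)

if __name__ == "__main__":
    # Toy data: node -> AS number, and overlay links between nodes.
    as_of = {"a": 1, "b": 1, "c": 2, "d": 3}
    links = [("a", "b"), ("a", "c"), ("c", "d"), ("b", "d")]
    print(intra_as_fraction(links, as_of), share_in_largest_ases(as_of, top=1))
```

A low intra-AS fraction combined with a high concentration of nodes in a few ASes is what the slide summarizes as largely "random wiring".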

  25. Course Topics • Internet Architecture and Design Principles • Flat Pricing vs. Prioritized Traffic • Internet Measurements • Availability in Wide-Area • Patterns in Real Networks • Modeling the Internet Topology • Internet Services: DNS • Web Caching, Content Distribution Networks • Overlay Networks • Peer-to-Peer systems • Computational Grids • Security Issues • Sensor Nets • Wireless Networks • XML SOAP and Web Services

  26. Course Topics • Internet Design Principles • How do I deliver Internet services: end-to-end vs. within the network? • Flat Pricing vs. Prioritized Traffic • How do I determine which traffic to pass over the Internet? • Internet Measurements • What does the Internet really look like?

  27. Course Topics • Availability in Wide-Area • How reliable is the Internet? • Patterns in Real Networks • What does Internet traffic look like? • Modeling the Internet Topology • How can I construct realistic models of Internet structure?

  28. Course Topics • Internet Services: DNS • How well does DNS work? • Web Caching, Content Distribution Networks • How do we optimize Web content mgmt? • Overlay Networks • Improving routing performance

  29. Course Topics • Peer-to-Peer systems • Gnutella, etc., etc. • Computational Grids • Globus, etc. • Security Issues • Authorization, etc.

  30. Course Topics • Sensor Nets • How do I structure & program networks of lightweight devices? • Wireless Networks • How do I route in ad hoc networks? • XML SOAP and Web Services • What are Web services anyway? (Figure labels: Disaster Response, Circulatory Net)

  31. Projects • Literature surveys, real implementations, analytical evaluations • Can be performed individually or in a team of two • Your project ideas appreciated (to be discussed before proposal due date) • Primary goal is to do something interesting and to do it well

  32. Example Project • Gnutella network analysis • Develop a “crawler” that traverses network, collects membership & connectivity info • Analyze structure • Characterize structure • See, e.g.: • Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design, M. Ripeanu, I. Foster, A. Iamnitchi, in IEEE Internet Computing Journal, vol. 6(1), 2002

  33. Project Ideas • http://dsl.cs.uchicago.edu/Courses/cs347-2002/cs347_projects.htm • Gnutella network measurements • Topology discovery for 500K nodes • Structural analysis with 500K nodes • Study impact of overlay networks • Etc.

  34. Project Ideas • Overlay networks: build unstructured or semistructured self-organizing overlays optimizing different cost functions: • Topology-aware: map onto physical infrastructure • Usage-aware: map onto usage patterns • Analysis of Sloan Digital Sky Survey logs to explore access patterns • What files are accessed how often • What community usage patterns emerge? • How can we exploit these in content distribution networks?

  35. Project Ideas • Compare qualitatively and analytically current file-location solutions (CAN, Chord, Gnutella, Napster, etc.) in the context of scientific file-sharing collaborations. • Evaluate sharing patterns based on real usage traces in a scientific collaboration • Use these patterns to evaluate benefits/drawbacks and propose better alternatives • Expand existing simulator to evaluate request forwarding techniques for resource location in grid environments

  36. For More Information • Contact me • Ian Foster, foster@cs.uchicago.edu • Email or set up a meeting • Contact Anda, our TA • Adriana Iamnitchi, anda@cs.uchicago.edu • Monitor the class web page • http://dsl.cs.uchicago.edu/Courses/cs347-2002/

  37. Next 2 Classes • Friday: • Discuss: • J. Saltzer, D. Reed, and D. Clark, End-to-end Arguments in System Design. ACM Transactions on Computer Systems, Vol. 2, No. 4, pp. 195-206, 1984. • D. Clark and M. Blumenthal, Rethinking the design of the Internet: The end to end arguments vs. the brave new world, Workshop on Policy Implications of End-to-End. December 1, 2001. • Leading group: Ian + 2 volunteers (who?) • Wednesday: • Leading Group: Anda + 1-2 volunteers (who?)
