In this lecture, we'll explore Peer-to-Peer (P2P) systems, their architecture, and the TCP/IP protocol suite. Topics include the distinctions between client-server models and P2P, examining the advantages of P2P systems in terms of scalability, efficiency, and fault tolerance. We will categorize P2P systems into centralized, unstructured, structured, and hybrid types, providing examples like Napster and Gnutella. We will also discuss distributed hash tables (DHT) and their implementations, addressing the significance of node interconnectivity and fault tolerance in dynamic environments.
COMP 416b Internet Protocols and Software • Instructor: Zhijun Wang • Quiz #2 will be given in the last hour • Lab #3 will be given next week • Today’s content: overview of Peer-to-Peer (P2P) systems, search in P2P, P2P streaming TCP/IP Protocol Suite
Client-server architecture • The traditional web service follows a client-server model: clients request data from the server. • Reliable, powerful, successful. • Examples: WWW, FTP
Limitations of client-server model • Scalability: poor scalability for huge systems • Failure recovery: single point of failure • Maintenance: server needs administration
Peer-to-peer (P2P) architecture • Each peer acts as both client and server. Any node can provide and consume data • No centralized data source
Features of P2P system • Clients are also servers and routers • Nodes contribute content storage, memory, CPU • Nodes are autonomous, no administrative authority • System is dynamic: peers can frequently join and leave • Peers collaborate directly with each other • Peers have widely varying capability
Advantages of P2P system • Efficient use of resources • Scalability: consumers of resources also donate resources • Reliability: no single point of failure; replicas and geographic distribution • Ease of administration: peers are self-organized; no server is needed
Types of P2P system • Centralized: Napster • Non-centralized, unstructured: Gnutella • Non-centralized, structured: FreeNet, Chord, CAN, Tapestry, Pastry • Hybrid: Kazaa
Centralized P2P system • Napster model: the file index is stored on a central server for file search; files are exchanged directly between peers • Benefits: low per-node state; limited bandwidth usage; short location time; high success rate; fault tolerant • Drawbacks: central point of failure; limited scale; possibly unbalanced load [Figure: peers Bob, Alice, Judy, and Jane]
Napster • A P2P system for music sharing • Peers upload their lists of files to the Napster server • Peers send queries to the Napster server for files of interest • The Napster server replies with the IP addresses of users with matching files • Peers connect directly to the peers storing the files of interest
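The Napster steps above can be sketched as a small in-memory index. This is a minimal illustration, not Napster's actual protocol: the `CentralIndex` class, its method names, and the addresses are hypothetical, and the peer-to-peer file transfer itself is left out.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical stand-in for the central index server."""

    def __init__(self):
        # filename -> set of peer addresses that announced the file
        self._peers_by_file = defaultdict(set)

    def register(self, peer_addr, filenames):
        """A peer uploads its list of files."""
        for name in filenames:
            self._peers_by_file[name].add(peer_addr)

    def unregister(self, peer_addr):
        """A peer leaves; drop it from every entry."""
        for peers in self._peers_by_file.values():
            peers.discard(peer_addr)

    def query(self, filename):
        """Return the addresses of peers holding the file (sorted for stable output)."""
        return sorted(self._peers_by_file.get(filename, set()))


index = CentralIndex()
index.register("10.0.0.1", ["song_a.mp3", "song_b.mp3"])
index.register("10.0.0.2", ["song_b.mp3"])
holders = index.query("song_b.mp3")  # the requester then contacts one of these peers directly
```

Only the lookup goes through the server; the download happens between peers, which is why the server is both the cheap part of the design and its single point of failure.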
Unstructured P2P system • Gnutella model • Benefits: limited per-node state; fault tolerant • Drawbacks: high bandwidth usage; long time to locate an item; no guarantee on success rate; possibly unbalanced load [Figure: peers Carl, Jane, Bob, Alice, and Judy]
Gnutella • Decentralized search • Peers ask their neighbors for the files they are interested in • Neighbors ask their neighbors, and so on • A peer replies to the query if it has a matching file
Structured P2P system • FreeNet, Chord, CAN, Tapestry, Pastry model: files are placed based on the file ID, and a hash is used for file search • Benefits: manageable per-node state; manageable bandwidth usage and time to locate an item; guaranteed success • Drawbacks: possibly unbalanced load; harder to support fault tolerance [Figure: a query for ID 212 routed among nodes 001, 012, 212, 305, and 332]
Distributed hash table (DHT) • Distributed version of a hash table data structure • Stores (key, value) pairs • The key is like a filename • The value can be file contents • Goal: Efficiently insert/lookup/delete (key, value) pairs • Each peer stores a subset of (key, value) pairs in the system • Core operation: Find node responsible for a key • Map key to node • Efficiently route insert/lookup/delete request to this node
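The "map key to node" step can be illustrated with a toy example. The sketch below assumes a Chord-like ring in which a key belongs to the first node whose ID is at or after the key's hash; the slides do not fix a particular mapping, and `TinyDHT`, `ring_id`, and the node names are hypothetical. It also keeps a global view of all nodes, so it shows the mapping only, not the distributed routing.

```python
import bisect
import hashlib


def ring_id(name, bits=16):
    """Hash a node name or key onto a ring of 2**bits positions (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class TinyDHT:
    def __init__(self, node_names):
        self._ring = sorted((ring_id(n), n) for n in node_names)
        self._ids = [node_id for node_id, _ in self._ring]
        self._store = {n: {} for n in node_names}  # each node holds a subset of the pairs

    def responsible_node(self, key):
        """First node at or after the key's position, wrapping around the ring."""
        i = bisect.bisect_left(self._ids, ring_id(key))
        return self._ring[i % len(self._ring)][1]

    def insert(self, key, value):
        self._store[self.responsible_node(key)][key] = value

    def lookup(self, key):
        return self._store[self.responsible_node(key)].get(key)

    def delete(self, key):
        self._store[self.responsible_node(key)].pop(key, None)


dht = TinyDHT(["node-a", "node-b", "node-c"])
dht.insert("song.mp3", b"file contents")
owner = dht.responsible_node("song.mp3")
```

Because insert, lookup, and delete all call `responsible_node`, every peer that hashes the same key reaches the same node without consulting a central index.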
DHT routing protocols • DHT is a generic interface • There are several implementations of this interface • Chord [MIT] • Pastry [Microsoft Research UK, Rice University] • Tapestry [UC Berkeley] • Content Addressable Network (CAN) [UC Berkeley] • SkipNet [Microsoft Research US, Univ. of Washington] • Kademlia [New York University] • Viceroy [Israel, UC Berkeley] • P-Grid [EPFL Switzerland]
Document Routing • FreeNet, Chord, CAN, Tapestry, Pastry model • Benefits: more efficient searching; limited per-node state • Drawbacks: limited fault-tolerance vs redundancy [Figure: a query for ID 212 routed among nodes 001, 012, 212, 305, and 332]
Document Routing – CAN • Associate to each node and item a unique ID in a $d$-dimensional space • Goals: scales to hundreds of thousands of nodes; handles rapid arrival and failure of nodes • Properties: routing table size $O(d)$; guarantees that a file is found in at most $d \cdot n^{1/d}$ steps, where $n$ is the total number of nodes (Slide modified from another presentation.)
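To get a feel for the step bound stated on this slide, the sketch below simply evaluates d times the d-th root of n for a few dimensions. The function name and the choice of n = 100,000 nodes are illustrative assumptions; the formula is the one from the slide.

```python
def can_max_hops(n, d):
    """Upper bound d * n**(1/d) on lookup steps, as stated on the slide."""
    return d * n ** (1 / d)


# Compare the bound for a few dimensions with a fixed (assumed) network size.
for d in (2, 4, 8):
    print(f"d={d}: at most about {can_max_hops(100_000, d):.0f} steps")
```

Raising d shortens paths at the cost of a larger routing table, since the per-node state grows as O(d).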
Node Failure Recovery • Simple failures: know your neighbor’s neighbors; when a node fails, one of its neighbors takes over its zone • More complex failure modes: simultaneous failure of multiple adjacent nodes; scoped flooding to discover neighbors; hopefully, a rare event (Slide modified from another presentation.)
Search in unstructured P2P system • TTL: Time-to-live
Flooding
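Flooding with a TTL can be sketched as a breadth-first traversal of the overlay. This is a simplified model under stated assumptions: the overlay is a hypothetical adjacency dict, every peer forwards the query to all of its neighbors, duplicates are dropped at the receiver, and each forwarded copy counts as one message.

```python
from collections import deque


def flood_search(graph, holders, start, ttl):
    """Return (peers that hold the file and were reached, query messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = set(), 0
    while queue:
        peer, remaining = queue.popleft()
        if peer in holders:
            hits.add(peer)
        if remaining == 0:
            continue  # TTL expired: this peer does not forward the query
        for neighbor in graph[peer]:
            messages += 1  # one query message per neighbor, even if it is a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical five-peer overlay; only peer E holds the file.
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
miss = flood_search(overlay, {"E"}, "A", ttl=2)
hit = flood_search(overlay, {"E"}, "A", ttl=3)
```

With TTL 2 the query never reaches E, while TTL 3 finds it at the cost of more messages, which is the bandwidth problem listed among the drawbacks of unstructured systems.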
Random walking • A peer randomly chooses a neighbor and queries it for the file of interest. • The neighbor randomly forwards the query to one of its own neighbors, and so on. • Fewer overhead messages for file search. • Long search time.
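A single random walker can be sketched in a few lines. The adjacency dict, the `max_steps` cap, and the injectable `rng` parameter are assumptions added for illustration; each step corresponds to one query message.

```python
import random


def random_walk_search(graph, holders, start, max_steps, rng=None):
    """Forward the query to one random neighbor per step.

    Returns (peer that holds the file or None, number of messages sent).
    """
    rng = rng or random.Random()
    peer = start
    for step in range(1, max_steps + 1):
        peer = rng.choice(graph[peer])  # exactly one message per step
        if peer in holders:
            return peer, step
    return None, max_steps


# Hypothetical two-peer overlay, so the walk is deterministic here.
line = {"A": ["B"], "B": ["A"]}
found = random_walk_search(line, {"B"}, "A", max_steps=5)
```

Compared with flooding, the message count grows with the number of steps rather than with the number of peers reached, which is why the overhead is lower and the search time longer.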
Expanding ring • First set TTL=1 and send the query to the neighbors. • If no file is found, set TTL=2 and send the query again. • Keep increasing TTL by 1 until the file is found or TTL reaches the maximum value. • Trades off cost against delay. [Figure: example overlay with peers A through M]
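The expanding-ring loop can be sketched by repeating a TTL-limited search with a growing TTL. The helper `peers_within`, the adjacency dict, and the `max_ttl` parameter are hypothetical names; the helper stands in for one flooding round and only reports which peers that round would reach.

```python
def peers_within(graph, start, ttl):
    """Peers reachable from start in at most ttl hops (one TTL-limited flood)."""
    frontier, seen = {start}, {start}
    for _ in range(ttl):
        frontier = {n for p in frontier for n in graph[p]} - seen
        seen |= frontier
    return seen - {start}


def expanding_ring_search(graph, holders, start, max_ttl):
    """Retry with TTL = 1, 2, ... until a holder is reached or max_ttl is hit."""
    for ttl in range(1, max_ttl + 1):
        hits = peers_within(graph, start, ttl) & holders
        if hits:
            return hits, ttl
    return set(), max_ttl


# Hypothetical five-peer overlay; only peer E holds the file.
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
result = expanding_ring_search(overlay, {"E"}, "A", max_ttl=4)
```

Nearby files are found with a small, cheap flood, but a distant file costs several rounds, which is the cost and delay tradeoff noted on the slide.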
Instructor: Zhijun Wang P2P streaming COMP 416b Internet Protocols and Software
BitTorrent (BT) • Released in the summer of 2001 by Bram Cohen. • P2P application that capitalizes on the resources (disk storage and bandwidth) for effective distribution of large files. • Unlike Kazaa/Gnutella, which strive to quickly locate a file, BitTorrent’s objective is to quickly replicate a file to clients. • Uses basic ideas from game theory to largely eliminate the free-rider problem. • Works extremely well.
Basic concepts of BT • Seed: a peer that has the entire file. • Leecher: a peer that has an incomplete copy of the file. • A .torrent file: passive component; lists SHA1 hashes to verify the integrity of files; typically hosted on a web server • A tracker: active component; allows peers to find each other; returns a random list of connected peers
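The integrity check enabled by the .torrent hashes can be sketched with `hashlib`. The sketch assumes one SHA-1 digest per fixed-size piece; the function names and the tiny 4-byte piece length are illustrative only, and parsing a real .torrent file is not shown.

```python
import hashlib


def piece_hashes(data, piece_len):
    """What the publisher would list: one SHA-1 digest per piece."""
    return [hashlib.sha1(data[i:i + piece_len]).digest()
            for i in range(0, len(data), piece_len)]


def verify_piece(index, piece, expected_hashes):
    """What a downloader checks before accepting a piece from another peer."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]


original = b"abcdefghij"
hashes = piece_hashes(original, piece_len=4)      # pieces: b"abcd", b"efgh", b"ij"
ok = verify_piece(0, b"abcd", hashes)
corrupted = verify_piece(1, b"efgX", hashes)
```

A piece that fails the check is discarded and requested again, so a peer never has to trust the neighbor it downloads from.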
Swarming: pieces and sub-pieces • A piece, typically 256 KB, is broken into 16 KB sub-pieces. • Until a piece is assembled, only sub-pieces for that piece are downloaded. • This ensures that complete pieces assemble quickly. • When transferring data over TCP, it is critical to always have several requests pending at once, to avoid a delay between pieces being sent. • At any point in time, some number of sub-pieces, typically 5, are requested simultaneously. • On piece completion, notify all peers.
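With the sizes on this slide, a 256 KB piece splits into 16 sub-pieces, and about 5 requests are kept pending at a time. The sketch below models that request pipeline in a single thread; `fetch` is a hypothetical callback standing in for a network request, and completing the oldest request first is a simplifying assumption.

```python
from collections import deque

PIECE_SIZE = 256 * 1024      # typical piece size from the slide
SUB_PIECE_SIZE = 16 * 1024   # sub-piece size from the slide
PIPELINE_DEPTH = 5           # typical number of pending requests


def sub_piece_requests(piece_index, piece_size=PIECE_SIZE):
    """(piece, offset, length) triples covering one piece."""
    return [(piece_index, offset, min(SUB_PIECE_SIZE, piece_size - offset))
            for offset in range(0, piece_size, SUB_PIECE_SIZE)]


def download_piece(piece_index, fetch):
    """Keep up to PIPELINE_DEPTH requests in flight until the piece is complete."""
    todo = deque(sub_piece_requests(piece_index))
    in_flight = deque()
    blocks = {}
    max_in_flight = 0
    while todo or in_flight:
        while todo and len(in_flight) < PIPELINE_DEPTH:
            in_flight.append(todo.popleft())       # top up the pipeline
        max_in_flight = max(max_in_flight, len(in_flight))
        _, offset, length = in_flight.popleft()    # oldest request completes
        blocks[offset] = fetch(piece_index, offset, length)
    return b"".join(blocks[o] for o in sorted(blocks)), max_in_flight


# Fake fetch that returns zero bytes of the requested length.
piece, depth = download_piece(0, lambda index, offset, length: bytes(length))
```

Only requests for the current piece are issued until it is assembled, matching the rule that sub-pieces of one piece are finished before another piece is started.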
Piece selection • The order of pieces is very important for good performance. • A bad algorithm could result in all peers waiting for the same missing piece. • Random Piece First policy • Initially a peer has no pieces to trade, so it is important to get a piece ASAP. • Policy: Peer starts with a random piece to download. • Rarest Piece First policy • Policy: Download the pieces which are most rare among your peers. • Ensures most common pieces are left for last.
Rarest First Policy [Figure: neighboring peers announce the pieces they hold with HAVE messages: HAVE <12,7,36>, HAVE <14>, HAVE <12,7,14>]
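Using the HAVE lists from the figure above, rarest-first selection can be sketched as counting how many neighbors hold each piece and picking one with the lowest count. The function name, the peer labels, and the random tie-break are assumptions for illustration.

```python
import random
from collections import Counter


def rarest_first(my_pieces, peer_pieces, rng=None):
    """Pick a missing piece held by the fewest neighbors (ties broken randomly)."""
    rng = rng or random.Random()
    counts = Counter(piece
                     for have in peer_pieces.values()
                     for piece in set(have)
                     if piece not in my_pieces)
    if not counts:
        return None  # nothing left that any neighbor can offer
    rarest = min(counts.values())
    return rng.choice(sorted(p for p, c in counts.items() if c == rarest))


# HAVE announcements as in the figure; peer labels are hypothetical.
neighbors = {"peer1": [12, 7, 36], "peer2": [14], "peer3": [12, 7, 14]}
first_choice = rarest_first(set(), neighbors)
```

Piece 36 is announced by only one neighbor, so it is fetched first; pieces 7, 12, and 14 are each held by two neighbors and are left for later.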
Topology of P2P streaming system • Tree topology • Peers are arranged as a tree. • A peer downloads the data from its parent. [Figure: server and peers]
Topology of P2P streaming system • Mesh topology • A peer can download data from its neighbors. • No single point of failure. [Figure: server and peers]
IPTV system • PPLive: free P2P-based IPTV • As of January 2006, the PPLive network provided 200+ channels with 400,000 daily users on average. • The bit rates of video programs mainly range from 250 Kbps to 400 Kbps, with a few channels as high as 800 Kbps. • The video content is mostly feeds from TV channels in Mandarin. • The channels are encoded in two video formats: Windows Media Video (WMV) or Real Video (RMVB). • The encoded video content is divided into chunks and distributed to users through the PPLive P2P network.
IPTV Architecture
PPLive process
PPLive process • Cached contents can be uploaded to other peers watching the same channel. • This peer may also upload cached video chunks to multiple peers. • Received video chunks are reassembled in order and buffered in the queue of the PPLive TV engine, forming a local streaming file in memory. • When the streaming file length crosses a predefined threshold, the PPLive TV engine launches the media player, which downloads video content from the local HTTP streaming server. • After the buffer of the media player fills up to the required level, the actual video playback starts. • When PPLive starts, the PPLive TV engine downloads media content from peers aggressively to minimize playback start-up delay. • When the media player receives enough content and starts to play the media, the streaming process gradually stabilizes. • The PPLive TV engine streams data to the media player at the media playback rate.
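The reassembly and start-up threshold described above can be sketched as a small buffer. This is an illustration only, not PPLive's actual implementation: the `ChunkBuffer` class, the sequence-numbered chunks, and the byte-based threshold are all assumptions.

```python
class ChunkBuffer:
    """Hypothetical model of the TV engine's queue of received chunks."""

    def __init__(self, launch_threshold):
        self.launch_threshold = launch_threshold  # bytes needed before the player starts
        self._out_of_order = {}                   # seq -> chunk waiting for earlier chunks
        self._next_seq = 0
        self.stream = bytearray()                 # in-order "local streaming file"
        self.player_launched = False

    def receive(self, seq, chunk):
        """Store a chunk from a peer and append any in-order run to the stream."""
        self._out_of_order[seq] = chunk
        while self._next_seq in self._out_of_order:
            self.stream += self._out_of_order.pop(self._next_seq)
            self._next_seq += 1
        if not self.player_launched and len(self.stream) >= self.launch_threshold:
            self.player_launched = True           # stand-in for launching the media player


buf = ChunkBuffer(launch_threshold=6)
buf.receive(1, b"bbb")   # arrives early, held back until chunk 0 is present
buf.receive(0, b"aaa")   # fills the gap; the stream now reaches the threshold
```

Chunks may arrive from different peers in any order, so playback can only begin once a contiguous prefix of sufficient length has been assembled.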
PPLive measurement • One residential PC and one campus PC “watched” channel CCTV3. • Another residential PC and another campus PC “watched” channel CCTV10. • Each of these four traces lasted about 2 hours. • According to the PPLive web site, CCTV3 is a popular channel with a 5-star popularity grade, and CCTV10 is less popular with a 3-star popularity grade.
Session duration • Signaling versus video sessions • All sessions are TCP based • The median video session lasts about 20 seconds, and about 10% of video sessions last more than 15 minutes.
Video traffic breakdown
Questions?