This article delves into the scalability challenges faced by traditional client/server systems and explores the potential of peer-to-peer (P2P) architectures, particularly BitTorrent, as a viable solution. It discusses the issues of server workload increases with client numbers, flash crowd problems, and existing approaches like CDNs. It highlights how BitTorrent's unstructured P2P system allows users to share workloads effectively, emphasizing features like efficient data transfer protocols, piece selection strategies, and fairness mechanisms within the network.
Incentives Build Robustness in BitTorrent Bram Cohen
Motivation Single server • Scalability issues for client/server systems • Server’s workload grows linearly with number of clients • Flash crowd problem
An “Ideal” Solution: IP Multicast Single server • Same stream is shared by all clients receiving same data • Requires infrastructure-level changes • Security issues • No widely accepted transport protocol on IP multicast layer
Current Approaches • Server farms replace central server • Scalability problem continues • Expensive to maintain • Content delivery networks (CDNs) • Akamai, … • Still expensive • Peer-to-peer systems (P2P) • Newer approach
P2P Systems (I) • Let clients, now called peers, share the server workload • Peers forward all the data they receive to other peers
Advantages • P2P solutions are • Scalable: • Downloading bandwidth grows with number of peers • Easy to deploy: • No additional hardware • No change to network infrastructure • Cheap
Issues • Organizing data transfers: • Figuring out which peers have which chunks of data • Deciding where to send these chunks • Dealing with churn: • Peers come and go • Enforcing fairness: • Some peers do not upload as much data as they download
BitTorrent • Unstructured P2P System • Peers have no parent peers or child peers • Centralized tracker • Collects information on peers • Responds to requests for that information • Built-in fairness incentive • Rechoking favors cooperative peers • Simple user interface
Deployment (I) • Decision to use BitTorrent is made by publisher of file • Users join BitTorrent to get a file they want • Most users stop uploading once they have downloaded the file • Standard implementation keeps uploading until the BT window closes
Deployment (II) • In a typical deployment • Number of downloaders having parts of the file (leeches) increases very fast then peaks at a maximum before decreasing exponentially • Number of downloaders having the whole file (seeds) increases more slowly then peaks at a maximum before decreasing exponentially
Starting a BT (I) • To start a BT, publisher puts on a web server a static file with information about • The file • Its length • Its name, • Hashing information • The URL of a tracker
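A minimal sketch of what such a static file might contain, assuming the bencoded dictionary layout commonly used for BitTorrent metainfo files (keys `announce`, `info`, `name`, `length`, `piece length`, `pieces`). The helpers `bencode` and `make_metainfo` are illustrative names, not part of any library, and the tracker URL and file name in the usage note are hypothetical.

```python
def bencode(value):
    """Encode ints, byte/text strings, lists and dicts in bencoding."""
    if isinstance(value, bool):
        raise TypeError("booleans are not bencodable")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are emitted in sorted order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name, length, piece_length, piece_hashes, tracker_url):
    """Build the dictionary a publisher would serialize into the static file.

    piece_hashes is the concatenation of the 20-byte SHA1 digests of all pieces.
    """
    return {
        "announce": tracker_url,
        "info": {
            "name": name,
            "length": length,
            "piece length": piece_length,
            "pieces": piece_hashes,
        },
    }
```

For example, `bencode(make_metainfo("file.iso", 600000, 262144, hashes, "http://tracker.example/announce"))` would yield the bytes to place on the web server, where `hashes` holds three concatenated digests.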
The tracker • Helps downloaders find each other • Uses a simple protocol layered on top of HTTP • New downloader sends information about • What file it’s downloading • What port it’s listening on • … • Tracker replies with a list of peers that are downloading the same file
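The tracker's role can be sketched as an in-memory registry. This is only an illustration: it leaves out the HTTP layer entirely, and the class name `Tracker`, the `info_hash` key, and the `max_peers` limit of 50 are assumptions, not details taken from the slides.

```python
import random


class Tracker:
    """Toy tracker: remembers who is downloading what, returns random peers."""

    def __init__(self, max_peers=50, rng=None):
        self.swarms = {}            # file identifier -> set of (ip, port)
        self.max_peers = max_peers  # assumed cap on the size of a reply
        self.rng = rng or random.Random()

    def announce(self, info_hash, ip, port):
        """Register a downloader and return a random list of other peers."""
        swarm = self.swarms.setdefault(info_hash, set())
        others = sorted(swarm - {(ip, port)})
        swarm.add((ip, port))
        count = min(self.max_peers, len(others))
        return self.rng.sample(others, count)
```

Returning a random subset rather than, say, the most recent peers is what gives the resulting peer graph its random structure.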
Starting a BT (II) • Next step is starting a downloader having the whole contents of the file • The seed • Web server and tracker have very low bandwidth requirements • Seed must upload the whole file contents at least once
Connecting with peers • Standard tracker algorithm returns random lists of peers • Random graphs are very robust • BitTorrent cuts files into pieces of fixed size, typically a quarter megabyte • Each downloader • Reports to its peers what pieces it has • Starts exchanging these pieces with them
Ensuring data integrity • Torrent file on web server has SHA1 hashes of all the pieces • Peers don’t report that they have a piece until they’ve checked its hash • Could have used erasure codes
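The integrity check can be sketched as follows, assuming the quarter-megabyte piece size mentioned earlier. `piece_hashes` and `PieceStore` are hypothetical names used only for illustration.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # a quarter megabyte


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """SHA1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    ]


class PieceStore:
    """Tracks which pieces a peer may advertise to its peers."""

    def __init__(self, expected_hashes):
        self.expected = expected_hashes
        self.have = set()

    def accept(self, index, payload):
        """Record a piece only if its hash matches the torrent file."""
        if hashlib.sha1(payload).digest() != self.expected[index]:
            return False  # corrupt piece: never reported to other peers
        self.have.add(index)
        return True
```

Only indices in `have` would be reported to other peers, so a corrupted piece is never advertised or forwarded.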
Pipelining • Must avoid delays between pieces being sent • BT breaks pieces further into sub-pieces (typically 16KB in size) • Always keeps several requests pipelined at once • Sends a new request each time a sub-piece arrives
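A minimal sketch of the request pipeline described above. The pipeline depth of five is an assumption standing in for "several", and `RequestPipeline` is a hypothetical name.

```python
from collections import deque

SUB_PIECE = 16 * 1024


def sub_piece_requests(piece_index, piece_length, sub_piece=SUB_PIECE):
    """Split one piece into (piece, offset, length) requests."""
    return [
        (piece_index, offset, min(sub_piece, piece_length - offset))
        for offset in range(0, piece_length, sub_piece)
    ]


class RequestPipeline:
    """Keeps a fixed number of sub-piece requests outstanding."""

    def __init__(self, requests, depth=5):
        self.pending = deque(requests)
        self.in_flight = set()
        self.depth = depth  # assumed number of pipelined requests

    def fill(self):
        """Return the requests to send now so the pipeline is full."""
        to_send = []
        while self.pending and len(self.in_flight) < self.depth:
            request = self.pending.popleft()
            self.in_flight.add(request)
            to_send.append(request)
        return to_send

    def on_sub_piece(self, request):
        """A sub-piece arrived: retire it and send a replacement request."""
        self.in_flight.discard(request)
        return self.fill()
```

Because a replacement request goes out as each sub-piece arrives, the sending peer always has queued work and the link does not sit idle for a round trip between sub-pieces.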
Piece Selection (I) • Downloaders requesting pieces follow four policies • Strict Priority • Once a downloader has one or more sub-pieces of a piece, it finishes that piece before starting another • Gets complete pieces as quickly as possible
Piece Selection (II) • Rarest First • Download first the pieces that the fewest of their own peers have • Ensures that peers have the pieces that most of their peers want
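Rarest-first selection might be sketched like this, assuming each peer's advertised pieces are available as a set of piece indices. The function name and data layout are illustrative, and breaking ties at random is an assumption.

```python
import random
from collections import Counter


def rarest_first(needed, peer_pieces, rng=random):
    """Pick a needed piece held by the fewest of our own peers.

    needed:      set of piece indices we still lack
    peer_pieces: dict mapping peer id -> set of piece indices that peer has
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)
    # Only pieces that at least one peer can supply are candidates.
    candidates = [piece for piece in needed if counts[piece] > 0]
    if not candidates:
        return None
    fewest = min(counts[piece] for piece in candidates)
    return rng.choice([p for p in candidates if counts[p] == fewest])
```

Note that rarity is judged only from the peers a downloader is connected to, not from the whole swarm.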
Piece Selection (III) • Random First Piece • New peer should get its first complete piece as quickly as possible • Rare pieces can be downloaded from fewer peers than other pieces • New peer will select first pieces to download at random until it has obtained a complete piece
Piece Selection (IV) • Endgame Mode • At end of download • Peer sends requests for all the sub-pieces it doesn’t have yet to all of its peers • Sends cancels for sub-pieces as they arrive • Objective is to speed up end of download
Overview (figure) • Torrent file • Tracker • Two nodes and their peers • Downloaders • A seed
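Endgame mode can be sketched as two small steps: broadcast requests for every missing sub-piece, then cancel the duplicates once a copy arrives. The function names and the dictionaries used to track state are assumptions made for illustration.

```python
def endgame_requests(missing, peer_sub_pieces):
    """Request every missing sub-piece from every peer able to supply it.

    missing:         set of sub-piece ids still needed
    peer_sub_pieces: dict mapping peer id -> set of sub-piece ids it can send
    """
    return {
        peer: sorted(missing & available)
        for peer, available in peer_sub_pieces.items()
        if missing & available
    }


def cancels_on_arrival(sub_piece, from_peer, outstanding):
    """Return the peers that should be sent a cancel for this sub-piece.

    outstanding: dict mapping peer id -> set of sub-piece ids requested from it
    (updated in place as requests are retired).
    """
    to_cancel = []
    for peer, requested in outstanding.items():
        if sub_piece in requested:
            requested.discard(sub_piece)
            if peer != from_peer:
                to_cancel.append(peer)
    return to_cancel
```

The cancels keep the redundant requests from wasting much bandwidth, which is why broadcasting is only acceptable for the short final stretch of a download.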
BT Choking Algorithm • Penalizes peers that do not reciprocate • Tit-for-tat policy • Each peer keeps a fixed number of peers unchoked (four by default), chosen by current download rate, and chokes the others • Refuses to upload data to choked peers • Choking decisions are recomputed every ten seconds • Long enough for TCP to reach full capacity with the new transfers
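One round of the tit-for-tat decision might look like the sketch below. It assumes the peer keeps four upload slots for the peers it currently downloads from fastest and chokes the rest; the function name `rechoke` and the rate dictionary are illustrative.

```python
def rechoke(download_rates, interested, slots=4):
    """Decide whom to upload to for the next ten-second period.

    download_rates: dict mapping peer id -> rate at which we download from it
    interested:     iterable of peers that want data from us
    Returns (unchoked, choked) as two sets.
    """
    ranked = sorted(
        interested,
        key=lambda peer: download_rates.get(peer, 0.0),
        reverse=True,
    )
    unchoked = set(ranked[:slots])
    choked = set(ranked[slots:])
    return unchoked, choked
```

A peer that uploads nothing has a download rate of zero from our point of view, so it sorts last and is choked whenever more than four peers are interested.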
Optimistic Unchoking • At all times, each BT peer has a single ‘optimistic unchoke’ • Unchoked regardless of its current upload rate • Rotated every third rechoke period (30 seconds) • “Correspond very strongly to always cooperating on the first move in prisoner’s dilemma.”
Upload only mode • A peer that has the whole content of the file it wanted can no longer rank peers by download rate • It instead favors the peers to which it gets the best upload rates
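The rotation schedule can be sketched as a small state machine called once per rechoke period. Choosing the new optimistic unchoke uniformly at random is an assumption, as is the class name `OptimisticUnchoker`.

```python
import random


class OptimisticUnchoker:
    """Keeps one peer unchoked regardless of its upload rate."""

    def __init__(self, periods_per_rotation=3, rng=None):
        self.periods = periods_per_rotation  # 3 x 10 s = 30 s
        self.rng = rng or random.Random()
        self.current = None
        self.ticks = 0

    def on_rechoke(self, candidates):
        """Call once per ten-second rechoke; returns the optimistic unchoke."""
        candidates = list(candidates)
        due = self.ticks % self.periods == 0
        if due or self.current not in candidates:
            self.current = self.rng.choice(candidates) if candidates else None
        self.ticks += 1
        return self.current
```

Holding the choice for three periods gives a newly unchoked peer time to show whether it will reciprocate before the slot moves on.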
Anti-snubbing • A peer might be sometimes choked by all peers from which it was downloading • “To mitigate this problem, when over a minute goes by without getting a single piece from a particular peer, BitTorrent assumes it is ’snubbed’ by that peer and doesn’t upload to it except as an optimistic unchoke”
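The snubbing rule quoted above might be expressed as follows, assuming the client records the time at which it last received a piece from each peer. The names `snubbed_by` and `may_upload_to` are hypothetical.

```python
SNUB_TIMEOUT = 60.0  # "over a minute", in seconds


def snubbed_by(last_piece_time, now, timeout=SNUB_TIMEOUT):
    """Peers from which no piece has arrived for more than the timeout.

    last_piece_time: dict mapping peer id -> time the last piece arrived
    """
    return {
        peer
        for peer, received_at in last_piece_time.items()
        if now - received_at > timeout
    }


def may_upload_to(peer, snubbed, optimistic_unchoke):
    """Snubbing peers get uploads only as the optimistic unchoke."""
    return peer not in snubbed or peer == optimistic_unchoke
```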
Actual deployments • BT routinely serves • Files hundreds of megabytes in size • To hundreds of simultaneous downloaders • Can have over a thousand concurrent downloaders. • Sole scaling bottleneck appears to be the bandwidth overhead of the tracker • One thousandth of total traffic