400 likes | 526 Vues
CS290F, Winter 2005 Large-scale Networked Systems. Instructor Ben Y. Zhao ( ravenben at cs.ucsb.edu , 1151 Engineering I) Office hour: Thur, 3-4pm Lecture time: TuTh, 1:00-3:00 pm Place: 1401 Phelps Teaching Assistant Ted Huffmire ( huffmire at cs.ucsb.edu ) Office hours: TBA. Overview.
 
                
                E N D
CS290F, Winter 2005Large-scale Networked Systems • Instructor • Ben Y. Zhao (ravenben at cs.ucsb.edu, 1151 Engineering I) • Office hour: Thur, 3-4pm • Lecture time: TuTh, 1:00-3:00 pm • Place: 1401 Phelps • Teaching Assistant • Ted Huffmire (huffmire at cs.ucsb.edu) • Office hours: TBA
Overview • Administrivia • What this class is, and is not • Enrollment, etc… • What you have to do for this class • Grading • Break • Course projects • organization • some initial ideas
Administrivia • Course webpage • http://www.cs.ucsb.edu/~ravenben/classes/290F • check it periodically for announcements • Class mailing list • cs290f at lists.cs.ucsb.edu • go to: http://lists.cs.ucsb.edu/mailman/listinfo/cs290f • sign up today! • Deadlines • Unless otherwise specified, it means 10 minutes before the lecture • Special circumstances should be brought to my attention before deadlines
Class Registration • Please send me (ravenben@cs.ucsb.edu ) an e-mail with the subject “cs290f registration" and the following: • Last and first name • Student ID • Your department • Preferred email address • URL of your home page • Please indicate explicitly if I can add you to the on-line web page that lists each student enrolled in the class (only your name and URL will be made public).
Class Enrollment • Class size capped at 40 • currently enrollment is full • for good discussion, we should be at ~ 25 • For those unable to enroll • email me with your student ID (perm #) • I expect enrollment to open up in first 1-2 weeks
Goals of this Course • Understand • How do peer-to-peer systems work? • What are the design issues in building large networked systems? • Where is networking research heading? • Get familiar with current network research • Understand solutions in context • Goals • Assumptions
Goals of this Course (cont’d) • Appreciate what is good research • Problem selection • Solution and research methodology • Presentation • Apply what you learned in a class project
Reasons to Not Take This Class • It’s not... • a survey class on p2p systems • a reading seminar • I might do a p2p reading group / seminar later • will be focused more on reading papers • Unreasonably high expectations • this is my first grad class • it’s in my research area • And… • smaller class will benefit everyone • also, this class will likely be offered next year
What do you need to do? • A research-oriented class project • Paper reading / reviews • Lead class discussion on one research paper • Participate in discussions in class • this is meant to be an *interactive* class
Research Project • Investigate new ideas and solutions in class project • Define the problem • Execute the research • Work with your partners • Write up and present your research • Ideally, best projects will become conference or workshop papers • e.g., SIGCOMM, MOBICOM, SOSP/OSDI, NSDI • IPTPS, WCW, HotOS, WMCSA, NOSSDAV
Research Project: Steps • I’ll distribute a list of projects • You can either choose one of these projects or come up with your own • Pick your project, partner, and submit a one-page proposal describing: • The problem you are solving • Your plan of attack with milestones and dates • Any special resources you may need • A midterm presentation of your progress (5 minutes) • Poster session • Submit project papers
Paper Reviews • Goal: synthesize main ideas and concepts in papers • Number: 2-3 per class • Length: no more than ½ page per paper • Content • Main points intended by the author • Points you particularly liked/disliked • Other comments (writing, conclusions…) • Submission • Submit each review via email before class on lecture day • See class web page for details
What is a Presentation? • Presentations are in Powerpoint • or PDF if you hate MSFT • Target a 30 minute presentation • that’s about 15 slides • idea is to engage the class in discussion on your points • leave the audience with thoughtful questions • You’ll be graded on several things: • did you present the high-level points of the paper? • did you get a good discussion going • did you relate the paper to others in the course (context!)
Choose Your Paper • Pick your paper from the lecture schedule online • the readings and dates will be “finalized” by Thursday • Email me to “claim” the paper • I’ll let you know if it’s already been taken • presenters names will be updated online asap
About Grading • This is a graduate networking class • key is what you learn, not the grade • Evaluation of projects • 2 components: execution and impact • execution: • proposal, design, implementation, evaluation, presentation • impact: • how novel is your work, how does it advance the state of the art? • good execution  B+/A-, impact  A/A+
Class Topics • Study existing p2p systems: • Gnutella, Freenet, Tapestry, OceanStore, PAST, RON, I3, Vivaldi • Study fundamental system issues: • security, fault-tolerance, system measurement and deployment • Study issues in alternative network environments: • sensor networks • ad-hoc and mobile networks
Break • Take a break, get up, walk around, stretch • come back in 10 minutes
Course Projects • Small project teams • ideally teams of 3 • 2 (likely too small to tackle significant project) • 4 (likely too large to coordinate well) • Choose your team members • use the class enrollment webpage (I’ll put that up shortly) • use the class mailing list • Send me a one-page project proposal by Jan 15: • what is the problem you’re solving (w/ related works) • motivation/challenges (why is this important and new) • plan of attack (w/ milestones and dates) • any resources you might need
Background on Structured Overlays • Each data item and machine (node) in the system has associated a unique ID in a large ID space • ID space is partitioned among nodes • DHT: Hash table like interface • put(id, data) • data = get(id) • data items are stored at the node responsible for its ID • DOLR: directory service interface • publish (id) • routeToId (id, message) • data stored anywhere and located via directory
Structured Peer-to-Peer Overlays • Assign random nodeIDs and keys from secure hash • incrementally route towards destination ID • each node has small set of outgoing routes, e.g. prefix routing ID: ABCE ABC0 To: ABCD AB5F A930
What’s in a Protocol? • Definition of name-proximity • each hop gets you “closer” to destination ID • prefix routing, numerical closeness, hamming distance • Size of routing table • amount of state kept by each node as f (N), N = network size • # of overlay routing hops • worst case routing performance (in overlay hops, not IP) • Network locality • does choice of neighbor consider network distance • impact on “actual” performance of P2P routing • Application Interface
Chord • NodeIDs are numbers on ring • Closeness defined by numerical proximity • Finger table • keep routes for next node 2i away in namespace • routing table size: log2 n • n = total # of nodes • Routing • iterative hops from source • at most log2 n hops Node 0/1024 0 128 896 256 768 640 384 512
Chord II • Pros • simplicity • Cons • limited flexibility in routing • neighbor choices unrelated to network proximity* but can be optimized over time • Application Interface: • distributed hash table (DHash)
Tapestry / Pastry • incremental prefix routing • 11110XXX00XX 000X0000 • routing table • keep nodes matching at least i digits to destination • table size: b * logb n • routing • recursive routing from source • at most logb n hops Node 0/1024 0 128 896 256 768 640 384 512
2175 0157 0154 0123 0880 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 4 4 4 4 4 5 5 5 5 5 7 7 7 7 7 3 3 3 3 3 6 6 6 6 6 Neighbor Map For “2175” (Octal) 0xxx 20xx 210x 2170 1xxx ---- 211x 2171 ---- 22xx 212x 2172 3xxx 23xx 213x 2173 4xxx 24xx 214x 2174 5xxx 25xx 215x ---- 6xxx 26xx 216x 2176 7xxx 27xx ---- 2177 4 3 2 1 Routing Levels Routing in Detail Example: Octal digits, 212 namespace, 2175  0157 5712 0880 3210 4510 7510
Project Ideas • Discuss 10 project suggestions • some related to peer-to-peer protocols and systems • Legend: based on how well-defined projects are not necessary how difficult they are • Well-defined projects (5) • Less-defined project (3) • You need to define project’s goals (2) • You can also work on variants of these, or make up your own topics • Need to send me a one page proposal by Jan 15 • I’ll provide feedback/meetings to iterate on topic
Project 1: Implement Chimera • Implement a structured peer-to-peer protocol that is a hybrid of Tapestry, Pastry, and Bamboo • implement in C/C++ • focus on usability and performance • Key features • simplicity (leafsets like Pastry) • stability (p2p exchange of routing tables like Bamboo) • performance (locality like Tapestry) • support “standard” p2p API and easy to build up on • might support multiple teams
Project 2: the DIY P2P protocol • Study the related literature on existing p2p protocols, and design something different • must have something significantly novel/new • can be new contribution in performance, security, theoretical interest, specialized for a useful application… • The challenge • finding an area yet unexplored in the p2p space • The result • well-specified protocols on routing, node insertion/deletion, failure handling • Warning: this is not easy
Project 3-5: P2P Applications/Systems • All applications built on common API • should work w/ existing protocols and new ones developed in class • Project 3: a lightweight p2p CDN/web-cache • use a DHT/DOLR to quickly search for desired object • use request tracking to determine optimal data placement • implement a client-side http proxy • The challenge • maintaining data stability as nodes come and go • providing fairness / load balancing across nodes
Project 4: P2P EBay • Design and implement a scalable and secure auctioning / e-commerce system on a p2p infrastructure • support large # of transactions/time • support large # of users and items • be stable and resilient against attacks • The challenge • understanding the types of attacks • selecting a reasonable subset to resist • maintaining scalability despite security enhancements
Project 5: Lightweight Data Synchronization • Design and implement Quartz • lightweight p2p data sharing system • store your most critical files (<100MB) online • use simple application-specific handlers to provide fast data synchronization (a la CVS) • synchronize your HTML bookmarks across machines • synchronize your papers, homework files, financial records • end to end encryption • The challenge • keeping data highly-persistent on a dynamic p2p network • optimizing per-node operational overhead • a simple user interface that people will *actually use*
Project 6: P2P Security • Security is one of the biggest challenges in p2p • large population spread across networks and domains • high probability of node compromise • need to function in presence of malicious nodes • Existing work on security discouraging • Sybil attacks hard to prevent • “bad” users can create lots of identities online and generate collusion attacks • hard to avoid malicious nodes • you’re 1, they are many…
Project 6: a P2P reputation system • Design and evaluate a highly-adaptive p2p reputation system • quickly recognize malicious (or compromised) nodes • use p2p collaboration to share reputation information • form “trusted circles” • use “third-party” anonymous verification to build reputations • The challenge • building reliable reputations that are highly adaptable • doing so without generating massive network probe traffic • minimizing impact if system is circumvented
Project 7: Distributed Workload Logging • Modelnet and Emulab provide emulation environments for up to 1000’s of virtual nodes • good for reproducible experiments • bad: in a dedicated cluster, unlike real world conditions • PlanetLab provides real-world platform for distributed applications (~200+ nodes across Inet) • good for deploying / hardening your application/system • bad for reproducible results • What we need • a way to gather interesting (and unexpected) environmental conditions from distributed networks like Planetlab, and use it as trace to drive Modelnet/Emulab
Project 7: cont. • The goal: • design, implement, and deploy a distributed sensor and logging interface • needs to be scalable, lightweight (minimize impact on environment), and accurate • gather a large data set, and build an environmental trace
Project 8: P2P Benchmarking • Design and test and benchmark for P2P routing • needs to account for a wide-range set of application-level behaviors and working conditions • throughput • latency • key lookup versus data read/store • rate of node churn • The challenge • being comprehensive • being fair to goals of different protocols
Project 9: Quantifying P2P Performance • Test a set of protocols against a number of network topologies • mostly simulations (might need to implement some protocols) • test using different types of network models, GT-ITM, BRITE, Waxman, etc… • vary protocol parameters to measure impact on basic performance • The challenge • making sense of large quantities of test data • getting a better understanding of inherent properties of network models and how they impact overlay networks
Project 10: P2P + Ad-hoc • Take p2p routing algorithms, and apply to the ad-hoc or sensor net space • consider new constraints: power, processing, no underlying IP • design new routing protocol • validate via simulation • The challenge • there’s fairly little previous work, and it’s not clear the idea is feasible
Support Infrastructure • CSIL machines • you should all have accounts • A new 32 node networking cluster • just assembled, being installed now • Dell Xeon servers, 2.4Ghz, 2G RAM • cluster building in progress • will have modelNet installed, and support large-scale network emulation tests (~300 nodes)
Summary of TODOs • What you need to do: • send me registration email with subject: cs290f registration • sign up online for class mailing listhttp://lists.cs.ucsb.edu/mailman/listinfo/cs290f • pick a paper to present and email me to claim it(*after* Thur, first come first serve) • find other team members and pick a project topic(first draft of proposal due Jan 15)