Advanced Topics in Computer Systems (ACS, R01)

Advanced Topics in Computer Systems (ACS, R01) 6th October 2011 Steven Hand

Welcome and Introductions • Welcome to Cambridge, the ACS and R01! • First, everyone should introduce themselves • Aim for 1-2 minutes • Things to cover: • Your name and educational background; • Your general areas of research interest; and • Your biggest or best or most interesting project (hw or sw) or other technical achievement. • Add other topics if you wish!

What is Systems Research? • Typically systems research is: • Practical (motivated by real-world issues) • Low-level (little maths or formalism) • Pragmatic (good enough is good enough) • Real (systems people build stuff) • It is a pretty broad area: operating systems, file systems, databases, distributed systems, language runtimes, system security…

ACM SOSP 2009 • The premier systems conference is SOSP; it runs every two years (the next one is in three weeks’ time...) • The 2009 technical program had 23 papers… • “FAWN: A Fast Array of Wimpy Nodes” • Basic idea: investigate the trade-off between power consumption and computing power • Build a cluster of nodes with a 500 MHz AMD Geode CPU, 256 MB RAM, 4 GB of flash and 100 Mb/s Ethernet • Show that this is “better” (in terms of #ops per Joule) than a similar system built from conventional (quad-core) nodes.
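The “#ops per Joule” metric is simply throughput normalised by power, because one watt is one joule per second:

$$
\text{energy efficiency}\ \left[\frac{\text{ops}}{\text{J}}\right]
= \frac{\text{operations completed}}{\text{energy consumed (J)}}
= \frac{\text{throughput (ops/s)}}{\text{average power (W)}},
\qquad 1\,\text{W} = 1\,\text{J/s}.
$$

So a cluster of slower nodes can come out ahead on this metric even with lower raw throughput, provided its power draw falls by a larger factor than its throughput does.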

ACM SOSP 2009 • “RouteBricks: Exploiting Parallelism to Scale Software Routers” • Investigates whether commodity servers can be used to build a high-performance router • Demonstrate 35 Gb/s throughput with four off-the-shelf (well, Nehalem-based) servers • “The Multikernel: A new OS Architecture for Scalable Multicore Systems” • Argue that shared-by-default is the wrong way to build OSes for emerging multicore hardware • Instead suggest a separate kernel per core, with explicit message passing (RPC) to co-ordinate • Build a prototype, “Barrelfish”, and demonstrate that it scales well and performs comparably with Linux
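To make the contrast between shared-by-default and explicit message passing concrete, here is a toy Python sketch. It is not Barrelfish code: the class and function names, and the use of threads and queues to stand in for cores, are illustrative assumptions. Each “core kernel” owns a private replica of some OS state and changes it only in response to messages on its own inbox; an update is broadcast to every core and completes once all of them acknowledge it, rather than by taking a lock on one shared structure.

```python
import queue
import threading


class CoreKernel(threading.Thread):
    """Hypothetical per-core 'kernel': owns a private replica of some OS
    state (here, a set of mapped page numbers) and only modifies it in
    response to messages arriving on its own inbox queue."""

    def __init__(self, core_id):
        super().__init__(daemon=True)
        self.core_id = core_id
        self.inbox = queue.Queue()
        self.replica = set()  # private state: no other thread writes it

    def run(self):
        while True:
            op, arg, reply = self.inbox.get()
            if op == "stop":
                reply.put(None)
                return
            if op == "map":
                self.replica.add(arg)
            elif op == "unmap":
                self.replica.discard(arg)
            reply.put(("ack", self.core_id))


def broadcast(kernels, op, arg):
    """Send one update to every core and wait for every acknowledgement
    (a simple agreement step in place of a shared lock)."""
    replies = queue.Queue()
    for k in kernels:
        k.inbox.put((op, arg, replies))
    return [replies.get() for _ in kernels]


def demo(n_cores=4):
    kernels = [CoreKernel(i) for i in range(n_cores)]
    for k in kernels:
        k.start()
    broadcast(kernels, "map", 42)
    broadcast(kernels, "map", 7)
    broadcast(kernels, "unmap", 42)
    # Safe to read now: each core applied the update before acknowledging.
    replicas = [set(k.replica) for k in kernels]
    broadcast(kernels, "stop", None)
    for k in kernels:
        k.join()
    return replicas


print(demo())  # [{7}, {7}, {7}, {7}]
```

The point of the design is that no core ever reads or writes another core’s memory directly; all coordination is visible as messages, which is the property the paper argues scales better on multicore hardware.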

ACM SOSP 2009 • “Fast Byte-Granularity Software Fault Isolation” • Describes a new scheme to protect the OS kernel from buggy or malicious kernel modules (drivers) • A compiler back-end infers types for driver code, plus a run-time system enforces isolation • Not a totally new idea: this is about how to make it fast! • “Automatically Patching Errors in Deployed Software” • Monitors Windows binaries, and “learns” invariants (i.e. normal behavior) • If the program crashes, work out which invariant(s) were violated • Automatically generate a patch which avoids this (e.g. input validation, a new error-handling path, …)
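The “learn invariants, find the violated one, patch” loop can be illustrated with a deliberately tiny Python toy. This is not the paper’s system, which works on Windows binaries and learns many kinds of invariant; here a single hypothetical range invariant over one variable, and the names `RangeInvariant`, `lookup` and `make_patch`, are assumptions made purely for illustration.

```python
class RangeInvariant:
    """Hypothetical learned invariant: the range of values one variable
    took during normal (non-crashing) runs."""

    def __init__(self, name):
        self.name = name
        self.lo = None
        self.hi = None

    def observe(self, value):
        # Widen the learned range to cover every value seen while training.
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def holds(self, value):
        return self.lo is not None and self.lo <= value <= self.hi


def lookup(table, index):
    # "Deployed" code with a bug: no bounds check on index.
    return table[index]


def make_patch(invariant, fn, fallback):
    """Wrap fn so that inputs violating the invariant take a new
    error-handling path (return fallback) instead of reaching fn."""
    def patched(table, index):
        if not invariant.holds(index):
            return fallback
        return fn(table, index)
    return patched


table = ["a", "b", "c", "d"]
inv = RangeInvariant("index")
for i in [0, 1, 2, 3, 1, 2]:  # normal behaviour observed: index in [0, 3]
    inv.observe(i)

bad_index = 10
violated = []
patched_lookup = lookup
try:
    lookup(table, bad_index)  # raises IndexError: the "crash"
except IndexError:
    # Work out which learned invariant(s) the crashing input violated...
    violated = [inv.name] if not inv.holds(bad_index) else []
    # ...and generate a patch that enforces them (input validation).
    patched_lookup = make_patch(inv, lookup, fallback=None)

print(violated)  # ['index']
print(patched_lookup(table, 2), patched_lookup(table, bad_index))  # c None
```

Note that the patch only enforces what was observed during normal runs, so it can also reject legitimate but previously unseen inputs; that trade-off is one of the things worth probing when reviewing this kind of work.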

ACM SOSP 2009 • “Better I/O Through Byte-Addressable Persistent Memory” • Considers how to build a file system if you have PCM (phase-change memory: think flash++) directly on the memory bus • Key insight is that you can do updates in place, i.e. no extra copies in kernel memory, and no cascading copy-on-write updates • Show 2x performance compared to using a RAM disk • “Operating System Transactions” • Add transactional behavior to system calls (to get, e.g., ‘automatic’ error recovery, and better security against TOCTTOU (time-of-check-to-time-of-use) races) • Build ‘TxOS’ (Linux with transaction support for 50% of system calls) • Show it works, and adds approx. 10% overhead

ACM SOSP 2009 • “seL4: Formal Verification of an OS Kernel” • Formally verify a version of the L4 microkernel • Abstract spec in Isabelle/HOL, “executable spec” in Haskell, and real version in C and assembly • Show that the executable spec ‘refines’ the abstract spec, and that the real implementation refines the executable spec • (Incredibly impressive work!) • “UpRight Cluster Services” • Argue that BFT (Byzantine fault tolerance) is cool/important, but too complex • Build a library which ‘automatically’ allows you to add BFT functionality to your code, and show it works

Phew! • That’s just a selection of 9 papers – others include robust sensor networks, schedulers for distributed computing, using machine learning to thwart identity attacks, and much more… • So what should we learn from this? • “Systems” really does cover a lot of stuff • It changes over time as the world changes (e.g. lots on multi-core in recent years) • We need to read a lot to get up to speed

This Course • Aims to start off the process of reading systems research papers • We will cover a lot, but still only a tiny fraction of the whole space: you’ll need to read the remainder in your “spare time” over the rest of your career! • The most important thing, and the primary goal of this course, is to understand how to read a systems paper…

Critical Thinking • Reading a research paper (and particularly a systems one) is not like reading a textbook • For lots of reasons… • But the most important one is that the paper is not necessarily “the truth” • There’s no right and wrong, just “good” and “bad” • These are inherently subjective qualities… but you can’t get away with just stating your opinion: you must argue for it. • Critical thinking is the skill of marrying subjective and objective judgment of a piece of work.

An Example: ApacheML • A researcher builds a web server in a type-safe language (in this case, OCaml) • They argue that this will make the software less vulnerable to bugs (null pointers, buffer overruns, format-string errors, etc.) • They build a prototype and compare it to Apache; the prototype adds 10% latency, and scores only 20% lower on SPECweb • Is this a “good” piece of research? A good system?

First let’s argue for… • What’s the problem? • Existing systems software (e.g. web servers) is buggy and insecure • Why is it important? • Security vulnerabilities cost billions of dollars every year! • Why isn’t it solved by previous work? • Many such vulnerabilities come from exploits which target code written in unsafe languages such as C or C++; yet people continue to write systems in these languages because of performance • Traditional testing and code review don’t catch all the bugs • What’s the approach? • Use a modern type-safe language • Why is this novel/innovative? • Previous language run-times were super slow; but this work leverages new compiler techniques and run-time support (which we built, and will explain) to maintain high performance

And now against :-) • Problem is overstated (or “oversold”) • Security vulnerabilities do cost money, sure, but the ‘billions’ is across the board, and includes viruses, worms, phishing, etc.; buffer overruns in the web server are a tiny part of the problem • Problem doesn’t exist • Apache, IIS etc. have been battle-tested and all known bugs fixed; this is the only way to be sure anyway (e.g. what about bugs in your language runtime?) • Approach is broken • Only 30 people in the world can write OCaml (and only 10 can read it!) • Solution is insufficient: • Dude, a 10-20% hit on performance is not acceptable! • Evaluation is unfair/biased: • Apache includes support for X, Y, Z; your prototype is just a toy! • (And if you fixed this, your performance would be worse!)

So which is the “right” answer? • There isn’t one! • All (or most) of these arguments are (mostly) correct… • So it’s chiefly a question of which ones ‘feel’ more valid to you: a complex and subjective thing • In this course, we’ll be reviewing a selection of 20 papers (3 per week, except for week 3) • They cover all sorts of topics, and span 1982 to 2011 • All of these were peer-reviewed and published, so we should assume at least some merit there… • However, you get to decide whether you like or dislike the paper, and make arguments either way (or both!)

Hints for Computer System Design • You may have already seen this on the web page... • If you’ve not read it already, I strongly recommend you do (at some stage). • Basically a collection of “wisdom” from one of the top systems guys in the world (Butler Lampson) • It’s not a typical systems research paper • The problem identified is vague; the solutions are general; and there’s no evaluation!

Key Insights from the Paper • Designing a system is not like designing an algorithm: • Much less well specified (you only have general requirements), so there is a huge amount of freedom • And success is much more difficult to measure • (These are the main reasons ‘right’ and ‘wrong’ don’t work for systems; we need critical thinking) • And there are often fundamental trade-offs: • Simplicity versus Functionality • Performance versus Robustness • Throughput versus Latency

Some “Hints” are now [in]Famous • The most famous is “the end-to-end principle” • Often used as an argument to justify design decisions for the Internet • Actually it is more like a “holy bludgeon” • Used by e2e zealots to argue against anything; you just need to choose suitable ends! • In practice, it’s still a useful principle, but cannot be followed blindly • (The same is true for all the other hints in the paper)

Using the Hints • In general, these are most useful when thinking about a system design • They will be most useful for your future research projects, essays, etc. • For this course, you’ll mostly be looking at other people’s systems… • … so another way of using the hints is as a set of questions you can ask about the paper. • This is not always possible: often you won’t have enough information about the implementation

So how do you read/review a paper? • Do a first pass (5-10 mins): read title, abstract, intro and conclusions. Aim to get a general idea of the paper. • Next, sit down with a pen, and start reading • Make notes (‘!’, ‘huh?’, full sentences, …) as you go • Try to identify the following key things: • What is the problem? • What is the solution / approach? • How does it compare with previous work? • (How well) Does the system work? • Most of the above should be fairly objective (i.e. most people should get similar answers)

Now for the fun bit • After reading the paper, decide if you like it • Make a judgment! • Do this immediately after finishing reading the paper (write a few sentences on the last page) • Now put the paper aside, and take a break (or go on to the next one) • Finally: write up your review (in < 1000 words) • You must use the form on the course web page • The idea is to try and capture both your objective and your subjective responses to the paper…

Parts of the Review • 1. Paper Summary (no more than 250 words) • Provide a brief summary of the paper (3-5 sentences) • The aim is to prove you’ve read (and understood!) the paper, so try to paraphrase and extract the essentials. • At this stage you should try to be objective • 2. The Problem • What is the problem? Why is it important? Why is previous work insufficient? • (1 or 2 sentences for each answer should be sufficient) • Once more you should try to be objective, i.e. report what the authors say in the paper.

Parts of the Review • 3. The Solution or Approach • What is their approach? How does it solve the problem? How is the solution unique and/or innovative (if it is)? What are the details? • Again rely on the paper itself to answer these; but don’t just regurgitate it! Paraphrase & synopsize. • Usually 5-10 sentences will be enough. • 4. Evaluation • How do they evaluate their solution? What questions do they answer? What are the strengths / weaknesses of (a) the system? (b) the evaluation itself? • Aim for 3-4 sentences here.

Parts of the Review • 5. What do you think? • Here you finally get to explain your opinion! • You should aim to give a ‘judgment’ on the work (and on the paper); and you should attempt to back it up with arguments (logical or rhetorical). • This should be at least 3 sentences, but can be more as required (subject to the total word limit) • 6. Questions for the authors • List one or two questions you’d like to ask

Reviewing Tips & Tricks • While reading, you need to absorb what the paper says, but try also to ask yourself: • Is this really true? • Does this argument make sense? • Does this evaluation really support the claims? • This is about being critical, not negative • Be prepared for the paper to be wrong, but don’t assume it is • (Just like you shouldn’t assume it’s right!) • This will take practice, but will get easier over time (and for topics you’re more expert in)

Presentations • As if reviews were not enough, each of you will also do some presentations in this course! • (In fact, from next week most of the time will be you up here presenting, and not me!) • Each presentation should be 15-20 minutes long, and should be given using a computer • You can use your own laptop, or bring a USB stick with your PowerPoint or PDF file • You can revise your presentations after you’ve given them, and then submit the final versions after the end of the course.

Structure of a Presentation • You need to cover three things: • What is the background/context of the paper: what motivated the authors? What else was going on in the research community at the time? How have things changed since? • What does the paper actually say? What’s the problem they tried to solve? What are the key ideas? What did the authors actually do? What were the results? • What do you think about the paper? What’s good and what’s bad? What are the key takeaways? What was the impact (or what is the likely impact)?

As if this wasn’t hard enough… • Each presentation assignment also specifies a certain “flavor”: Advocate, Critic or Balanced • All should follow the structure described but: • An Advocate should emphasize the good points, and spend less time on the negatives; in essence you are trying to take the role of the original authors, and convince people of the paper’s merits • A Critic should still present the work fairly, but towards the end focus on the negative aspects: essentially, try to convince the audience that the paper’s no good! • A Balanced presentation should try to cover both the good and the bad (but still arrive at some judgment).

Why do Presentations? • The aims of you doing presentations are: • Learning to structure an argument (even one you don’t believe in!) - you have no choice over whether you get to argue for or against • Generating discussion: most papers will have people arguing for and against. • Getting you to go a bit more in-depth on some of the 20 papers: becoming a bit of an expert. • Practice for your future research career…

Presentation Guidance • Don’t spend too long covering the basics: remember, everyone should have read the paper • You should of course give a brief overview, not least to set up the rest of the talk • But don’t just “repeat” the paper in slide form, and don’t spend too long on the results • The aim is to generate discussion, so you need to “add value” over the paper itself: • Explore the arguments they make, and the conclusions they draw. What do you think? • Make sure you at least try to match the specified “flavor”, even though it may be challenging.

Doing the Presentation • Practice beforehand to ensure it’s 15-20 minutes • Most of you will be nervous: that’s normal! • Remember this is a friendly group of people in a closed room… and everyone’s in the same boat • Think of the presentation as a discussion/dialogue between you and the audience • (& practice beforehand to help settle your nerves) • Try not to get defensive or angry at questions • This is not your paper, it’s just a class ;-)

Being in the Audience • You’re not just sitting there: you need to get involved! • To kick things off, you’ll ask the questions from your reviews, so be sure to bring a copy with you • Everyone should participate in discussion • Always be respectful of the speaker • Academics (and systems researchers in general) can get quite passionate about arguments • This is good, but needs to be about the arguments and the material, not about the individual • (and remember, you’ll be up there one week ;-)

Grading • 100% of the marks are for your reviews • You need to do at least one review every week, and a total of 10 (if you do >10, we take the best) • I recommend you read all three papers (at least in outline), and then write reviews for 1-2 of them • Aim to spend about 2-4 hours on each paper • (and a further 8 hours on each presentation) • Reviews are due on the Tuesday prior to the class • I will aim to grade them within 2 days, but don’t rely on getting them back before class • Each review is marked out of 100, with 60 being a passing grade.

Schedule • Total of 20 papers • Week 2 [13th Oct] = Kernels & VM • Week 3 [20th Oct] = BF workshop [2 papers] • Week 4 [27th Oct] = Virtualization • Week 5 [3rd Nov] = Bugs (Anil) • Week 6 [10th Nov] = Data Center Storage • Week 7 [17th Nov] = Data Intensive Computing • Week 8 [24th Nov] = Deterministic Parallelism

Final Matters • You can find the papers on the web page; your reviews need to be submitted by 4pm on Tuesday • Remember to bring a copy of at least your questions with you to the next class. • (You can work on presentations up to the last minute) • The SRG Seminars are on Thursdays at 4pm, in either FW26 or LT2: try to come along if you can! • The NetOS Group Meetings are Tuesdays at 1pm in FW11 • This talk, the papers, review forms, and other resources are on the course web page: • http://www.cl.cam.ac.uk/teaching/1112/R01 • Good luck!
