This paper discusses the TACC (Transformation, Aggregation, Caching, Customization) framework designed to optimize content delivery over the Internet based on user preferences and network constraints. It outlines key contributions of TACC, including a model for structuring Internet services, experiences from real users, and insights gained from implementing TACC at the University of California, Berkeley. The study emphasizes how a unified framework can improve app development, scalability, and ease of service writing while maintaining modularity and low costs.
TACC Retrospective: Contributions, Non-Contributions, and What We Really Learned Armando Fox, University of California, Berkeley fox@cs.berkeley.edu
Vision: “The Content You Want” What do above apps have in common? • Adapt (collect, filter, transform) existing content… • according to client constraints • respecting network limitations • according to per-user preferences • But: Lack of unified framework for designing apps that exploit this observation
Contributions • TACC, a model for structuring services • Transformation, Aggregation, Caching, Customization of Internet content • Scalable TACC server • Based on clusters of commodity PC’s • Easy to author “industrial strength” services • Scalable Network Service (SNS) platform maps app semantics onto cluster-based availability mechanisms • Experience with real users • ~15,000 today at UCB
What’s TACC? • Transformation (“local”, “one-to-one”) • TranSend, Anonymizer • Aggregation (“nonlocal”, “many-to-one”) • Search engines, crawlers, newswatchers • Caching • Both original and locally-generated content • Customization • Per user: for content generation • Per device: data delivery, content “packaging”
TACC Example: TranSend • Transparent HTTP proxy • On-the-fly, lossy compression of specific MIME types (GIF, JPG...) • Cache both original & transformed • User specifies aggressiveness and “refinement” UI • Parameters to HTML & image transformers [Diagram labels: C, T, $]
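The "cache both original & transformed" bullet can be sketched as follows. This is a minimal illustration, assuming a dictionary-backed cache; `variant_key`, `get_transformed`, `thin`, and the `aggressiveness` preference name are hypothetical and are not taken from TranSend itself. The toy `thin` transformer just drops bytes to stand in for real lossy image compression.

```python
from typing import Callable, Dict, Hashable

Prefs = Dict[str, str]
Transformer = Callable[[bytes, Prefs], bytes]


def variant_key(url: str, prefs: Prefs) -> Hashable:
    """Cache key for a transformed variant: the URL plus the user's parameters."""
    return (url, tuple(sorted(prefs.items())))


def get_transformed(url, mime, prefs, cache, fetch, transformers):
    """Return the variant for these prefs, caching both original and result."""
    key = variant_key(url, prefs)
    if key in cache:
        return cache[key]
    original_key = (url, None)          # key for the untransformed original
    if original_key not in cache:
        cache[original_key] = fetch(url)
    original = cache[original_key]
    transform = transformers.get(mime)
    if transform is None:
        return original                 # MIME types without a transformer pass through
    cache[key] = transform(original, prefs)
    return cache[key]


def thin(data: bytes, prefs: Prefs) -> bytes:
    """Toy stand-in for lossy compression: keep every n-th byte."""
    step = int(prefs.get("aggressiveness", "1"))
    return data[::step]


fetch_log = []


def fake_fetch(url: str) -> bytes:
    """Pretend origin server; records each fetch so cache hits are visible."""
    fetch_log.append(url)
    return b"abcdefgh"


cache = {}
transformers = {"image/gif": thin}
mild = get_transformed("http://example.com/a.gif", "image/gif",
                       {"aggressiveness": "2"}, cache, fake_fetch, transformers)
harsh = get_transformed("http://example.com/a.gif", "image/gif",
                        {"aggressiveness": "4"}, cache, fake_fetch, transformers)
```

Keying the transformed variant on the user's parameters lets two users with different settings share one cached original while each gets their own distilled copy.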
Top Gun Wingman • PalmPilot web browser • Intermediate-form page layout • Image scaling & transcoding • Controlled by layout engine • Device-specific ADU marshalling • Including client versioning • Originals and device-specific pages cached [Diagram labels: T, C, $, A; data: html, ADU]
Application Partitioning • Client competence • Styled text, images, widgets are fine • Bitmaps unnecessary • Client responsiveness • Scrolling, etc. shouldn’t require roundtrip to server • Client independence • Very late conversion to client-specific format
TACC Conceptual Data Flow • Front end accepts RPC-like user requests • User’s customization profile retrieved • Original data fetched from cache or Internet • Aggregation/transformation workers operate on data according to customization profile [Diagram labels: User request, FE, C, $, W, T, A, To Internet]
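The four steps of the data flow can be sketched as a small front end. This is an illustration under stated assumptions, not the TACC server's actual interface: the `FrontEnd` class, its `handle` method, the `max_bytes` profile field, and the two toy workers are all hypothetical.

```python
from typing import Callable, Dict, List

Profile = Dict[str, str]
Worker = Callable[[bytes, Profile], bytes]


class FrontEnd:
    """Hypothetical front end: look up the profile, get the data, run workers."""

    def __init__(self, profiles, cache, fetch, workers):
        self.profiles = profiles    # user id -> customization profile
        self.cache = cache          # url -> original bytes
        self.fetch = fetch          # callable standing in for the Internet
        self.workers = workers      # worker name -> worker function

    def handle(self, user: str, url: str, pipeline: List[str]) -> bytes:
        profile = self.profiles.get(user, {})      # customization profile
        if url not in self.cache:                  # cache miss: go to the origin
            self.cache[url] = self.fetch(url)
        data = self.cache[url]
        for name in pipeline:                      # workers see data and profile
            data = self.workers[name](data, profile)
        return data


def truncate(data: bytes, profile: Profile) -> bytes:
    """Toy transformation worker: cut the data to the user's size limit."""
    limit = int(profile.get("max_bytes", len(data)))
    return data[:limit]


def upper(data: bytes, profile: Profile) -> bytes:
    """Toy transformation worker that ignores the profile."""
    return data.upper()


front_end = FrontEnd(
    profiles={"alice": {"max_bytes": "5"}},
    cache={},
    fetch=lambda url: b"hello world",
    workers={"truncate": truncate, "upper": upper},
)
reply = front_end.handle("alice", "http://example.com/", ["truncate", "upper"])
```

Passing the profile to every worker keeps the workers themselves stateless, which is what the model summary relies on.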
TACC Model Summary • Mostly stateless, composable workers • Unifies previously ad hoc applications under one framework • Encourages re-use through modularization • Composition enables both new services and new clients • TACC breakdown provides unified way to think about app structure
Services Should Be Easy To Write • Rapid prototyping • Insulate workers from “mundane” details • Easy to incorporate existing/legacy code • Few assumptions about code structure • Must support variety of languages • May be fragile • Composition to leverage existing code
Building a TACC Server • Challenge: Scalable Network Service (SNS) requirements • Scalability to 100K’s of users with high availability • Cost effective to deploy & administer • But, services should remain easy to write • Server provides some bug robustness • Server provides availability • Server handles load balancing and scaling • Preserve modularity (& componentwise upgradability) when deploying
Layered Model of Internet Services • TACC Layer • Programming model based on composable building blocks • SNS Layer: “large virtual server” • Implements SNS requirements • Cluster computing for hardware F/T and incremental scaling • Exploit TACC model semantics for software F/T • SNS layer is reusable and isolated from TACC • Application “content” orthogonal to SNS mechanisms • Key to making apps easy to write [Diagram layers: httpd, etc.; TACC; Scalable Network Svc]
Why Use a Cluster? • Incremental scalability, low cost components • High availability through hardware redundancy Goals: • Demonstrate that clusters and TACC fit well together • Separate SNS from TACC
Cluster-Based TACC Server • Component replication for scaling and availability • High-bandwidth, low-latency interconnect • Incremental scaling: commodity PC’s [Diagram legend: User Profile Database, Caches, Front Ends, Workers, Load Balancing & Fault Tolerance, Administration Interface; labels: C, $, FE, W, T, A, LB/FT, GUI, Interconnect]
“Starfish” Availability: LB Death • FE detects via broken pipe/timeout, restarts LB • New LB announces itself (multicast), contacted by workers, gradually rebuilds load tables • If partition heals, extra LB’s commit suicide • FE’s operate using cached LB info during failure [Diagram labels: C, FE, $, W, T, A, LB/FT, Interconnect]
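The "gradually rebuilds load tables" step can be sketched with a soft-state table whose entries exist only while workers keep reporting. This is a minimal sketch, assuming a time-to-live per entry and explicit timestamps; the `LoadTable` class, its method names, and the `ttl` value are hypothetical and not taken from the TACC implementation.

```python
from typing import Dict, Optional, Tuple


class LoadTable:
    """Soft-state load table: entries are refreshed by reports and expire otherwise."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.entries: Dict[str, Tuple[int, float]] = {}  # worker -> (queue length, last heard)

    def on_report(self, worker: str, queue_length: int, now: float) -> None:
        """Record a worker's latest load report."""
        self.entries[worker] = (queue_length, now)

    def live(self, now: float) -> Dict[str, int]:
        """Workers heard from within the last ttl seconds."""
        return {w: q for w, (q, t) in self.entries.items() if now - t <= self.ttl}

    def least_loaded(self, now: float) -> Optional[str]:
        """Pick the live worker with the shortest queue, if any."""
        live = self.live(now)
        return min(live, key=live.get) if live else None


old_lb = LoadTable(ttl=3.0)
old_lb.on_report("w1", 4, now=0.0)
old_lb.on_report("w2", 1, now=0.0)
cached_choice = old_lb.least_loaded(now=1.0)   # a front end could keep using this hint

new_lb = LoadTable(ttl=3.0)                    # a restarted LB starts with no state
empty_choice = new_lb.least_loaded(now=2.0)
new_lb.on_report("w1", 2, now=2.5)             # workers contact the new LB
new_lb.on_report("w2", 5, now=2.6)
rebuilt_choice = new_lb.least_loaded(now=3.0)
expired_choice = new_lb.least_loaded(now=10.0) # no reports for longer than ttl
```

Because nothing in the table needs to be recovered after a crash, restarting the load balancer requires no recovery protocol beyond waiting for the next round of reports.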
Fault Recovery Latency [Figure; axis label: task queue length]
Behavior in the Large • TranSend: 160 image transformations/sec = 10 Ultra-1 servers • Peak seen during UCB traces on 700-modem bank: 15/sec • Amortized hardware cost <$0.35/user/month (one $5K PC serving ~15,000 subscribers) • Wingman: factor of 6-8 worse • Administration: one undergraduate part-time
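The figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the slide does not state the amortization period behind the per-month figure, so the last ratio is just hardware cost divided by subscribers, and the variable names are illustrative.

```python
# Numbers quoted on the slide.
transformations_per_sec = 160
ultra1_servers = 10
peak_observed_per_sec = 15
pc_cost_dollars = 5000
subscribers = 15000

# Throughput per server if the 160/sec load is spread evenly over 10 servers.
per_server_rate = transformations_per_sec / ultra1_servers

# Does one server's share exceed the peak seen on the 700-modem bank?
one_server_covers_peak = per_server_rate >= peak_observed_per_sec

# Hardware cost per subscriber (amortization period not given on the slide).
cost_per_subscriber = pc_cost_dollars / subscribers
```

Under these assumptions the per-server rate works out to 16 transformations/sec, slightly above the observed peak of 15/sec, and the hardware cost comes to about $0.33 per subscriber, consistent with the "<$0.35" bound.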
Building a Big System • Restartable, atomic workers • Read-only data from other origin server(s) • Orthogonal separation of scalability/availability from application “content” • Multiple lines of defense • App modules agree to obey semantics compatible with these mechanisms • Common-case failure behavior compatible with users’ Internet experience • Enables reuse of whole workers, however diverse
Availability & Scalability Summary • Pervasive strategy: timeout, retry, restart • Transient failures usually invisible to user • Process peers watch each other • Mostly stateless workers, transaction support possible • Simplicity from exploiting soft state • Piggyback status info on multicast beacons • Use of stale LB info fine in practice • “Starfish” availability works in practice
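The "timeout, retry, restart" strategy can be sketched for a stateless worker as below. This is a simplified illustration, assuming that a crash or timeout surfaces as an exception and that "restart" means constructing a fresh worker; `run_with_restart` and `make_flaky_worker` are hypothetical names, not part of the SNS layer.

```python
from typing import Callable, List

Worker = Callable[[str], str]


def run_with_restart(make_worker: Callable[[], Worker], request: str,
                     max_attempts: int = 3) -> str:
    """Retry a request, starting a fresh worker after each failure."""
    last_error = None
    for _ in range(max_attempts):
        worker = make_worker()              # "restart": a new, stateless worker
        try:
            return worker(request)
        except Exception as exc:            # crash or timeout, reported as an exception
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


attempts_seen: List[str] = []


def make_flaky_worker() -> Worker:
    """Toy worker factory: the first two calls time out, the third succeeds."""
    def worker(request: str) -> str:
        attempts_seen.append(request)
        if len(attempts_seen) < 3:
            raise TimeoutError("worker did not answer in time")
        return request.upper()
    return worker


result = run_with_restart(make_flaky_worker, "page", max_attempts=3)
```

Retrying blindly is only safe because the workers are restartable and mostly stateless; a worker with side effects would need the transaction support the slide mentions.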
Service Authoring • Keyword highlighting: < 1 day • Wingman: 2-3 weeks • Various apps from graduate seminar projects • Safe worker upload • Annotate the Web • “Channel aggregators”
New Services By Composition • Compose existing services to create a new one • ~2.5 hours to implement • Composes with TranSend or Wingman [Diagram labels: Internet, TranSend, Metasearch]
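Composition of an aggregator with a transformer can be sketched as follows. This is a hedged illustration of the idea rather than the service from the talk: `make_metasearch`, `compose`, `limit_results`, the `max_results` profile field, and the two fake engines are all hypothetical.

```python
from typing import Callable, Dict, List

Profile = Dict[str, str]


def make_metasearch(engines: List[Callable[[str], List[str]]]):
    """Aggregation worker: query several engines and merge hits, dropping duplicates."""
    def metasearch(query: str, profile: Profile) -> List[str]:
        seen, merged = set(), []
        for engine in engines:
            for hit in engine(query):
                if hit not in seen:
                    seen.add(hit)
                    merged.append(hit)
        return merged
    return metasearch


def limit_results(hits: List[str], profile: Profile) -> List[str]:
    """Transformation worker: trim the result list to fit the client."""
    return hits[: int(profile.get("max_results", len(hits)))]


def compose(first, second):
    """Build a new worker by feeding the first worker's output to the second."""
    def composed(request, profile):
        return second(first(request, profile), profile)
    return composed


def engine_a(query: str) -> List[str]:
    return [f"a.example/{query}", f"shared.example/{query}"]


def engine_b(query: str) -> List[str]:
    return [f"shared.example/{query}", f"b.example/{query}"]


metasearch = make_metasearch([engine_a, engine_b])
small_screen_search = compose(metasearch, limit_results)

all_hits = metasearch("tacc", {})
trimmed = small_screen_search("tacc", {"max_results": "2"})
```

Because both workers share the same `(input, profile)` calling convention, the composed service needs no glue code beyond `compose` itself.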
Experience With Real Users • Transparent enhancements • Minimal downtime • Low administration cost • Multicast-based administration GUI • Virtually no dedicated resources at UCB • “Overflow pool” of ~100 UltraSPARC servers • Users don’t mind relying on middleware proxy
Why Now? • Internet’s critical mass • Commercial push for many device types (transistor curves) • Cluster computing economically viable • A good time for infrastructural services
Related Work • Transformational proxy services: WBI, Strands • Application partitioning: Wit, InfoPad, PARC Ubiquitous Computing • Computing in the infrastructure: Active Networks • Soft state for simplicity and robustness: Microsoft Tiger, multicast routing protocols
Summary of Contributions • TACC, a composition-based Internet services programming model • captures rich variety of apps • one view of customization • No-hassle deployment on a cluster • Automatic and robust partial-failure handling • Availability & scaling strategies work in practice • New apps are easy to write, deploy, debug • SNS behaviors are free • Compose existing services to enable new clients
Non-Contributions (a/k/a Future Work) Accidental contributions: • Legacy code glue • Cheap test rig for next project (prototyping path discovery; a bare bones “cluster OS”) Non-contributions: • Fair resource allocation over cluster • Built-in security abstractions • Rich state management abstractions
What We Really Learned • Design for failure • It will fail anyway • End-to-end argument applied to availability • Orthogonality is even better than layering • Narrow interface vs. no interface • A great way to manage system complexity • The price of orthogonality • Techniques: Refreshable soft state; watchdogs/timeouts; sandboxing
Future Work • TACC as test rig for Ninja • Taxonomy of app structure and platforms • What is the “big picture” of different types of Internet services, and where does TACC fit in? • Joint work with Dr. Murray Mazer at the Open Group Research Institute • Apply TACC lessons to building reliable distributed systems • Formalize programming model