Capsule Placement in the Service Platform • Bhuvan Urgaonkar, Timothy Roscoe • Systems Group, Sprint ATL
Service Platform: an overview • Processors • High speed interconnect • Management/Control Unit • Internet
Service Platform: Goals • Sell the platform’s resources • Manage the resources efficiently • Provide performance guarantees to customers • Start or stop services within minutes
Services and Capsules • Services: • web/game/streaming servers • service provider pays the platform • Capsules • Def: Component of a service that should run on a single node • e.g.: consider a replicated web server
Nucleus • Node specific control/management software: • Capsule creation, destruction • Health information (process liveness) • Resource parameters (memory, CPU, network bandwidth etc.)
Control Plane • Capsule Placement • Flow Placement • Node, network, service monitoring • Deployed Service Database • Billing
Outline of this talk • Service Platform: an overview • Quality of Service • Capsule Placement • Design of the Placement unit • Conclusions and future work
QoS Representations • Application level • e.g., 50 transactions per sec • Contract level • e.g., “something like a 300 MHz Pentium II” • Platform level • e.g., ? • Node level • e.g., weights, priorities etc.
Translation between QoS levels • Application level => Contract level • Application specific, customer’s problem • Contract level => Platform level • More a business problem • Platform level => Node level • OS dependent
Capsule Placement: Desirables • Maximize revenue! • Aware of the “importance” of services. • Overbooking. • Exploit known workload characteristics. • Adapt to changes in workload? • Fast.
Stages in hosting a service • Requirement specification • Placement • Deployment • Activation
Requirement Specification • Contract level representation • Many possibilities: 300 MHz PII, best effort, or a CPU instruction token bucket. • Platform level representation • Must be uniform across the platform. • (rate, burst, overbooking tolerance, arch, OS)
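The token bucket mentioned as a contract-level option can be sketched as follows. This is a generic token bucket, not the platform's implementation: the class name `TokenBucket`, the method `try_consume`, and the choice to pass time explicitly are assumptions made for illustration. Tokens stand for CPU work in whatever unit the contract uses.

```python
class TokenBucket:
    """Generic token bucket: tokens refill at `rate` per second, capped at `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # tokens added per second (long-term allowance)
        self.burst = burst    # bucket capacity (largest back-to-back consumption)
        self.tokens = burst   # assumption: the bucket starts full
        self.last = 0.0       # time of the last refill, in seconds

    def try_consume(self, amount: float, now: float) -> bool:
        """Refill up to `now`, then take `amount` tokens if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False
```

A capsule that stays within (rate, burst) always succeeds; one that exceeds it is refused until the bucket refills.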
Translation to Node level • Reservation based scheduler • map (rate, burst) to (period, slice) • bigger burst => bigger period • Proportional share scheduler • burst ? • weight in proportion to rate • Priority based scheduler • no easy mapping
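One way to read the first two mappings is sketched below. The slides do not give formulas, so the concrete choice here (slice equal to burst, period equal to burst divided by rate) is an assumption that merely satisfies "bigger burst => bigger period" while keeping slice/period equal to the rate. The names `CpuRequirement`, `to_reservation`, and `to_weights` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CpuRequirement:
    rate: float   # long-term CPU share as a fraction of one CPU (0 < rate <= 1)
    burst: float  # CPU time in milliseconds that may be used back to back


def to_reservation(req: CpuRequirement) -> tuple[float, float]:
    """Assumed mapping to (period_ms, slice_ms) with slice / period == rate."""
    slice_ms = req.burst
    period_ms = req.burst / req.rate  # bigger burst => bigger period
    return period_ms, slice_ms


def to_weights(reqs: dict[str, CpuRequirement], scale: int = 1000) -> dict[str, int]:
    """Proportional share: integer weights in proportion to rate; burst is ignored."""
    return {name: max(1, round(r.rate * scale)) for name, r in reqs.items()}
```

The proportional-share mapping drops the burst entirely, which matches the open question on the slide. No sketch is given for priority-based schedulers, since the slide states there is no easy mapping.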
Placement • Find the set of feasible nodes • Compatible architecture and OS • No overbooking tolerances violated • Pick one node from this set • Best Fit • Worst Fit • Random Select • Close Overbooking
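The two placement steps can be sketched as below, under simplifying assumptions. Overbooking tolerance is not modeled because the slides do not define it precisely, so a node counts as feasible only if it has enough unreserved capacity. The Close Overbooking heuristic is omitted for the same reason. All class and function names are hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    arch: str
    os: str
    free: float  # unreserved CPU capacity, same unit as Capsule.rate


@dataclass(frozen=True)
class Capsule:
    name: str
    arch: str
    os: str
    rate: float


def feasible(capsule: Capsule, nodes: list[Node]) -> list[Node]:
    """Step 1: nodes with compatible arch/OS and enough free capacity."""
    return [n for n in nodes
            if n.arch == capsule.arch and n.os == capsule.os and n.free >= capsule.rate]


def pick(capsule: Capsule, nodes: list[Node], heuristic: str = "best_fit",
         rng: Optional[random.Random] = None) -> Optional[Node]:
    """Step 2: choose one feasible node and reserve capacity on it."""
    candidates = feasible(capsule, nodes)
    if not candidates:
        return None
    if heuristic == "best_fit":      # least capacity left over after placement
        chosen = min(candidates, key=lambda n: n.free - capsule.rate)
    elif heuristic == "worst_fit":   # most capacity left over after placement
        chosen = max(candidates, key=lambda n: n.free - capsule.rate)
    elif heuristic == "random":
        chosen = (rng or random).choice(candidates)
    else:
        raise ValueError(f"unknown heuristic: {heuristic}")
    chosen.free -= capsule.rate
    return chosen
```

Ties are broken by list order, which is an arbitrary choice of this sketch.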
Placement: Example • Capsules: a = 30, b = 20, c = 10 • Nodes: N1 = 30, N2 = 20, N3 = 10, N4 = 10 • One possible placement: (a, N1), (b, N2), (c, N3)
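The example placement can be checked mechanically. The sketch below assumes the capsule and node numbers are in the same unit and that no overbooking is allowed; the function name `is_valid_placement` is hypothetical.

```python
def is_valid_placement(demand: dict, capacity: dict, placement: dict) -> bool:
    """True if every capsule is placed and no node's capacity is exceeded."""
    if set(placement) != set(demand):
        return False
    used = {}
    for capsule, node in placement.items():
        used[node] = used.get(node, 0) + demand[capsule]
    return all(used[node] <= capacity[node] for node in used)


demand = {"a": 30, "b": 20, "c": 10}
capacity = {"N1": 30, "N2": 20, "N3": 10, "N4": 10}
placement = {"a": "N1", "b": "N2", "c": "N3"}
```

With these values, `is_valid_placement(demand, capacity, placement)` holds, while putting both a and b on N1 does not.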
Deployment and Activation • Deployment: The process of preparing a capsule for execution on a node. • Why ? • e.g., need to download some files before starting • the control plane sends all information to deploy the capsule • Activation: Starting a deployed service
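Capsule State Diagram • States: undeployed, deployed, active • Transitions: deploying (undeployed => deployed), activating (deployed => active), deactivating (active => deployed), undeploying (deployed => undeployed)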
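The state diagram can be written as a small transition table. The wiring below is an assumption read off the diagram labels: undeployed, deployed, and active are taken to be the states, and deploying, activating, deactivating, and undeploying the transitions between neighbouring states. The names `State`, `TRANSITIONS`, and `step` are hypothetical.

```python
from enum import Enum


class State(Enum):
    UNDEPLOYED = "undeployed"
    DEPLOYED = "deployed"
    ACTIVE = "active"


# (current state, transition) -> next state; assumed from the diagram labels
TRANSITIONS = {
    (State.UNDEPLOYED, "deploying"): State.DEPLOYED,
    (State.DEPLOYED, "activating"): State.ACTIVE,
    (State.ACTIVE, "deactivating"): State.DEPLOYED,
    (State.DEPLOYED, "undeploying"): State.UNDEPLOYED,
}


def step(state: State, transition: str) -> State:
    """Apply one transition, rejecting any move the diagram does not allow."""
    try:
        return TRANSITIONS[(state, transition)]
    except KeyError:
        raise ValueError(f"{transition!r} is not allowed from {state.value}") from None
```

Under this reading, an undeployed capsule cannot be activated directly; it has to pass through deployed first.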
Example Message Exchange (Control Plane <=> Nucleus) • Control Plane: instruct nucleus to deploy a capsule, start timer (message: deployed svc cap) • Nucleus: starts deploying the capsule • Control Plane: no response! Send again (message: deployed svc cap) • Nucleus: still deploying • Nucleus: done deploying, send status message (message: state svc cap deployed) • Control Plane: deployed before timeout, instruct nucleus to activate (message: activated svc cap) • Nucleus: starts activating the capsule • . . .
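The resend-on-silence pattern in this exchange can be simulated without a network. In the sketch below the timer is replaced by an attempt counter, and `FakeNucleus`, `deploy_with_retry`, and the `"deploy"` message string are hypothetical stand-ins, not the platform's actual protocol.

```python
class FakeNucleus:
    """Stand-in nucleus: ignores the first `drop` requests, then needs `work` more to finish."""

    def __init__(self, drop: int = 1, work: int = 2) -> None:
        self.drop = drop
        self.work = work
        self.state = "undeployed"

    def handle(self, message: str):
        if self.drop > 0:
            self.drop -= 1
            return None  # simulates a lost message: the control plane sees no response
        if message == "deploy":
            if self.state == "undeployed":
                self.state = "deploying"
            elif self.state == "deploying":
                self.work -= 1
                if self.work <= 0:
                    self.state = "deployed"
        return self.state


def deploy_with_retry(nucleus: FakeNucleus, max_attempts: int = 5):
    """Resend the deploy request until the nucleus reports 'deployed' or attempts run out."""
    replies = []
    for _ in range(max_attempts):
        reply = nucleus.handle("deploy")
        replies.append(reply)
        if reply == "deployed":
            return True, replies
    return False, replies
```

Resending is safe here because the request is idempotent: a nucleus that is already deploying just reports its current state.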
Placement Unit Architecture • Components: Listen for new requests, Listen to nuclei, Dispatch Events • Queues: Event Queue, Message Queue • Flows: events due to new requests, events due to msgs from nuclei, messages from nuclei
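A single-threaded sketch of this event-driven structure follows. It assumes that nucleus messages are first collected in the message queue and then converted into events on the same event queue that carries new requests; the diagram labels suggest this wiring, but it is not confirmed by the slides. Function names and event kinds are hypothetical.

```python
import queue


def drain_messages(message_queue: queue.Queue, event_queue: queue.Queue) -> None:
    """Turn every pending nucleus message into an event."""
    while True:
        try:
            msg = message_queue.get_nowait()
        except queue.Empty:
            return
        event_queue.put(("nucleus_msg", msg))


def dispatch(event_queue: queue.Queue, handlers: dict) -> list:
    """Run the handler registered for each pending event, in arrival order."""
    results = []
    while True:
        try:
            kind, payload = event_queue.get_nowait()
        except queue.Empty:
            return results
        results.append(handlers[kind](payload))
```

In a real placement unit the listeners would run concurrently and block on their sockets; `queue.Queue` is thread-safe, so the same two functions could be called from separate threads.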
Database Consistency • Transactions and exceptions • e.g.:
    transaction_begin()
    try:
        deploy_service(svc)      # may raise if any step fails
        transaction_commit()
    except Exception:
        transaction_abort()      # undo partial updates
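The same pattern can be tried against a real database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout and the function names `open_db` and `deploy_service` are assumptions for illustration and do not reflect the actual Deployed Service Database.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT are explicit
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE services(name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE capsules(service TEXT, name TEXT, node TEXT)")
    return conn


def deploy_service(conn: sqlite3.Connection, svc: str, capsules: list) -> bool:
    """Record a service and all its capsules atomically; roll back on any failure."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO services(name) VALUES (?)", (svc,))
        for cap, node in capsules:
            if node is None:
                raise RuntimeError(f"no feasible node for capsule {cap}")
            conn.execute(
                "INSERT INTO capsules(service, name, node) VALUES (?, ?, ?)",
                (svc, cap, node),
            )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        return False
```

If any capsule cannot be placed, the service row inserted earlier in the same transaction disappears as well, so the database never shows a half-deployed service.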
Performance • Time to compute placement: 1-2 sec => time to deploy usually much larger • Comparison of heuristics • experiments with the following workloads • 1-3 capsules per service, CPU requirement 0-10%, wide range of overbooking tolerances • Random Select admitted the most services, Best Fit the fewest • But … more investigation needed
Summary • QoS representation for CPU requirements of services. • Implementation of placement unit. • Some simple experiments to deploy and activate services.
Unfinished ... • Experiments: • heuristics better suited to specific workloads. • Scalability and efficiency of the system. • Integration of placement unit with rest of the Control Plane • Handling various failures • Extend to multiple resources - much harder than a single resource!