CPU Utilization Control in Distributed Real-Time Systems

CPU Utilization Control in Distributed Real-Time Systems Chenyang Lu Department of Computer Science and Engineering

Why CPU Utilization Control? • Overload protection • CPU over-utilization system crash • Nightmare for mission-critical applications and “always-on” E-businesses • Meet deadlines • CPU utilization < schedulable utilization bound

End-to-End Task Modelin Distributed Real-Time Systems • Periodic task Ti = a chain of subtasks {Tij} located on different processors • Subtasks run at a same rate • Task rate can be adjusted within a range • Higher rate  higher utility T1 T11 T12 T13 T3 T2 Remote Invocation Subtask P2 P3 P1

Problem Formulation • Bi: Utilization set point of processor Pi (1 ≤ i ≤ n) • ui(k): Utilization of Pi in kth sampling period • rj(k): Rate of task Tj (1 ≤ j ≤ m) in kth sampling period subject to rate constraints: Rmin,j rj(k)  Rmax,j (1 ≤ j ≤ m)

Challenge: Uncertainties • Execution times? • Unknown sensor data or user input • Request arrival rate? • Aperiodic events • Bursty service requests • Disturbance? • Denial of Service Attacks Control-theoretic approaches to adaptive software • Robust performance in face of workload uncertainty

Single-Processor Solution:Feedback Control Real-Time Scheduling • Adaptation based on single-input-single-output control Sensor Inputs FCS {r(k+1)} Application Set point Us = 69% Task Rates R1: [1, 5] Hz R2: [10, 20] Hz Controller Actuator Middleware u(k) OS Monitor Processor C. Lu, X. Wang, and C. Gill, Feedback Control Real-Time Scheduling in ORB Middleware, IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'03), May 2003.

T1 T11 T12 T13 T3 T2 P2 P3 P1 What’s New in Distributed Systems? • Need to control utilization of multiple processors • Utilization of different processors are coupled with each other due to end-to-end tasks • Replicating FCS on all processors does not work! • Constraints on task rates

EUCON: Multi-Input-Multi-Output Control Measured Output Distributed System (m tasks, n processors) Utilization Monitor UM UM Model Predictive Controller Rate Modulator RM RM Feedback Loop Control Input Precedence Constraints Subtask C. Lu, X. Wang and X. Koutsoukos, Feedback Utilization Control in Distributed Real-Time Systems with End-to-End Tasks, IEEE Transactions on Parallel and Distributed Systems, 16(6): 550-561, June 2005.

Control Theoretic Methodology • Derive a dynamic model of the controlled system • Design a controller • Analyze stability

Dynamic Model: One Processor • Si: set of subtasks on Pi • cjl: estimated execution time of Til running on Pi • may not be correct • gi: utilization gain of Pi • unknown ratio between actual and estimated change in utilization • models uncertainty in execution times

Dynamic Model: Multiple Processors • G: diagonal matrix of utilization gains • F: subtask allocation matrix • models the coupling among processors • fij = cjl task Tj has a subtask Tjl on processor Pi • fij = 0 if Tj has no subtask on Pi u(k) = u(k-1) + GFr(k-1) T1 T11 T22 T3 T2 T21 T31 P2 P1

Model Predictive Control • Advanced control technique for coupled MIMO control problems with actuator constraints. • Minimize a cost function over an interval in the future. • Compute an input trajectory that minimizes cost subject to actuator constraints. • Predict cost based on a system model and feedback. • Combines optimization, model-based prediction, and feedback.

Model Predictive Controller At a sampling instant • Compute inputs in future sampling periods r(k), r(k+1), ... r(k+M-1) to minimize a cost function • Cost is predicted using (1) feedback u(k-1) (2) approximate dynamic model • Apply r(k) to the system At the next sampling instant • Shift time window and re-compute r(k+1), r(k+2), ... r(k+M) based on feedback

Model Predictive Controller in EUCON Constrained optimization solver Desired trajectory for u(k) to converge to B Difference with reference trajectory

Stability Analysis • Stability: system converges to equilibrium point from any initial condition • Equilibrium point = utilization set points B • If stable, utilization of all processors converge to their set points whenever feasible • Derive stability condition  tolerable range of G • tolerable variation of execution times • Stability analysis establishes analytical guarantees on utilization despite uncertainty

Simulation: Stable System execution time factor = 0.5 (actual execution times = ½ of estimates)

Simulation: Unstable System execution time factor = 7 (actual execution times = 7 times estimates)

Stability • Stability: system converges to desired utilizations from any initial condition • Derive stability condition tolerable range of execution times • Analytical assurance on utilizations despite uncertainty Overestimation of execution times prevents oscillation Predicted bound for stability actual execution time / estimation

Model Predictive Controller Measured Output Control Input Feedback lane Rate Modulator Rate Modulator Rate Modulator Priority Manager Priority Manager Priority Manager Utilization Monitor Utilization Monitor Utilization Monitor Remote request lanes Remote request lanes FC-ORB Middleware X. Wang, C. Lu and X. Koutsoukos, Enhancing the Robustness of Distributed Real-Time Middleware via End-to-End Utilization Control, IEEE Real-Time Systems Symposium (RTSS'05), December 2005.

Workload Uncertainty disturbance from periodic tasks time-varying execution times

Processor Failure • Norbert fails. • move its tasks to other processors. • reconfigure controller • control utilization by adjusting task rates

Summary: Model Predictive Control • Robust utilization control for distributed systems • Handles coupling among processors • Enforce constraints on task rates • Analyze tolerable range of execution times

References • Centralized control: EUCON • C. Lu, X. Wang and X. Koutsoukos, Feedback Utilization Control in Distributed Real-Time Systems with End-to-End Tasks, IEEE Transactions on Parallel and Distributed Systems, 16(6): 550-561, June 2005. • Decentralized control: DEUCON • X. Wang, D. Jia, C. Lu and X. Koutsoukos, DEUCON: Decentralized End-to-End Utilization Control for Distributed Real-Time Systems, IEEE Transactions on Parallel and Distributed Systems, 18(7): 996-1009, July 2007. • Middleware: FC-ORB • X. Wang, C. Lu and X. Koutsoukos, Enhancing the Robustness of Distributed Real-Time Middleware via End-to-End Utilization Control, IEEE Real-Time Systems Symposium (RTSS'05), December 2005. • Controllability and feasibility • X. Wang, Y. Chen, C. Lu and X. Koutsoukos, On Controllability and Feasibility of Utilization Control in Distributed Real-Time Systems, Euromicro Conference on Real-Time Systems (ECRTS'07), July 2007. • Project page: http://www.cse.wustl.edu/~lu/aqc.htm

CPU Utilization Control in Distributed Real-Time Systems