
Timing-Predictable Systems - Reconciling Predictability with Performance -


Presentation Transcript


  1. Timing-Predictable Systems - Reconciling Predictability with Performance - Lothar Thiele and Reinhard Wilhelm

  2. Quantified: Time • Embedded controllers with hard real-time characteristics must be guaranteed to finish their tasks within deadlines. • (static) Schedulability test must be performed. • needs (upper) bounds on the execution times of all tasks • Timing Predictability provides for precise bounds
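
A minimal sketch of what such a static schedulability test can look like, here classic response-time analysis for fixed-priority preemptive tasks (the task set, the numbers, and all names are illustrative assumptions, not from the talk). The C values are exactly the WCET upper bounds the talk asks for; each task's response time is iterated to a fixpoint and checked against its deadline:

    #include <stdio.h>

    /* One task: WCET upper bound C, period T, deadline D (microseconds).
       Tasks are sorted by priority, index 0 = highest. Numbers are made up. */
    typedef struct { const char *name; long C, T, D; } task_t;

    /* Classic response-time iteration:
       R = C_i + sum over higher-priority tasks j of ceil(R/T_j) * C_j      */
    static long response_time(const task_t *ts, int i) {
        long R = ts[i].C, prev = -1;
        while (R != prev && R <= ts[i].D) {   /* stop on fixpoint or deadline miss */
            prev = R;
            R = ts[i].C;
            for (int j = 0; j < i; j++)       /* interference from higher priorities */
                R += ((prev + ts[j].T - 1) / ts[j].T) * ts[j].C;
        }
        return R;
    }

    int main(void) {
        task_t ts[] = { {"ctrl", 200, 1000, 1000},
                        {"io",   400, 4000, 4000},
                        {"log",  900, 8000, 8000} };
        for (int i = 0; i < 3; i++) {
            long R = response_time(ts, i);
            printf("%s: R = %ld us  %s\n", ts[i].name, R,
                   R <= ts[i].D ? "schedulable" : "DEADLINE MISS");
        }
        return 0;
    }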

  3. Assumptions • Aiming at guarantees, i.e. need to consider all executions • Achieve Predictability not at (considerable) loss of Performance => completely (locally) deterministic systems are not the alternative • Systems too big for exhaustive approaches • Analytical approaches necessary

  4. Variability of Execution Times • is at the heart of timing unpredictability, • is introduced at all levels of granularity • Memory reference • Instruction execution • Function • Task • Distributed system of tasks • Service

  5. Access Times. Example: x = a + b; compiles to LOAD r2, _a; LOAD r1, _b; ADD r3,r2,r1. (Slide figure: access times of the LOAD instructions on the MPC 5xx and the PPC 755.)

  6. Timing Accidents and Penalties. Timing Accident – a cause for an increase of the execution time of an instruction. Timing Penalty – the associated increase. • Types of timing accidents, e.g. cache misses, pipeline stalls, branch mispredictions

  7. Deriving Run-Time Guarantees • Static Program Analysis derives Invariants about all execution states at a program point. • Derive Safety Properties from these invariants: certain timing accidents will never happen. Example: at program point p, instruction fetch will never cause a cache miss. • The more accidents excluded, the lower the upper bound. • (and the more accidents predicted, the higher the lower bound).
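
As a made-up illustration of such a safety property (assuming, say, 32-byte cache lines and GCC-style attributes): in the loop below, a must analysis can prove that all but the first access to buf hit the cache, so the cache-miss accident is excluded at that access for every later iteration.

    #include <stdint.h>

    #define N 8
    /* buf occupies a single 32-byte cache line (assumed line size). */
    static int32_t buf[N] __attribute__((aligned(32)));

    int32_t sum(void) {
        int32_t s = 0;
        for (int i = 0; i < N; i++)
            s += buf[i];   /* iteration 0 may miss; a must analysis can prove
                              iterations 1..7 hit, since the whole line is in
                              the cache and nothing evicts it inside the loop */
        return s;
    }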

  8. History-Sensitivity of Execution Times - Problem and Chance - Contribution of the execution of an instruction to a program's execution time • depends on the execution state, i.e., on the execution so far, • can be bounded if strong invariants about all execution states at this instruction are available.

  9. Bounds, Guarantees and Predictability. (Slide figure: a time axis with lower bound <= best case <= worst case <= upper bound; the spread between best and worst case is caused by interference and non-determinism in the design, while the gaps between the actual cases and the computed bounds are caused by limited analysis techniques.)

  10. Basic Notions. (Slide figure: on the time axis, lower bound, best case, worst case, upper bound; worst-case predictability is the distance between worst case and upper bound, best-case predictability the distance between best case and lower bound; the upper bound is the worst-case guarantee; the over-estimation is roughly uncertainty x penalties.) • Message: • make systems analysable • control the penalties

  11. Some published Results. (Slide chart: over-estimation of published WCET analyses versus the cache-miss penalty of the analysed processor, for Lim et al. 1995, Thesing et al. 2002, Souyris et al. 2005, and Tan 2007; while cache-miss penalties grew from about 4 to about 200 cycles, the reported over-estimations dropped from 30-50% down to 7-8%, with 20-30%, 15% and 15-25% in between.)

  12. System Characteristics and Degrees of Overestimation • Airbus A380 code: • real code, synthesized from SCADE • complex processor, PPC 755 • 15 – 25% overestimation • ARTIST2 WCET Tool Challenge • small benchmark programs • simple processors, ARM7 • 7 – 8% overestimation

  13. The Goal. Time predictability + performance: • minimize (upper bound – lower bound) • minimize WCET

  14. Compiler vs. Processor – an old battle. Compiler responsible: EPIC/VLIW, scratchpad memory. Properties: large focus, static information; complex algorithms: heuristics required; heap; predictability. Processor responsible: superscalar, caches. Properties: small focus, dynamic information; complex hardware: high energy costs; adaptability.

  15. Troublesome Architectural Features • Interference between architecture components • Branch prediction – instruction cache • Shared resources • Unified caches • Register overlays • Implicit actions (memory mapped registers) • Non-predictable variability • Memory access • Operation timing • Concurrency in combination with shared resources • Superscalarity • Out-of-order execution • Multi-threading (dyn. scheduled)

  16. Penalties for Memory Accesses (in #cycles for PowerPC 755). Remember: penalties have to be assumed for uncertainties! Tendency: increasing, since clock rates are growing faster than everything else (memory speed in particular).
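
A back-of-the-envelope illustration with assumed numbers (not the slide's table): if an analysis cannot exclude a cache miss for 50 of a task's memory accesses and the miss penalty is 40 cycles, the computed upper bound must include 50 x 40 = 2000 extra cycles, even if almost all of those accesses hit at run time; as penalties grow toward hundreds of cycles, the same number of unresolved accesses inflates the bound proportionally.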

  17. Cache Impact of Language Constructs • Pointer to data • Function pointer • Dynamic method invocation • Service demultiplexing (CORBA)
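
A made-up C fragment showing why such constructs hurt: the array accesses themselves are analysable, but if the analysis cannot enumerate the possible targets of the function pointer, it must treat the call as an unknown access with maximal cache damage (cf. slide 22):

    #include <stddef.h>

    typedef void (*handler_t)(void *);

    /* Dispatch table filled at run time, e.g. by a registration API. */
    static handler_t handlers[16];
    static void     *contexts[16];

    void dispatch(size_t id) {
        handler_t h = handlers[id];   /* data access at a statically known address ... */
        if (h)
            h(contexts[id]);          /* ... but the call target is not known: an
                                         analysis that cannot enumerate the possible
                                         handlers must assume an unknown instruction
                                         stream and discard its cache knowledge */
    }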

  18. Cache Analysis. How to statically precompute cache contents: Must Analysis: for each program point (and calling context), find out which blocks are in the cache every time program execution reaches this program point (through this context)

  19. Must-Cache Information. Must Analysis determines safe information about cache hits. Each predicted cache hit reduces the upper bound.

  20. Cache with LRU Replacement: Transfer for must. (Slide figure, ages from "young" to "old": on an access [s], the concrete set z, y, x, t becomes s, z, y, x and the concrete set z, s, x, t becomes s, z, x, t; the abstract must-cache {x}, {}, {s,t}, {y} becomes {s}, {x}, {t}, {y}: s gets age 0, blocks younger than s's previous maximal age are aged by one, the rest keep their age.)

  21. Cache Analysis: Join (must). (Slide figure: joining the abstract caches {a}, {}, {c,f}, {d} and {c}, {e}, {a}, {d} by "intersection + maximal age" yields {}, {}, {a,c}, {d}.) Interpretation: memory block a is definitively in the (concrete) cache => always hit

  22. Cache with LRU Replacement: Transfer for must under an unknown access, e.g. an unresolved data pointer. (Slide figure: under the access [?], the abstract cache {x}, {}, {s,t}, {y} becomes {}, {x}, {}, {s,t}.) If the address is completely undetermined, the same loss and no gain of information occurs in every cache set! Analogously for multiple unknown accesses, e.g. an unknown function pointer: assume maximal cache damage.
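
A minimal sketch of this abstract must-cache domain for one 4-way LRU set, covering the transfer of slides 20 and 22 and the join of slide 21 (the representation and all names are mine, not those of any particular WCET tool):

    #include <stdio.h>

    #define WAYS    4          /* associativity k of one cache set        */
    #define BLOCKS  8          /* size of the (toy) universe of blocks    */
    #define NOTIN   WAYS       /* "not known to be in the cache"          */

    /* Abstract must-cache for one set: for every block an upper bound on
       its LRU age, or NOTIN if it may be absent.                          */
    typedef struct { int age[BLOCKS]; } must_t;

    static void must_init(must_t *m) {
        for (int b = 0; b < BLOCKS; b++) m->age[b] = NOTIN;
    }

    /* Transfer for a known access to block b (slide 20): b gets age 0,
       blocks younger than b's previous age bound are aged by one.         */
    static void must_access(must_t *m, int b) {
        int old = m->age[b];
        for (int x = 0; x < BLOCKS; x++)
            if (x != b && m->age[x] < old)
                m->age[x] = (m->age[x] + 1 < WAYS) ? m->age[x] + 1 : NOTIN;
        m->age[b] = 0;
    }

    /* Transfer for an unknown access (slide 22): every known block ages
       by one, the oldest ones are lost.                                   */
    static void must_unknown(must_t *m) {
        for (int x = 0; x < BLOCKS; x++)
            if (m->age[x] < WAYS)
                m->age[x] = (m->age[x] + 1 < WAYS) ? m->age[x] + 1 : NOTIN;
    }

    /* Join at control-flow merges (slide 21): intersection + maximal age. */
    static must_t must_join(const must_t *a, const must_t *b) {
        must_t r;
        for (int x = 0; x < BLOCKS; x++)
            r.age[x] = (a->age[x] > b->age[x]) ? a->age[x] : b->age[x];
        return r;
    }

    /* Replaying slide 21: join of {a},{},{c,f},{d} with {c},{e},{a},{d}.   */
    int main(void) {
        enum { A, C, D, E, F };
        must_t l, r;
        must_init(&l); must_init(&r);
        l.age[A] = 0; l.age[C] = 2; l.age[F] = 2; l.age[D] = 3;
        r.age[C] = 0; r.age[E] = 1; r.age[A] = 2; r.age[D] = 3;
        must_t j = must_join(&l, &r);
        printf("a:%d c:%d d:%d e:%d f:%d\n",
               j.age[A], j.age[C], j.age[D], j.age[E], j.age[F]);
        return 0;   /* prints a:2 c:2 d:3 e:4 f:4, i.e. {}, {}, {a,c}, {d} */
    }

(The unused helpers must_access and must_unknown are the per-access transfers; a real analysis would apply them along every program path and call must_join wherever paths merge.)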

  23. Dynamic Method Invocation • Traversal of a data structure representing the class hierarchy • Corresponding worst-case execution time and resulting cache damage • Efficient implementation [WiMa] with table lookup needs 2 indirect memory references; if page faults cannot be excluded: 2 x pf = 4000 cycles!
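
As a sketch of the kind of table lookup meant here (hand-rolled C, not the [WiMa] scheme itself): the call goes through the object's vtable pointer and then through the method slot, i.e. two indirect memory references, each of which may miss the cache or, in the worst case, page-fault.

    #include <stdio.h>

    struct shape;                                   /* "class" with one virtual method */
    typedef double (*area_fn)(const struct shape *);

    struct vtable { area_fn area; };
    struct shape  { const struct vtable *vt; double a, b; };

    static double rect_area(const struct shape *s)   { return s->a * s->b; }
    static double circle_area(const struct shape *s) { return 3.14159265 * s->a * s->a; }

    static const struct vtable rect_vt   = { rect_area };
    static const struct vtable circle_vt = { circle_area };

    double area_of(const struct shape *s) {
        /* 1st indirect reference: load s->vt; 2nd: load s->vt->area;
           then an indirect call. If either load can page-fault, the
           worst case explodes (the 2 x pf on the slide).              */
        return s->vt->area(s);
    }

    int main(void) {
        struct shape r = { &rect_vt, 3.0, 4.0 };
        struct shape c = { &circle_vt, 2.0, 0.0 };
        printf("%.2f %.2f\n", area_of(&r), area_of(&c));
        return 0;
    }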

  24. System Layers • Distributed Operation • Inter-Task Level • Intra-Task Level • Hardware Platform. Cross-Layer Dependencies.

  25. System-Level Performance Methods. (Slide figure: for a quantity such as delay, measurement and simulation of the real system yield observations between its best case and worst case, while analysis provides best-case and worst-case bounds that enclose them.)

  26. Difficulties. (Slide figure: an input stream of events of different types feeding communicating, scheduled tasks.) Complex Input: timing (jitter, bursts, ...), different event types.

  27. Difficulties (continued). (Slide figure: the input stream is buffered and processed by tasks on a processor.) Complex Input: timing (jitter, bursts, ...), different event types. Variable Execution Demand: input (different event types), internal state (program, cache, ...). Variable Resource Availability: task scheduling, task communication.

  28. Why is Performance Analysis of Distributed Systems Difficult? • non-deterministic environment: unpredictable input streams, data-dependent behavior • interference between concurrent actions: multiple applications, sharing of limited resources, scheduling/arbitration mechanisms • local non-determinism: long-range dependencies, adaptive behavior (control loops)

  29. Case Study - Opportunities. (Slide figure: streams S1-S6 entering three ECU/CC nodes connected by a bus.) • 6 real-time input streams: with jitter, with bursts, deadline > period • 3 ECUs with their own CCs • 13 tasks & 7 messages, with different WCEDs • 2 scheduling policies: Earliest Deadline First (ECUs), Fixed Priority (ECUs & CCs) • Hierarchical scheduling: static & dynamic polling servers • Bus with TDMA: 4 time slots with different lengths (#1, #3 for CC1, #2 for CC3, #4 for CC3) • Total utilization: ECU1 59%, ECU2 87%, ECU3 67%, BUS 56%

  30. The Distributed System... (Slide figure: the mapping of tasks T1.1-T6.1 and messages C1.1-C5.1 for streams S1-S6 onto ECU1-ECU3 and the communication controllers CC1-CC3 with the TDMA bus, annotated with the FP, EDF and polling-server (PS) schedulers.)

  31. Input of Stream 3. (Slide figure: the same system diagram, highlighting the input of stream S3.)

  32. Output of Stream 3. (Slide figure: the same system diagram, highlighting the output of stream S3 after it has traversed the system.)

  33. Output with Greedy Shapers. (Slide figure: the same system diagram with greedy shapers inserted, and the resulting output of stream S3.)
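
A greedy shaper delays each event just long enough that its output conforms to a given arrival curve, which removes bursts before they propagate downstream. A minimal sketch for an affine curve sigma + rho*t, with all names and numbers chosen for illustration:

    #include <stdio.h>

    /* Greedy shaper for the arrival curve alpha(t) = sigma + rho*t:
       at most sigma events may pass back-to-back, and the long-term
       rate of the output is limited to rho events per time unit.      */
    typedef struct {
        double sigma, rho;   /* burst capacity and rate of the target curve */
        double tokens;       /* current bucket fill, 0..sigma               */
        double last;         /* time of the last update                     */
    } shaper_t;

    static void shaper_init(shaper_t *s, double sigma, double rho) {
        s->sigma = sigma; s->rho = rho; s->tokens = sigma; s->last = 0.0;
    }

    /* Earliest time >= arrival at which the event may leave the shaper
       so that the output still conforms to alpha (events leave in FIFO order). */
    static double shaper_release(shaper_t *s, double arrival) {
        double t = (arrival > s->last) ? arrival : s->last;
        s->tokens += (t - s->last) * s->rho;        /* refill since last event */
        if (s->tokens > s->sigma) s->tokens = s->sigma;
        s->last = t;
        if (s->tokens >= 1.0) {                     /* enough credit: pass through */
            s->tokens -= 1.0;
            return t;
        }
        double release = t + (1.0 - s->tokens) / s->rho;   /* wait for credit */
        s->tokens = 0.0;
        s->last = release;
        return release;
    }

    int main(void) {
        shaper_t sh;
        shaper_init(&sh, 2.0, 0.5);                 /* burst 2, rate 0.5 events/ms */
        double bursty[] = { 0.0, 0.1, 0.2, 0.3, 10.0 };
        for (int i = 0; i < 5; i++)
            printf("in %.1f -> out %.1f\n", bursty[i], shaper_release(&sh, bursty[i]));
        return 0;
    }

With this example input, the four-event burst arriving between t = 0.0 and t = 0.3 is spread out to releases at 0.0, 0.1, 2.0 and 4.0, which is the smoothing effect the slide's output curves show.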

  34. Open Cross-Layer Issues • Does it make sense to use preemptive scheduling (intra-task-level non-determinism increases, scheduling efficiency increases)? • Uncoordinated scheduling (static and dynamic scheduling) • Distributed (!) control on several layers (control loops, adaptive behavior)

  35. New Threats • Trend towards adaptive systems • adapt to varying processing/communication loads • adapt speed / switch off units for energy saving • multiple levels of control and estimation! • Increases long-range timing dependencies with non-deterministic behavior

  36. System Layers • Hardware • Compiler • Task level (cf. talk offered by Sebastian Altmeyer) • Distributed operation Layering Principle: Separation of Concerns

  37. Separation of Concerns • is the Design Principle • Virtualization & Abstraction are the means: • One processor is virtualized as often as there are tasks • Limited physical memory is abstracted to almost unlimited virtual memory • Time is abstracted to #transitions of some very abstract model, or even to orders of magnitude • Services are abstracted from their actual location by middleware. Very successful, but a disaster for predictability!

  38. Increasing Predictability • Architecture: reducing penalties, identifying architectures offering a good combination of predictability with performance • System layers: resource-aware abstraction with resource interfaces • Development process: reducing uncertainty, matching design with tools

  39. Resource-aware Abstraction with Resource Interfaces • Importing resource constraints into a layer • Slot assignment or available bandwidth for communication • Bounding resource consumption by design • RT CORBA limits service demultiplexing • Exporting information about resource consumption • Real-Time Scheduling needs upper bounds on tasks’ execution times and context-switch costs
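
One way such an interface could be written down between layers, with every field and number invented for illustration: the task layer exports WCET and context-switch bounds, the communication layer exports the TDMA slot it grants, and a coarse admission check combines the two:

    #include <stdio.h>
    #include <stdint.h>

    /* Exported by the intra-task layer: upper bounds from WCET analysis. */
    typedef struct {
        const char *task;
        uint32_t wcet_us;            /* upper bound on execution time      */
        uint32_t ctx_switch_us;      /* context-switch cost to charge      */
        uint32_t period_us;
    } task_demand_t;

    /* Imported from the communication layer: the TDMA slot this node owns. */
    typedef struct {
        uint32_t slot_us;            /* length of our slot                 */
        uint32_t cycle_us;           /* length of the whole TDMA cycle     */
    } bus_supply_t;

    /* A (very coarse) admission check: does the exported processor demand
       stay below full utilization, and does the per-cycle message load
       fit into the imported bus slot?                                      */
    static int admissible(const task_demand_t *ts, int n,
                          uint32_t msg_us_per_cycle, const bus_supply_t *bus) {
        double util = 0.0;
        for (int i = 0; i < n; i++)
            util += (double)(ts[i].wcet_us + ts[i].ctx_switch_us) / ts[i].period_us;
        double bus_util = (double)msg_us_per_cycle / bus->slot_us;
        return util <= 1.0 && bus_util <= 1.0;
    }

    int main(void) {
        task_demand_t ts[] = { {"ctrl", 200, 10, 1000}, {"io", 400, 10, 4000} };
        bus_supply_t  bus  = { 250, 1000 };
        printf("admissible: %d\n", admissible(ts, 2, 180, &bus));
        return 0;
    }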

  40. Dynamic System – Static Provisioning

  41. Architecture • Scratchpad memory • LRU caches • Statically scheduled multi-threading • Parallelism instead of speculation • Static decisions instead of dynamic decisions • Dealing with resources • Based on history
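
For instance (GCC-style section attributes; the section names and the linker-script mapping to scratchpad RAM are assumptions), timing-critical code and data can be placed into the scratchpad statically, so every access has one fixed, known latency instead of a cache-dependent one:

    #include <stdint.h>

    /* Assumed linker script maps these sections to on-chip scratchpad RAM. */
    #define SPM_DATA __attribute__((section(".spm_data")))
    #define SPM_CODE __attribute__((section(".spm_text")))

    /* Hot state of the control loop: one fixed-latency access per read/write. */
    SPM_DATA static int32_t state[16];

    /* Timing-critical routine fetched from scratchpad instead of cached flash. */
    SPM_CODE int32_t control_step(int32_t input) {
        int32_t acc = 0;
        for (int i = 0; i < 16; i++)
            acc += state[i];
        state[0] = input;
        return acc;
    }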

  42. Predictability of Memory Systems. (Slide figure: a spectrum from fully static and fully predictable to fully dynamic and unpredictable memory systems: no cache / scratchpad; SW-controlled cache; partially frozen PLRU cache; cache with LRU; PLRU cache; cache with FIFO or random replacement.) cf. talk offered by Jan Reineke

  43. A New Research Agenda • Architecture design: Beyond EPIC • Programming languages/constructs • Schedulability analysis for distributed systems • Predictable real-time middleware
