Maximizing GPU Performance for Real-Time Systems: Platform-Based Design Principles
Explore the use of GPUs and PicoChip in real-time systems, covering parallel resource control, shading techniques, deadline guarantees, and fixed-priority scheduling. Understand why GPUs are preferred for dynamic resource allocation and how to achieve timeliness in embedded systems. Learn about task scheduling models and analysis for optimized performance.
Platform-based Design (5KK70 MPSoC): Controlling the Parallel Resources
Contents • GPUs • PicoChip • Real-Time Scheduling basics • Resource Management
GPU basics • Synthetic objects are represented as a set of 3D triangles plus textures, in a language/library such as OpenGL or DirectX • A triangle is represented by 3 vertices • A vertex is represented by 4 coordinates in floating-point precision • Objects are transformed between coordinate representations • These transformations are matrix-vector multiplications
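The transform described above can be sketched as a 4×4 matrix-vector multiplication on a homogeneous vertex. This is a minimal pure-Python illustration of the single operation a GPU performs massively in parallel; the translation matrix and vertex values are made up for the example.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vertex."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Homogeneous translation by (2, 3, 4): a typical coordinate transform.
translate = [
    [1, 0, 0, 2],
    [0, 1, 0, 3],
    [0, 0, 1, 4],
    [0, 0, 0, 1],
]
vertex = [1.0, 1.0, 1.0, 1.0]  # x, y, z, w
print(mat_vec(translate, vertex))  # [3.0, 4.0, 5.0, 1.0]
```

A GPU applies this same operation to millions of vertices per frame, which is why the shading workload parallelizes so well.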
GeForce 8800 GPU: 330 Gflops, 128 processors with 4-way SIMD
GPU: Why more general-purpose and programmable? • All transformations are shading operations • Shading boils down to matrix-vector multiplications • The computational load varies heavily between different kinds of shading • Programmable shaders allow dynamic resource allocation between shading stages Result: • Modern GPUs are a serious competitor to general-purpose processors!
Real-time systems (Reinder Bril) • Correct result at the right time: timeliness • Many products contain embedded computers, e.g. cars, planes, medical and consumer electronics equipment, industrial control. • In such systems, it’s important to deliver correct functionality on time. • Example: inflation of an air bag
Example: Multimedia Consumer Terminals (by courtesy of Maria Gabrani) – block diagram (figure): cable modem, DVB tuner, IEEE 1394 interface, RF tuner, CVBS interface, YC interface, VGA, DVD, CDx front end
Example: High-quality video & real time. TV companies invest heavily in video enhancement, e.g. temporal up-scaling: a 24 Hz input stream (movie) is rendered as a 60 Hz stream (TV screen). (Figure: original, up-scaled, and displayed frames.) • A deadline miss leads to a “wrong” picture. • Deadline misses tend to come in bursts (heavy load). • Valuable work may be lost.
Real-time systems • Guaranteeing timeliness requirements: • real-time tasks with timing constraints • scheduling of tasks • Fixed-priority scheduling (FPS) is the de-facto standard for scheduling in real-time systems. • FPS is supported by • commercially available RTOSs; • analysis and synthesis methods.
Recap of FPS • Fixed Priority Pre-emptive Scheduling (FPPS) • A basic scheduling model • Analysis • Example • Optimality of RMS and DMS
FPPS: A basic scheduling model • Single processor • Set of n independent, periodic tasks τ1, …, τn • Tasks are assigned fixed priorities, and can be pre-empted instantaneously. • Scheduling: at any moment in time, the processor executes the highest-priority task that has work pending.
FPPS: A basic scheduling model • Task characteristics: • period T, • (worst-case) computation time C, • (relative) deadline D. • Assumptions: • non-idling; • context-switch and scheduling overhead is ignored; • releases execute in order of arrival; • deadlines are hard, and D ≤ T; • τ1 has the highest and τn the lowest priority; • no data dependencies between tasks.
FPPS: Analysis • Schedulable iff WRi ≤ Di for 1 ≤ i ≤ n • Necessary condition: U = Σ Ci/Ti ≤ 1 • Sufficient condition for RMS: U ≤ LL(n) = n(2^(1/n) − 1), where RMS means ri > rj iff Ti < Tj, and Di = Ti.
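The Liu & Layland bound above is easy to compute directly. A small sketch of the sufficient utilization test for RMS (the function and task utilizations here are illustrative, not from the slides):

```python
def ll_bound(n):
    """Liu & Layland utilization bound for n tasks: n * (2^(1/n) - 1)."""
    return n * (2 ** (1.0 / n) - 1)

def rms_sufficient(utilizations):
    """Sufficient (not necessary) RMS schedulability test: U <= LL(n)."""
    return sum(utilizations) <= ll_bound(len(utilizations))

print(round(ll_bound(2), 3))  # 0.828
print(round(ll_bound(3), 3))  # 0.780
print(rms_sufficient([0.3, 0.4]))  # True  (U = 0.7 <= 0.828)
print(rms_sufficient([0.5, 0.4]))  # False (U = 0.9 >  0.828)
```

Note that the bound decreases with n (towards ln 2 ≈ 0.693), so failing this test does not mean the task set is unschedulable; it only means an exact test is needed.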
FPPS: Analysis • Otherwise (i.e. U ≤ 1 and not RMS, or n(2^(1/n) − 1) < U ≤ 1 and RMS), use the exact condition. • Critical instant: simultaneous release of τi with all higher-priority tasks. • WRi is the smallest positive solution of WRi = Ci + Σ_{j&lt;i} ⌈WRi/Tj⌉ Cj.
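The recurrence above is solved by fixed-point iteration starting from WRi = Ci. A sketch, using a hypothetical task set (not the one on the example slide); tasks are indexed by priority, index 0 being the highest:

```python
import math

def response_time(i, C, T):
    """Worst-case response time WR_i: smallest positive fixed point of
    WR_i = C_i + sum_{j<i} ceil(WR_i / T_j) * C_j.
    Returns None if the iteration diverges (task set overloaded)."""
    wr = C[i]
    while True:
        nxt = C[i] + sum(math.ceil(wr / T[j]) * C[j] for j in range(i))
        if nxt == wr:
            return wr          # fixed point reached
        if nxt > 100 * max(T):  # arbitrary cut-off for divergence
            return None
        wr = nxt

# Hypothetical task set: C = (1, 2, 3), T = (4, 8, 16), D = T.
C, T = [1, 2, 3], [4, 8, 16]
print([response_time(i, C, T) for i in range(3)])  # [1, 3, 7]
```

Here all three response times are below the respective periods, so the set is schedulable under RMS even though such a check would not be needed for it (U ≈ 0.69 is already under the LL bound).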
FPPS: Example • Task set Γ consisting of 3 tasks • Notes: • RM priority assignment and Di = Ti (RMS); • U1 + U2 + U3 = 0.97 ≤ 1, hence Γ could be schedulable; • utilization-bound test U ≤ LL(n) = n(2^(1/n) − 1): • U1 + U2 = 0.88 > LL(2) ≈ 0.83, • therefore U = 0.97 > LL(3), hence another (exact) test is required.
FPPS: Example • Time line (figure): tasks 1–3 scheduled over [0, 60]; WR1 = 3, WR2 = 17, WR3 = 56.
FPPS: Optimality of RMS and DMS • Priority assignment policies: • Rate Monotonic (RM): ri > rj iff Ti < Tj • Deadline Monotonic (DM): ri > rj iff Di < Dj • Under arbitrary phasing: • RMS is optimal among FPS when Di = Ti; • DMS is optimal among FPS when Di ≤ Ti, • where optimal means: if an FPS algorithm can schedule the task set, so can RMS/DMS.
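Both policies amount to sorting tasks by one parameter (smaller period or deadline gives higher priority). A minimal sketch with hypothetical tasks:

```python
def rm_order(tasks):
    """Rate Monotonic: sort by period T, ascending (shortest T first)."""
    return sorted(tasks, key=lambda t: t["T"])

def dm_order(tasks):
    """Deadline Monotonic: sort by relative deadline D, ascending."""
    return sorted(tasks, key=lambda t: t["D"])

tasks = [
    {"name": "a", "T": 50, "D": 20},
    {"name": "b", "T": 30, "D": 30},
    {"name": "c", "T": 40, "D": 40},
]
print([t["name"] for t in rm_order(tasks)])  # ['b', 'c', 'a']
print([t["name"] for t in dm_order(tasks)])  # ['a', 'b', 'c']
```

Note that the two orderings differ as soon as some Di < Ti, which is exactly the case DM is designed for.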
Non-Preemptive Systems (Akash Kumar) • The required state space is smaller • Lower implementation cost • Less overhead at run-time (cache pollution, memory size)
Why FPS doesn’t work for “future” high-performance platforms • Heavy-duty DSPs: pre-emption is not supported • Even if it were, context-switching overhead would be significant • Data dependencies are not taken into account • Multi-processor platforms
Related Research – Feasibility Analysis (figure, four quadrants): • preemptive uniprocessor [Liu, Layland, 1973] • non-preemptive uniprocessor [Jeffay, 1991] • homogeneous MPSoC [Baruah, 2006] • heterogeneous MPSoC (P1–P6): still open [ , 2020??]
Unpredictability – Variation in Execution Time (figure): execution times of tasks A and B vary between 49 and 50 cycles across processors P1–P3.
Problem: no good techniques exist to analyze and schedule applications on non-preemptive heterogeneous systems. A Resource Manager is proposed to schedule applications such that they meet their performance requirements on such systems.
Our Assumptions (figure: tasks A2–D2 of a job) • Heterogeneous MPSoC • Applications modeled as SDF (synchronous dataflow) graphs • Non-preemptive system – tasks cannot be stopped • Jobs can be suspended • A lot of dynamism in the system • Jobs arrive and leave at run-time • Variation in execution time • Very simple arbiter at each core
Resource-management hierarchy (figure): • QoS Manager – application level, time scale of a few seconds; reconfigures to meet the required quality • Resource Manager – time scale of milliseconds • Local processor arbiter – task level, time scale of microseconds
Architecture Description (figure: Resource Manager controlling local arbiters on P1–P3) • The available computation resources are described • Each processor can have a different arbiter • In this model a First-Come-First-Serve mechanism is used • The resource manager can configure/control the local arbiters to regulate the progress of applications if needed
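The per-core First-Come-First-Serve arbiter assumed here can be sketched in a few lines: tasks run to completion in arrival order, which is what makes the system non-preemptive. Class and task names are illustrative.

```python
from collections import deque

class FCFSArbiter:
    """Non-preemptive FCFS arbiter at one core: oldest task runs first."""

    def __init__(self):
        self.queue = deque()

    def arrive(self, task):
        """A task becomes ready on this core."""
        self.queue.append(task)

    def run_next(self):
        """Run the oldest waiting task to completion; None if idle."""
        return self.queue.popleft() if self.queue else None

arb = FCFSArbiter()
arb.arrive("A1"); arb.arrive("B1"); arb.arrive("A2")
print([arb.run_next() for _ in range(3)])  # ['A1', 'B1', 'A2']
```

Because the arbiter never reorders or interrupts tasks, all regulation of application progress has to happen one level up, in the resource manager.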
Resource Manager • Responsible for two main things • Admission control: • an incoming application specifies its throughput requirement and the execution time and mapping of each actor • the repetition vector is used to compute the expected utilization • the RM checks whether enough resources are present • and allocates resources to the application if it is admitted
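The admission test described above can be sketched as a per-processor utilization check: each actor contributes (execution time × repetitions) per iteration period. All names, numbers, and the dictionary layout are illustrative assumptions, not taken from the slides.

```python
def admit(app, load, capacity=1.0):
    """Admit app iff its estimated utilization fits on every processor.
    app["actors"]: (processor, actor execution time, repetition count);
    app["period"]: required iteration period (1 / required throughput);
    load: current utilization per processor."""
    demand = {}
    for proc, exec_time, reps in app["actors"]:
        demand[proc] = demand.get(proc, 0.0) + exec_time * reps / app["period"]
    return all(load.get(p, 0.0) + u <= capacity for p, u in demand.items())

load = {"P1": 0.5, "P2": 0.2}
video = {"period": 100.0, "actors": [("P1", 10.0, 4), ("P2", 20.0, 2)]}
print(admit(video, load))  # True: P1 goes to 0.9, P2 to 0.6
```

A real admission test for a non-preemptive system would also have to account for blocking by running actors, which a pure utilization check ignores.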
Video Conf Play MP3 Typing Sms P1 Admission Control Resource Reqmt Exceeded! P2 P3
Resource Manager • Admission control • Budget enforcement: • while running, each application signals the RM when it completes an iteration • the RM keeps track of each application’s progress • operation modes: ‘polling’ mode and ‘interrupt’ mode • the RM suspends an application if needed
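The polling-mode enforcement above can be sketched as follows: at each polling interval the RM compares completed iterations against the application's budget and suspends applications that run ahead (freeing the processors for the others). The class, budgets, and application names are illustrative.

```python
class ResourceManager:
    def __init__(self, budgets):
        # budgets: required iterations per application per polling interval
        self.budgets = budgets
        self.done = {name: 0 for name in budgets}
        self.suspended = set()

    def iteration_completed(self, name):
        """Applications signal the RM when they complete an iteration."""
        self.done[name] += 1

    def poll(self):
        """Suspend apps ahead of budget, resume the rest; reset counters."""
        for name, budget in self.budgets.items():
            if self.done[name] > budget:
                self.suspended.add(name)       # ahead: free the processor
            else:
                self.suspended.discard(name)   # on time or behind: run
            self.done[name] = 0                # start a new interval

rm = ResourceManager({"h263": 2, "jpeg": 1})
for _ in range(3):
    rm.iteration_completed("h263")
rm.iteration_completed("jpeg")
rm.poll()
print(rm.suspended)  # {'h263'}: ran 3 iterations against a budget of 2
```

The polling interval is the key tuning knob here: shorter intervals regulate progress more tightly but cost more RM overhead, which is exactly the trade-off studied in the experiments below.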
Budget Enforcement (Polling) (figure): a new job enters; the Resource Manager suspends a job performing better than required and resumes it when its performance goes down.
Experiments • A high-level simulation model was developed in POOSL, a parallel simulation language • A protocol for communication was defined • The system was verified with a number of application SDF models • A case study was done with H.263 and JPEG application models • The impact of varying the ‘polling’ interval was studied