Peter van der Veen QNX Software Systems

Designing High-Performance Network Elements Using Multiprocessing Technology and Adaptive Partitioning

Typical Hardware Architecture
[Diagram labels: Chassis, Line card, Control card, Low-speed bus, High-speed interconnect, Network]

Typical Netcom System Software Constraints
• Many millions of lines of code
• Tens to hundreds of S/W components
• Hundreds to thousands of processors and threads
• Strict availability requirements
[Diagram labels: Kernel (RTOS), Device Driver, SS7 Stack, TCP/IP Stack, Application, Filesystem]

Software Architecture
• Multiple processors sharing common hardware
• Common memory bus and address space
• Access to all peripheral devices and interrupts
• OS manages tasks running on processors – true concurrency
• Transparent to application programs
• No incremental hardware
• No application software changes needed
[Diagram labels: FILE SYSTEM, ETHERNET DRIVER, ROUTE MANAGER (Threads A to E); QNX NEUTRINO REALTIME SCHEDULER (OS), threads ordered by PRIORITY; four CPUs, each with its own CACHE; HIGH-BANDWIDTH CPU BUS; MEMORY]

Symmetric Multiprocessing

SMP Memory Organization
• The OS kernel resides at physical memory address 0, addressable by both cores
• The MMU relocates applications and shared memory appropriately
[Diagram labels: e600 core0 and e600 core1, each with an MMU; OS; Apps "A"; Apps "B"; Shared memory; Physical memory]

Making the Most of SMP
• Concurrency … divide and conquer
  • Write software components using threads
  • Remove serializations from dataflow
• Caches … keep them hot
  • Minimize writes to globally shared data
  • Process data on the same processor where possible
• Scheduling … get your ducks in a row
  • Take advantage of the OS scheduler
  • Use diagnostic tools to adjust runmasks and priorities
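The first two points (divide the work across threads, and keep writes to globally shared data to a minimum) can be sketched in a few lines. This is an illustration only, not code from the presentation: the function name `count_large_packets`, its parameters, and the packet-size data are all hypothetical. Each worker accumulates into a private local counter and the partial results are combined once at the end, so no shared variable is written inside the hot loop. In standard CPython builds the global interpreter lock limits the parallel speedup of CPU-bound threads, so the sketch shows the structure of the pattern rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def count_large_packets(packet_sizes, threshold, workers=4):
    """Count packets larger than `threshold`, one chunk of input per worker."""
    # Split the input into contiguous chunks (ceiling division, at least 1 item).
    chunk = max(1, (len(packet_sizes) + workers - 1) // workers)
    chunks = [packet_sizes[i:i + chunk] for i in range(0, len(packet_sizes), chunk)]

    def count_chunk(sizes):
        local = 0  # private to this worker: nothing shared is written in the loop
        for size in sizes:
            if size > threshold:
                local += 1
        return local

    # Combine the per-worker partial counts exactly once, after the workers finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_chunk, chunks))
```

The same shape applies in C with POSIX threads: give each thread its own accumulator and merge at a single join point instead of incrementing one global counter under a lock.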

SMP Optimizing Tools
• System Profiler
  • Provide a timeline view of activity in the system
  • Identify resource contention and serialization
  • Analyze SMP scheduling thrashing
  • Visualize distributed message passing
• CPU Performance Counters
  • Count operations such as cache misses
  • Statistically sample based on significant events

Adaptive Partitioning

Introducing Adaptive Partitioning
• What is Adaptive Partitioning?
  • Adaptive partitioning is a new QNX product that extends the Neutrino RTOS
  • Allows you to build secure compartments or “partitions” around a set of applications or threads
  • Partitions enforce CPU guarantees for applications, controlled by easy-to-use budgets
• Why is it Adaptive?
  • Patent-pending design ensures all available CPU cycles are given to partitions that need processing time – no CPU cycles wasted
  • Provides a performance advantage by permitting full processor utilization to accommodate spikes in demand
• Easy to get started
  • No changes to how designers work today
  • POSIX programming model for the same, familiar design, programming & debugging techniques
  • No code changes are required to implement partitions

Understanding “Adaptive”
[Chart: Processing Load Scenarios, showing how CPU time divides among Management Interfaces (CLI, SNMP), Routing & Forwarding, Maintenance, and Idle Time. Values as extracted: 70% 10% 20%; 5% 90% 5%; 80% 10% 10%; 5% 95%]

Defining Partitions
• Given the processing scenarios, choose a partitioning approach and appropriate partition budgets
[Diagram: Management Interface, Routing & Forwarding, and Maintenance partitions on the QNX Neutrino microkernel. Budget labels as extracted: 75% 20% 5%]

Understanding “Adaptive” Partitioning
• Static: CPU time is wasted when partitions do not consume their budget. Applications cannot benefit from available time.
• Adaptive: Budgets are enforced when the CPU is loaded
• Adaptive: Applications can use free CPU time if it is available from other partitions
[Chart: static versus adaptive partitioning for the Management Interface, Routing & Forwarding, and Maintenance partitions on the QNX Neutrino microkernel, with budget labels 5%, 75%, and 20%]
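The difference between the two schemes is easy to see with a little arithmetic. The sketch below is a toy model, not the QNX scheduler: the function names, the `priority_order` parameter, and the demand figures are assumptions made for illustration, and the mapping of the 5% / 75% / 20% budgets to partitions is read from the label order on the slide. Static allocation caps each partition at its budget; the adaptive version first grants the same guaranteed share, then hands any leftover CPU time to partitions that still want more.

```python
def static_allocation(budgets, demands):
    """Each partition gets at most its budget; unused budget is simply lost."""
    return {name: min(demands[name], budgets[name]) for name in budgets}


def adaptive_allocation(budgets, demands, priority_order):
    """Grant budgets first, then give spare CPU time to partitions that want more.

    `priority_order` (an assumption of this model) lists partition names in the
    order in which spare time is offered.
    """
    alloc = static_allocation(budgets, demands)
    spare = 100 - sum(alloc.values())
    for name in priority_order:
        extra = min(spare, demands[name] - alloc[name])
        alloc[name] += extra
        spare -= extra
    return alloc


# Budgets in percent of CPU, as labelled on the slide (mapping assumed).
budgets = {"management": 5, "routing": 75, "maintenance": 20}
# Hypothetical load: routing spikes while maintenance is idle.
demands = {"management": 5, "routing": 95, "maintenance": 0}

static = static_allocation(budgets, demands)
adaptive = adaptive_allocation(budgets, demands, ["routing", "management", "maintenance"])
```

Under this load the static scheme leaves routing capped at 75% with 20% of the CPU idle, while the adaptive scheme lets routing use 95%.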

Uses for Adaptive Partitioning

Security Threats
• Embedded systems are becoming network connected
  • Untrusted interfaces and network threats
  • Untrusted add-on software
• If appropriate measures are not included by design, your product’s security and availability can be compromised
  • Rogue software can launch a denial-of-service (DoS) attack and starve core applications of CPU time
  • Need to ensure untrusted, add-on software can be contained to guard against attacks
  • Distributed DoS attacks can busy your system with network processing
[Diagram 1: "Rogue add-on stealing CPU time". Diagram 2: "Networking stack hogging CPU time". Labels: QNX Neutrino Microkernel, File System, Device Drivers, Networking, Core Application, Add-On]

Partitioning to Contain Threats
• Create OS-enforced partitions to ensure critical system resources are protected
  • Ensure CPU available for core functions
  • Partition inheritance ensures applications get CPU time for OS services (such as drivers, file systems, networking)
• Contain threats and protect core applications
  • Limit impact of rogue applications
[Diagram captions: "Networking Consuming CPU Time"; "Rogue add-on thwarted". Labels: QNX Neutrino microkernel, File System, Device Drivers, Networking, Core Application, Add-On]

How Adaptive Partitioning Works

Partition Accounting
• What does “30% CPU Budget” mean?
  • CPU usage is calculated over a sliding window
  • Partition budget: guaranteed percentage of CPU time, balanced over the sliding window
  • Partition usage: CPU time executed during the last sliding window, expressed as a percentage
• Accuracy
  • Counting ticks is not enough. “Micro-billing” is used to track actual CPU utilization even when threads don’t use their whole timeslice
  • Microsecond and nanosecond resolution
  • Threads are billed based on real usage, not statistics
• “windowsize” is configurable as an argument to the kernel at boot
  • Trade off maximum READY-state latency against accuracy of CPU budgeting
  • 100 ms window -> 1% accuracy or better
  • Internal arithmetic accurate to 0.5% or better
[Diagram: sliding window from T = -100 ms to T = now on the QNX Neutrino microkernel. Partition labels: User Interface, Route Calculation, Diagnostics, Data Acquisition. Budget labels as extracted: 30% 40% 30%]
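The sliding-window accounting can be pictured with a small model. This is a sketch under stated assumptions, not the kernel's implementation: the class `PartitionAccount` and its methods are hypothetical names, a single CPU is assumed, and run intervals are assumed to be billed in time order. Each run is recorded with its real start and duration in microseconds (the "micro-billing" idea, as opposed to counting whole ticks), and usage is the share of the last window that those runs cover.

```python
from collections import deque


class PartitionAccount:
    """Toy sliding-window CPU accounting for one partition on one CPU."""

    def __init__(self, budget_percent, window_us=100_000):
        self.budget_percent = budget_percent
        self.window_us = window_us      # 100 ms averaging window by default
        self.runs = deque()             # (start_us, end_us), appended in time order

    def bill(self, start_us, duration_us):
        """Record the time a thread of this partition actually ran."""
        self.runs.append((start_us, start_us + duration_us))

    def usage_percent(self, now_us):
        """CPU time used in the last window, as a percentage of the window."""
        window_start = now_us - self.window_us
        # Forget runs that ended before the window began.
        while self.runs and self.runs[0][1] <= window_start:
            self.runs.popleft()
        # Count only the part of each run that overlaps the window.
        used = sum(
            max(0, min(end, now_us) - max(start, window_start))
            for start, end in self.runs
        )
        return 100.0 * used / self.window_us

    def has_budget(self, now_us):
        return self.usage_percent(now_us) < self.budget_percent


acct = PartitionAccount(budget_percent=30)
acct.bill(0, 20_000)        # ran for 20 ms starting at t = 0
acct.bill(50_000, 15_000)   # ran for 15 ms starting at t = 50 ms
```

At t = 100 ms the partition has used 35 ms of the last 100 ms and is over its 30% budget; by t = 130 ms the first run has slid out of the window and the partition has budget again.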

Behavior During Normal Load
• Hard real-time, priority-based scheduler under normal load
• Running thread selected as highest priority READY thread
• No delay on scheduling if adaptive partition has budget
[Diagram: Blocked, Ready, and Running threads at priorities 4 to 11 in partitions labelled "CPU Budget Available"]

Behavior During Overload
• Partition budgets are enforced when the CPU is fully loaded
• Highest priority READY thread in partition with budget runs
• No delay on scheduling if partition has budget
[Diagram: Blocked and Ready threads at priorities 4 to 11 in partitions labelled "CPU Budget Exceeded" and "CPU Budget Available". Caption: "Runs before higher priority Ready – No Budget"]

Behavior with Free CPU Time
• If no partitions with remaining budget have READY threads, highest priority READY thread is selected to run from other partitions
• This allows “free” time to be given based upon priority
• “Free” time is still accounted and may have to be paid back (for example, if partition 3 becomes ready within 1 averaging window)
[Diagram: Blocked, Ready, and Running threads at priorities 4 to 11 in two partitions labelled "CPU Budget Exceeded" and one labelled "CPU Budget Available"]
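The three behaviors (normal load, overload, and free CPU time) follow from one selection rule, sketched here as a toy function. The name `pick_next`, the tuple layout, and the thread and partition names are hypothetical, and the real scheduler has more cases (critical threads, for instance) than this model covers.

```python
def pick_next(ready_threads, partition_has_budget):
    """Choose the next thread to run.

    ready_threads: list of (name, priority, partition) tuples for READY threads.
    partition_has_budget: dict mapping partition name to True/False.
    """
    if not ready_threads:
        return None
    # Prefer READY threads whose partition still has budget in this window.
    in_budget = [t for t in ready_threads if partition_has_budget[t[2]]]
    # If no such thread exists, the time is "free": fall back to all READY threads.
    candidates = in_budget if in_budget else ready_threads
    # Within the candidate set, plain priority scheduling applies.
    return max(candidates, key=lambda t: t[1])


ready = [("a", 10, "p1"), ("b", 9, "p2"), ("c", 6, "p3")]

# Normal load: every partition has budget, so the highest priority thread runs.
normal = pick_next(ready, {"p1": True, "p2": True, "p3": True})

# Overload: p1 is over budget, so "b" runs ahead of the higher-priority "a".
overload = pick_next(ready, {"p1": False, "p2": True, "p3": True})

# Free time: only over-budget partitions have READY threads; priority decides.
free = pick_next([("a", 10, "p1"), ("d", 8, "p2")],
                 {"p1": False, "p2": False, "p3": True})
```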

Partition Inheritance
• When a server process does work requested by a client, the time is “billed” to the client
• Prevents runaway client processes from monopolizing system services such as device drivers and server processes
• Ensures fair CPU scheduling
• Allows you to create servers and assign server budgets independent of the number of clients
• Builds on the Neutrino microkernel and its client-server, message-passing architecture
[Diagram caption: "Inheritance: File System operation uses application’s budget". Labels: File System threads, Application threads, QNX Neutrino Microkernel]
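A small bookkeeping model shows what "billed to the client" means. It is a sketch only: the `Billing` class, its `run` method, and the partition names are hypothetical and stand in for accounting that the kernel performs as part of message passing.

```python
from collections import defaultdict


class Billing:
    """Toy ledger of CPU time charged to each partition, in microseconds."""

    def __init__(self):
        self.billed_us = defaultdict(int)

    def run(self, thread_partition, duration_us, on_behalf_of=None):
        """Charge CPU time to a partition.

        A server thread handling a client's request passes the client's
        partition as `on_behalf_of`, so the client pays for the work.
        """
        payer = on_behalf_of if on_behalf_of is not None else thread_partition
        self.billed_us[payer] += duration_us


ledger = Billing()
ledger.run("application", 300)                             # the application's own work
ledger.run("filesystem", 500, on_behalf_of="application")  # file system serving the application
ledger.run("filesystem", 100)                              # file system housekeeping
```

Here the application partition is charged 800 µs and the file system partition only 100 µs, which is why a server's budget does not have to grow with the number of clients.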

Borrowed Time: Critical Threads
• Critical threads still run (based on priority) even if partition has no budget
• Critical threads provide deterministic scheduling even in overload
• Critical threads are given critical budget and can go into short-term debt
• Critical time is accounted and has to be repaid
• Exceeding critical budget is considered an error and causes notification/action
[Diagram: Ready, Blocked, and Running threads in partitions labelled "CPU Budget Available" and "CPU Budget Exceeded", with one thread marked "Critical Thread"]
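The borrowing rule can be sketched as a second, smaller budget that only critical threads may draw on. This is a deliberately simplified model with hypothetical names (`CriticalBudget`, `may_run`); in particular, treating a critical thread as ineligible once the critical budget is exceeded is an assumption of the sketch, since the slide says only that the condition is an error that triggers a notification or action.

```python
class CriticalBudget:
    """Toy model of a partition's critical budget and its outstanding debt."""

    def __init__(self, critical_budget_us):
        self.critical_budget_us = critical_budget_us
        self.debt_us = 0

    def charge(self, duration_us):
        """Bill critical time; return True if the critical budget is now exceeded."""
        self.debt_us += duration_us
        return self.debt_us > self.critical_budget_us

    def repay(self, duration_us):
        """Pay back borrowed time, never going below zero debt."""
        self.debt_us = max(0, self.debt_us - duration_us)


def may_run(is_critical, partition_has_budget, critical):
    """A thread is eligible if its partition has budget, or if it is critical
    and the partition's critical budget is not yet used up (model assumption)."""
    if partition_has_budget:
        return True
    return is_critical and critical.debt_us < critical.critical_budget_us


cb = CriticalBudget(critical_budget_us=2_000)
before = may_run(True, False, cb)       # critical thread, partition out of budget
ordinary = may_run(False, False, cb)    # ordinary thread in the same partition
first = cb.charge(1_500)                # still within the critical budget
second = cb.charge(1_000)               # total 2,500 us: budget exceeded
after = may_run(True, False, cb)
```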

Adaptive Partition APIs and Utilities
• Control of the Adaptive Partitioning Scheduler is done through a kernel API
  • API is restricted to privileged processes (root)
  • Must be called from within the default (system) partition
  • Partitions are created with a budget (normal and possibly critical)
• “aps” system utility provided
  • “aps” utility is part of the adaptive partitioning package
  • Can be used to create and modify partitions
  • Also provides usage stats over time
  • Use “on” to launch processes into partitions
• Boot script syntax extended
  • Define partitions within the build file
  • Launch processes into specific partitions
• Partition configuration is completely dynamic
  • Can create partitions and modify budgets at runtime
  • Averaging window can also be changed at runtime

Getting Started with Adaptive Partitioning
• Step 1: Install Adaptive Partitioning
• Step 2: Build Image
• Step 3: Define Partitions and Budgets
• Step 4: Launch Applications In Partitions
[Graphic labels: CODE CHANGES; POSIX PROGRAMMING ALLOWED]

Summary
• SMP is a key enabler for enhancing scalability
• SMP delivers measurable performance gains in real-world applications
• QNX provides transparent support for SMP systems
• Adaptive partitioning can be used to increase your system’s security and availability
• Adaptive partitioning is easy to apply to existing designs and implementations
• Adaptive partitioning helps you integrate complex systems to improve time to market
