Performance Analysis of a RTOS by Emulation of an Embedded System
June 17th, 1999
T. Steckstor, K. Weiß, W. Rosenstiel
Lehrstuhl für Technische Informatik, University of Tübingen, D-72076 Tübingen, Germany
e-mail: stecki@fzi.de
Outline
• Introduction
• Emulation environment: SPYDER-CORE-P1
• Benchmark example: Actuator-Sensor-Interface (ASI) master unit
• Embedded system performance analysis
• Analysis results of different cache configurations and cache sizes
• Conclusion
Introduction
• Embedded systems in industrial automation
  • Application-specific hardware implemented in an FPGA
  • Application-specific software running on a microcontroller
  • The interaction between the hardware part and the software part imposes hard real-time requirements (reaction times of about 200µs)
• Motivation from an embedded system designer's point of view
  • Sophisticated software task architecture (RTOS)
  • Novel microcontroller architecture with caches
  • Because the system must react quickly to external events, task switching and interrupt reaction times become a major performance bottleneck
Emulation
• Embedded system with complex internal system behavior
• Emulation stays very close to the final target system and gives a detailed internal view
• Emulation makes it possible to find the best hw/sw partitioning early in the design process
• Emulation answers the following questions:
  • What is the optimum clock speed?
  • How much performance is consumed by the RTOS?
  • How large is the performance gain from the on-chip caches, and what can be done with it?
  • How do different cache sizes affect the important RTOS task switching and interrupt reaction times?
Emulation environment: SPYDER-CORE-P1
[Figure: block diagram of the CORE-P1 AT-ISA add-on board, divided into regions marked I, II and III. Labels: microcontroller core (Embedded PowerPC PPC403, 25..80MHz; DRAM 1-128MB; FLASH 8MB; 32 bit microcontroller bus); peripheral devices (Ethernet 10MBit to Intra/Internet; 2 serial ports; analog module; driver; AT-ISA bus; 8 Bit I/O bus); FPGA architectures (Xilinx XC4000; Xilinx XC6000; Actel add-on; DPRAM 2KB; extension headers).]
Benchmark example: ASI master unit
[Figure: ASI communication system with an ASI master, an ASI power supply and up to 32 ASI slaves, each slave shown with 4 inputs (4I) and 4 outputs (4O).]
• Master call: 0 | SB | A4 A3 A2 A1 A0 | I4 I3 I2 I1 I0 | PB | 1
• Slave answer: 0 | I3 I2 I1 I0 | PB | 1
• ASI real-time critical constant: 220µs
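The two telegram layouts on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the FPGA implementation from the talk: the function names are hypothetical, and the even-parity rule for PB (parity computed over the bits between the start bit and PB) is an assumption that the slide does not state.

```python
def to_bits(value, width):
    """Return value as a list of bits, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def even_parity(bits):
    """Parity bit that makes the total number of 1 bits even (assumed rule)."""
    return sum(bits) % 2


def master_call(sb, address, info):
    """Build a 14-bit master call: 0 | SB | A4..A0 | I4..I0 | PB | 1."""
    if sb not in (0, 1):
        raise ValueError("SB must be 0 or 1")
    if not 0 <= address < 32:
        raise ValueError("address must fit in 5 bits")
    if not 0 <= info < 32:
        raise ValueError("info must fit in 5 bits")
    payload = [sb] + to_bits(address, 5) + to_bits(info, 5)
    return [0] + payload + [even_parity(payload)] + [1]


def slave_answer(info):
    """Build a 7-bit slave answer: 0 | I3..I0 | PB | 1."""
    if not 0 <= info < 16:
        raise ValueError("info must fit in 4 bits")
    payload = to_bits(info, 4)
    return [0] + payload + [even_parity(payload)] + [1]


if __name__ == "__main__":
    print(master_call(sb=0, address=5, info=0b10101))
    print(slave_answer(0b1010))
```

Both builders put the fixed start bit 0 first and the end bit 1 last, so a receiver can check frame length and parity before interpreting the address and information fields.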
Benchmark example: Implementation
[Figure: the SPYDER-CORE-P1 block diagram with the ASI master mapped onto it. Software on the microcontroller: ASI application software (tasks tele_receive, tele_send, int_service, control), C-server, http-server and TCP/IP, running on the VxWorks real-time operating system. Hardware in the FPGA: ASI hardware (single channel) with ASI-UART and a register interface to and from the microcontroller.]
• Target chip: XC4005E, 166 CLBs, utilization: 85%
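Embedded system performance analysis
[Figure: timeline of one ASI cycle on the PPC403GA at 33MHz with all caches disabled, t in µs (axis 0 to 200), measured against the ASI real-time critical constant (220µs). Labelled elements: int_reaction, Int., RTOS, int_service, control task, task change, semTake, I/O. Duration labels: 30, 60, 40, 80 and 10.]
• Time used by the RTOS: 170µs (77%)
• Time used by the application: 50µs (23%)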
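The percentages on this slide follow directly from the 220µs budget. The short check below reproduces them; it takes the 170µs share as the RTOS share, in line with the conclusion of the talk, and the helper name `share` is only illustrative.

```python
# Time budget at 33MHz with all caches disabled (values from the slide).
CRITICAL_US = 220   # ASI real-time critical constant
RTOS_US = 170       # time used by the RTOS
APP_US = 50         # time used by the application


def share(part_us, total_us):
    """Return part_us as a percentage of total_us, rounded to an integer."""
    return round(100 * part_us / total_us)


if __name__ == "__main__":
    # The two shares together fill the whole real-time budget.
    assert RTOS_US + APP_US == CRITICAL_US
    print("RTOS:", share(RTOS_US, CRITICAL_US), "%")
    print("Application:", share(APP_US, CRITICAL_US), "%")
```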
Embedded system performance analysis
• Real-time critical constant is 220µs
• Above 1.0 the system is under-sized
• Below 1.0 the system is over-sized
• Optimal working point (WP) is 33MHz
• With I-2KB/D-1KB caches the performance gain at the optimal WP is 40%
• With 8 times larger caches (I-16KB/D-8KB) the performance gain at the optimal WP is 350%
[Figure: ratio of real-time execution time (used) to the real-time critical constant (220µs), y axis 0.5 to 1.5, plotted over clock frequency in MHz (25, 33, 40, 80) for three curves: without caches, with I-2KB/D-1KB, with I-16KB/D-8KB. The optimal WP is marked at ratio 1.0 and 33MHz on the curve without caches.]
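The sizing rule on this slide is a simple ratio test. The sketch below assumes the plotted quantity is the used real-time execution time divided by the 220µs critical constant; the function names and the sample execution times are hypothetical, chosen only to hit the axis values 0.5, 1.0 and 1.5.

```python
CRITICAL_US = 220.0  # ASI real-time critical constant from the slide


def load_ratio(exec_time_us, critical_us=CRITICAL_US):
    """Used real-time execution time divided by the critical constant."""
    return exec_time_us / critical_us


def classify(ratio, tol=1e-9):
    """Apply the slide's rule: above 1.0 under-sized, below 1.0 over-sized."""
    if ratio > 1.0 + tol:
        return "under-sized"
    if ratio < 1.0 - tol:
        return "over-sized"
    return "optimal working point"


if __name__ == "__main__":
    # Hypothetical execution times in microseconds, not measurements.
    for exec_time_us in (330.0, 220.0, 110.0):
        ratio = load_ratio(exec_time_us)
        print(exec_time_us, ratio, classify(ratio))
```

A ratio of exactly 1.0 means the system just meets the deadline, which is why the talk treats 33MHz without caches as the working point for the later cache comparisons.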
Analysis results of different cache configurations
Processors: PPC403GA 33MHz (WP) and PPC403GCX 33MHz (WP)
Cache configurations measured on each processor: without I-Cache/without D-Cache (100%), without I-Cache/with D-Cache, with I-Cache/without D-Cache, with I-Cache/with D-Cache
• Task switching time: 100% (87µs) on both processors; changes listed: -1%, +10%, +152%, +46%, +340%, +60%
• Interrupt reaction time: 100% (27µs) on both processors; changes listed: +12%, -4%, +205%, +50%, +43%, +377%
• Dhrystones: 100% (6211) on both processors; changes listed: +11%, +10%, +187%, +207%, +529%, +455%
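To read the Dhrystone percentages as absolute scores, one can scale the 6211 baseline. The sketch below assumes each percentage is a relative change over the 100% baseline; the function name is hypothetical, and the code does not say which processor or cache configuration a given percentage belongs to.

```python
BASELINE_DHRYSTONES = 6211  # 100% value from the slide, caches disabled


def scaled_score(baseline, change_percent):
    """Apply a relative change in percent to a baseline score."""
    return baseline * (1 + change_percent / 100)


if __name__ == "__main__":
    # Dhrystone changes as listed on the slide.
    for change in (11, 10, 187, 207, 529, 455):
        score = scaled_score(BASELINE_DHRYSTONES, change)
        print(f"+{change}% -> about {round(score)} dhrystones")
```

Under this reading, the largest listed change of +529% corresponds to a score a little over six times the baseline.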
Conclusion
• The optimal working point is at 33MHz
• At the optimal working point, 77% of the total execution time (220µs) is consumed by the RTOS
• At the optimal working point, small caches improve execution performance by 40%; larger caches provide an average gain of 350%
• Such gains can only be used for system services without real-time constraints, e.g. network communication via the internet
• If the application runs under the control of an RTOS, the cache sizes should be in the range of about 8-16KByte to provide a significant performance gain