Low Power Embedded Architecture Design. Prof. Jun-Dong Cho, Sungkyunkwan University, August 1999. http://vada.skku.ac.kr SungKyunKwan Univ.
Contents • Embedded Systems • Design and Optimization of ASIP (Application-Specific Instruction Processor) • Hardware and Software Codesign • Reconfigurable Processors • Ultra-Low-Power Domain-Specific Multimedia Processors • Reconfiguration for Power Saving in Real-Time Motion Estimation • Kernel Scheduling in Reconfigurable Computing
Low Power MPU
Levels for Low Power Design
Present-Day Digital Systems • Current systems are complex and heterogeneous, containing many different types of components: • Programmable and reconfigurable processors • Application-specific integrated circuits (ASICs) • Application-specific instruction processors (ASIPs) • Read-only memory (ROM) and RAM • I/O devices and circuitry • Typically designed from a (large) software specification • These heterogeneous systems are called embedded systems
Embedded System Characteristics • Limited user programmability • Completely transparent to the user, e.g., automotive engine control • Limited user interface, e.g., intelligent telephones • Programmable through an application-specific language, e.g., PostScript printer • Real-time response; no batch processing
Embedded Systems: Products - 1 • Consumer Electronics • HDTV • CD player • video game • video tape recorder • programmable TV • camera • music system • Computer Related • personal digital assistant • printer • disc drive • multimedia subsystem • graphics subsystem • graphics terminal • Communications • cellular phone • video phone • fax • modems • PBX
Embedded Systems: Products - 2 • Control Systems • Automotive: engine, ignition, brake system • Manufacturing process control: robotics • Remote control: satellite control, spacecraft control • Other mechanical control: elevator control • Office Equipment • smart copier, printer, smart typewriter, calculator • point-of-sale equipment, credit-card validator, UPC code reader, cash register • Medical Applications: instruments (EKG, EEG), scanning, imaging
Problem Domain Shift
Embedded System Trends - I • Microcomponents are growing in importance in the IC industry due to their reusability: DSP, μP, μC • More embedded systems will require ASICs • From 20-70% in 1992 to 60-70% in 1996 • Moral of the story: μPs are being combined with high-speed, highly complex ASICs in embedded systems
Embedded System Trends - II • Embedded systems will require more application software • The average moves from 16-64k lines in 1992 to 64k-512k lines in 1996 • Requires migration from assembler to C/C++, implying a requirement for automatic compilation • Programmers versus ASIC designers: from 40-70% in 1992 to 60-90% in 1996 • Moral of the story: the increase in code size and code complexity is causing a migration from assembly coding to C/C++
Embedded Software Optimization • Code size becomes an important objective • Software will eventually become part of the chip, so: • Need to generate the best possible code • Can afford longer compilation time • Need not only traditional optimization techniques but also new application-domain-specific optimizations (e.g., for DSP and microcontroller architectures)
Implementing Digital Systems
What is an ASIP? • Application-Specific Instruction Processor • Processor architecture tailored not just for an application domain (e.g., DSP, microcontrollers) but for specific sets of applications (e.g., audio, engine control) • ASIP characteristics • Drawback: greater design cost (processor + compiler) • Benefits: higher performance and lower power than commercial cores; more flexibility than an ASIC
ASIP Design • Given a set of applications, determine the microarchitecture of the ASIP (i.e., the configuration of functional units in the datapaths and the instruction set) • To accurately evaluate the processor's performance on a given application, we need to compile the application program onto the processor datapath and simulate the object code • However, the microarchitecture of the processor is itself a design parameter! (so the compiler must be retargetable to each candidate)
Processor Design Flow
Required Compiler Optimizations • Machine-independent optimizations • Parallelizing transformations (lots of them!) • Common subexpression elimination, strength reduction, code motion • Machine-dependent optimizations • Loop unrolling and software pipelining • Static allocation (non-recursive procedure calls) • Storage layout (arrays, scalars) • Optimization of mode-setting instructions • → Instruction selection, scheduling, and register allocation
Parallelizing Transformation
Split-Node DAG
Split-Node DAG - 2 • A split-node DAG represents: • All the legal assignments of basic-block nodes to functional units • The data transfers implied by each pair of assignments of basic-block nodes connected by an edge • The goal is to find parallelism in the basic block that can be exploited by the target architecture • This implies grouping operator nodes and data-transfer nodes into VLIW instructions • Constraints may disallow certain groupings
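The split-node DAG formulation itself is not reproduced here. As a simpler stand-in, the sketch below shows the grouping step with plain greedy list scheduling, assuming each DAG node has already been assigned a functional-unit type and that the only grouping constraint is a fixed number of slots per unit type in each instruction word. `pack_vliw` and its parameters are hypothetical names chosen for illustration.

```python
from collections import defaultdict


# Hypothetical helper for illustration: greedy list scheduling, not the
# split-node DAG algorithm. Slot limits stand in for grouping constraints.
def pack_vliw(ops, deps, slots):
    """Pack basic-block DAG nodes into VLIW instruction words.

    ops:   {node: functional-unit type} (assumed already assigned)
    deps:  {node: set of predecessor nodes} (data dependences)
    slots: {FU type: slots of that type per instruction word}
    Returns a list of instruction words, each a list of nodes.
    """
    done = set()
    remaining = set(ops)
    words = []
    while remaining:
        used = defaultdict(int)
        word = []
        for node in sorted(remaining):
            fu = ops[node]
            # Ready only if every predecessor was issued in an earlier word.
            if deps.get(node, set()) <= done and used[fu] < slots.get(fu, 0):
                word.append(node)
                used[fu] += 1
        if not word:
            raise ValueError("no node can be scheduled; check slots/deps")
        words.append(word)
        done.update(word)
        remaining.difference_update(word)
    return words
```

For a block computing `a = x*y`, `b = u*v`, `c = a+b`, `d = c+1`, two multiplier slots let `a` and `b` share one word, while a single multiplier slot forces them into separate words.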
Split-Node DAG - 3
Parallelism Matrix
Common Subexpression Elimination
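A minimal sketch of local common subexpression elimination, assuming a basic block written as `(dest, op, arg1, arg2)` three-address tuples (a representation chosen here for illustration; `local_cse` is a hypothetical name). An expression stays available until one of its operands, or the variable holding it, is overwritten.

```python
# Hypothetical helper for illustration: local CSE on three-address tuples.
def local_cse(block):
    """Local common subexpression elimination within one basic block.

    block: list of (dest, op, arg1, arg2) tuples. A repeated (op, arg1, arg2)
    whose operands have not been redefined since it was computed is replaced
    by a copy ("mov") from the variable that already holds the value.
    Operand order matters here (a+b and b+a are treated as different).
    """
    available = {}  # (op, arg1, arg2) -> variable currently holding the value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if op != "mov" and key in available:
            out.append((dest, "mov", available[key], None))
        else:
            out.append((dest, op, a, b))
        # dest is overwritten: forget expressions that read it or live in it.
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        if op != "mov" and dest not in (a, b):
            available[key] = dest
    return out
```

In the block `t1 = a+b; t2 = t1*c; t3 = a+b; a = a+1; t4 = a+b`, the third instruction becomes a copy from `t1`, while `t4` must be recomputed because `a` was redefined.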
Constant Propagation and Folding
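A minimal sketch of constant propagation and folding over one basic block, using the same illustrative `(dest, op, arg1, arg2)` tuples, with integers as constants and strings as variable names. `const_prop_fold` is a hypothetical name, and only `+`, `-` and `*` are folded.

```python
import operator

# Operators this sketch knows how to evaluate at compile time (assumed set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


# Hypothetical helper for illustration.
def const_prop_fold(block):
    """Constant propagation and folding within one basic block.

    block: list of (dest, op, arg1, arg2); args are variable names (str) or
    ints, and "mov" copies arg1. Known constants are substituted into later
    uses, and operations whose operands are all constant are folded.
    """
    consts = {}  # variable -> known constant value
    out = []
    for dest, op, a, b in block:
        a = consts.get(a, a) if isinstance(a, str) else a  # propagate
        b = consts.get(b, b) if isinstance(b, str) else b
        if op == "mov" and isinstance(a, int):
            consts[dest] = a
            out.append((dest, "mov", a, None))
        elif op in OPS and isinstance(a, int) and isinstance(b, int):
            value = OPS[op](a, b)  # fold at compile time
            consts[dest] = value
            out.append((dest, "mov", value, None))
        else:
            consts.pop(dest, None)  # dest is no longer a known constant
            out.append((dest, op, a, b))
    return out
```

Note that once `x` is reassigned from a non-constant, later uses of `x` are left alone.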
Dead Code Elimination
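A minimal sketch of dead code elimination within one basic block, assuming the same illustrative tuple form, a known set of variables live on exit, and instructions with no side effects other than writing `dest`. `dead_code_elim` is a hypothetical name. The block is scanned backwards so that liveness is known at each instruction.

```python
# Hypothetical helper for illustration.
def dead_code_elim(block, live_out):
    """Remove assignments whose results are never used (one basic block).

    block:    list of (dest, op, arg1, arg2); args are variable names or ints.
    live_out: variables still needed after the block.
    Assumes instructions have no side effects other than writing dest.
    """
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(block):
        if dest in live:
            kept.append((dest, op, a, b))
            live.discard(dest)  # killed here ...
            live.update(x for x in (a, b) if isinstance(x, str))  # ... used here
        # otherwise dest is dead at this point and the instruction is dropped
    kept.reverse()
    return kept
```

Removing `dest` from the live set before adding the operands keeps self-updates such as `x = x + 1` correct.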
Loop Invariant Code Motion
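A minimal sketch of loop-invariant code motion for a loop whose body is a single basic block, using the same illustrative tuples. It assumes the body executes at least once and has no side effects. A production compiler must also check dominance of loop exits, which this sketch skips. `hoist_invariants` is a hypothetical name.

```python
# Hypothetical helper for illustration (single-block loop body only).
def hoist_invariants(body):
    """Move loop-invariant instructions from a loop body to a preheader.

    An instruction is hoisted if its dest is assigned exactly once in the
    loop, dest is not read earlier in the body (no loop-carried use), and
    every variable operand is defined outside the loop or by an earlier
    hoisted instruction. Returns (preheader, new_body).
    """
    def_count = {}
    for dest, _op, _a, _b in body:
        def_count[dest] = def_count.get(dest, 0) + 1

    preheader, new_body = [], []
    hoisted, used_before = set(), set()
    for dest, op, a, b in body:
        operands = [x for x in (a, b) if isinstance(x, str)]
        invariant = (def_count[dest] == 1 and dest not in used_before and
                     all(x not in def_count or x in hoisted for x in operands))
        used_before.update(operands)
        if invariant:
            preheader.append((dest, op, a, b))
            hoisted.add(dest)
        else:
            new_body.append((dest, op, a, b))
    return preheader, new_body
```

Here `t = n*4` and `u = t + base` depend only on values fixed before the loop, so both move out, while the accumulations on `s` and `i` stay.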
Array Access Strength Reduction
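The idea behind array access strength reduction is that the address of `a[i]` inside a loop, `base + i*stride`, can be maintained by an induction variable that is simply incremented by `stride` each iteration, replacing a multiply with an add. The two hypothetical functions below, written only for illustration, produce the same address sequence both ways.

```python
# Hypothetical helpers for illustration: the same addresses, two ways.
def addresses_naive(base, stride, n):
    """Address of a[i] for i = 0..n-1, using one multiply per iteration."""
    return [base + i * stride for i in range(n)]


def addresses_reduced(base, stride, n):
    """Same addresses after strength reduction: a running pointer p, with the
    invariant p == base + i*stride, is incremented by stride (an add replaces
    the multiply; on a DSP this maps onto autoincrement addressing)."""
    out = []
    p = base
    for _ in range(n):
        out.append(p)
        p += stride
    return out
```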
Features of DSP Architectures • DSPs have irregular datapaths • Instruction-set architecture tailored for DSP applications • Limited addressing capability • Autoincrement/decrement • Bit-reversed addressing • Zero-overhead loops • Some degree of parallelism, e.g., the Motorola 56K's parallel moves
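Bit-reversed addressing steps through a buffer in the order needed by radix-2 FFTs: index `i` of an `n`-point buffer (n a power of two) is mapped to the number whose log2(n)-bit binary form is `i`'s bits reversed. DSP address generators do this in hardware. The hypothetical function below is only a software illustration of the resulting index sequence.

```python
# Hypothetical helper for illustration: the index order that bit-reversed
# addressing produces for an n-point buffer.
def bit_reversed_indices(n):
    """Return [rev(0), rev(1), ..., rev(n-1)] where rev reverses log2(n) bits."""
    bits = n.bit_length() - 1
    if n <= 0 or 1 << bits != n:
        raise ValueError("n must be a power of two")
    result = []
    for i in range(n):
        x, r = i, 0
        for _ in range(bits):
            r = (r << 1) | (x & 1)  # shift the lowest bit of x into r
            x >>= 1
        result.append(r)
    return result
```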
Example: TMS320C25 DSP
Storage Assignment
Alternative Assignment
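Simple Offset Assignment • Assumptions in Simple Offset Assignment (SOA): • Variables reside in memory and are accessed via a single address register • One-to-one mapping of variables to memory locations • A schedule for the basic block is given • Goal: choose the memory layout that minimizes the number of explicit address-register updates, i.e., accesses that cannot use autoincrement/autodecrement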
Access Sequence
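Under the SOA assumptions, a layout can be scored directly from the access sequence. The sketch assumes the usual cost model: moving the single address register to the same or an adjacent location (offset change of at most 1) is free via autoincrement/autodecrement, while any other move costs one extra instruction. The initial load of the register is not counted. `soa_cost` is a hypothetical name.

```python
# Hypothetical helper for illustration. Cost model (assumed): offset changes
# of 0 or +/-1 are free; any larger jump costs one address-register update.
def soa_cost(access_seq, offsets):
    """Count explicit address-register updates for an access sequence.

    access_seq: variables in the order the scheduled basic block touches them
    offsets:    {variable: memory offset}, a one-to-one assignment
    """
    cost = 0
    for cur, nxt in zip(access_seq, access_seq[1:]):
        if abs(offsets[cur] - offsets[nxt]) > 1:
            cost += 1
    return cost
```

For the access sequence a, b, c, a, c, d, the layout a, b, c, d costs 2 updates, while b, a, c, d costs only 1. This is why the layout is worth optimizing.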
Access Graph
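In the access graph, nodes are variables and edge {u, v} is weighted by how many times u and v are accessed back to back. Under a unit-stride autoincrement model, a layout's cost equals the total weight of edges whose endpoints are not adjacent in memory. A minimal sketch, with `access_graph` as a hypothetical name:

```python
from collections import Counter


# Hypothetical helper for illustration.
def access_graph(access_seq):
    """Return {frozenset({u, v}): weight}, where weight counts how often u
    and v appear next to each other in the access sequence. Consecutive
    accesses to the same variable are ignored (they need no address move)."""
    weights = Counter()
    for u, v in zip(access_seq, access_seq[1:]):
        if u != v:
            weights[frozenset((u, v))] += 1
    return weights
```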
Assignment and Access Graph
Maximum Weighted Path Covering
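Finding an optimal maximum-weight path cover of the access graph is NP-hard in general, so a greedy heuristic is commonly used. The sketch takes edges in decreasing weight order and skips any edge that would give a node degree greater than 2 or close a cycle. Each resulting path is then laid out in consecutive memory locations. It assumes string variable names and uses alphabetical order to break ties. `greedy_path_cover` is a hypothetical name.

```python
# Hypothetical helper for illustration: greedy SOA path cover + layout.
def greedy_path_cover(weights, variables):
    """weights:   {frozenset({u, v}): weight} from the access graph
    variables: all variables in the access sequence
    Returns an offset assignment {variable: offset}."""
    parent = {v: v for v in variables}

    def find(v):  # union-find root, used to reject cycle-closing edges
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    degree = {v: 0 for v in variables}
    adj = {v: [] for v in variables}
    # Heaviest edges first; ties broken by variable names for determinism.
    for edge, _w in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        u, v = sorted(edge)
        if degree[u] < 2 and degree[v] < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            degree[u] += 1
            degree[v] += 1
            adj[u].append(v)
            adj[v].append(u)

    # Every component is now a simple path; walk each from an endpoint.
    offsets, seen = {}, set()
    for start in sorted(variables):
        if start in seen or degree[start] == 2:
            continue
        prev, node = None, start
        while node is not None:
            offsets[node] = len(offsets)
            seen.add(node)
            nxt = [n for n in adj[node] if n != prev]
            prev, node = node, (nxt[0] if nxt else None)
    return offsets
```

For the access sequence a, b, c, a, c, d, the heaviest edge {a, c} is taken first, {b, c} is rejected because it would close a cycle, and the layout b, a, c, d results. Only the weight-1 edge {b, c} is left uncovered, so the layout costs one address-register update.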
Optimal Disjoint Path Cover
Code Generation and Optimization • Focus on automatic retargetability and parameterizable optimization methods • Instruction selection for configurable functional units • Scheduling for multiple functional units • Register bank allocation to minimize data transfers • Detailed register allocation with varying load/spill costs • Optimization to exploit address generator features • Goal: to be able to generate high-quality code for any target architecture description • WHEN THAT HAPPENS, CAD IS ESSENTIALLY GOING TO BECOME SOFTWARE COMPILATION!!
The 100 Million Transistor Question: HOW CAN WE BEST USE THEM TO SOLVE OUR COMPUTING PROBLEMS?
Answer I: Multiprocessor on a Chip • Requirement: an efficient parallelizing compiler • Problems: Is there enough parallelism in programs? It does not go fast enough for, e.g., video applications.
Answer II: Giant FPGA • Requirement: a CAD system for FPGAs • Problems: may work well for bit-level video computations, but in general FPGAs are inefficient.
Answer III: HW/SW Codesign
Mixing Hardware and Software • Argument: mixed hardware/software systems represent the best of both worlds: high performance, flexibility, design reuse, etc. • Counterpoint: from a design standpoint, it is the worst of both worlds • Verification and test become harder • Too many tools, too many interactions, too much heterogeneity • Hardware/software partitioning is “AI-complete”!
Hardware/Software Co-design (I.K. Hwang, San Kim, J.D. Cho) • What is co-design? • Proposed as a way to design systems that combine hardware and software systematically and efficiently • Hardware implementation: higher cost, faster execution • Software implementation: lower cost, slower execution • Considerations in co-design: • Hardware/software partitioning • Hardware/software interface • Co-simulation
Partitioning • Performance Requirements • Some functions are better suited to hardware implementation: • Blocks that are used repeatedly • Blocks with a parallel structure • Modifiability • Blocks implemented in software are easy to modify
Partitioning (continued) • Implementation Cost • Blocks implemented in hardware can be shared • Scheduling • Blocks partitioned into HW and SW must be scheduled so that they meet the given constraints • SW operations must be scheduled sequentially • As long as there are no data or control dependencies, SW and HW can be scheduled concurrently
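The two scheduling rules above (SW operations run one after another, while HW and SW may overlap when there are no dependencies) can be sketched as follows. The model assumes one processor executes all SW tasks in the given (topological) order, and each HW task has its own dedicated hardware. `schedule` and the task tuple format are hypothetical, chosen only for illustration.

```python
# Hypothetical helper for illustration. Assumptions: a single processor runs
# SW tasks in list order; each HW task has dedicated hardware.
def schedule(tasks):
    """tasks: list of (name, impl, duration, deps) in topological order,
    where impl is "HW" or "SW" and deps lists predecessor task names.
    Returns {name: (start, finish)}."""
    times = {}
    cpu_free = 0  # time at which the processor becomes idle
    for name, impl, duration, deps in tasks:
        ready = max((times[d][1] for d in deps), default=0)  # data/control deps
        if impl == "SW":
            start = max(ready, cpu_free)  # SW operations are sequential
            cpu_free = start + duration
        else:
            start = ready  # HW runs concurrently once its inputs are ready
        times[name] = (start, start + duration)
    return times
```

With tasks A (SW, 3), B (HW, 5), C (SW, 2, after A) and D (SW, 4, after B and C), B overlaps with A and C, and D finishes at time 9. Moving B to software serializes it on the processor and pushes D's finish to 14, which is the kind of trade-off partitioning has to weigh against hardware cost.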
Interface • Why an interface block is needed • It transfers data between hardware and software blocks • Only an efficient interface block can reduce the overhead between HW and SW blocks • Interface methods • Shared memory • FIFO • Handshaking protocol
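As an illustration of the FIFO method, the behavioral model below lets the writer check a `full` flag and the reader check an `empty` flag, as a HW block and SW driver would poll (or be interrupted on) status flags. The class name, its methods, and the depth parameter are hypothetical. It is a sketch, not a hardware description.

```python
from collections import deque


# Hypothetical behavioral model for illustration, not an RTL design.
class BoundedFifo:
    """FIFO interface block between a HW part and a SW part."""

    def __init__(self, depth):
        self.depth = depth  # assumed FIFO depth in words
        self._q = deque()

    @property
    def full(self):
        return len(self._q) >= self.depth

    @property
    def empty(self):
        return not self._q

    def write(self, word):
        """Returns False when full, so the writer must retry (back-pressure)."""
        if self.full:
            return False
        self._q.append(word)
        return True

    def read(self):
        """Returns None when empty (so None is never used as a data word)."""
        if self.empty:
            return None
        return self._q.popleft()
```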
Logical Bus Architecture • System Bus Signals • Address, data, and control signals • The address space consists of the memory space and the I/O space • Memory space: memory of the SW component • I/O space: ports within the SW component and registers in other HW components • Port Signals • Specialized signals that interface directly between SW and HW components • Interrupt Signals • Raised when SW and HW components have completed an operation or when an error condition is detected
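A minimal sketch of how a bus address could be decoded into the memory space or the I/O space. The address map (a 64 KiB space with its top 4 KiB reserved for I/O ports and HW registers) is a hypothetical example, not taken from the slides. So is the `decode` helper.

```python
# Hypothetical address map for illustration only: 64 KiB address space,
# top 4 KiB reserved as I/O space (ports and HW registers).
MEMORY_SPACE = range(0x0000, 0xF000)  # memory of the SW component
IO_SPACE = range(0xF000, 0x10000)     # ports and registers in HW components


def decode(address):
    """Return (space, offset within that space) for a bus address."""
    if address in MEMORY_SPACE:
        return ("memory", address - MEMORY_SPACE.start)
    if address in IO_SPACE:
        return ("io", address - IO_SPACE.start)
    raise ValueError(f"address {address:#x} is unmapped")
```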
Co-simulation • Why co-simulation is needed • Simulating the HW and SW parts together makes it possible to predict the behavior of the complete system • Estimating system performance allows the system to be redesigned to meet the given spec before synthesis • It estimates the characteristics of each sub-block for HW/SW partitioning • Co-simulation tools • Ptolemy • COSSAP • POLIS
COSSAP • SW/HW co-simulation tool • SW: C code (Generic C) • HW: VHDL • Data-flow simulation • The system is built as a block diagram • Simulation report • Displays the output as waveforms • Estimates the speed of the system