
ECLIPSE Extended CPU Local Irregular Processing Structure

ECLIPSE Extended CPU Local Irregular Processing Structure. IST E. van Utteren. DS & PC A. van Gorkum. IC Design G. Beenker. LEP, HVE T. Doyle. IPA W.J. Lippmann. ESAS A. van der Werf. IT E. Dijkstra. ViPs G. Depovere. DD&T C. Niessen. AV & MS Th. Brouste. PROMMPT


Presentation Transcript


  1. ECLIPSE: Extended CPU Local Irregular Processing Structure. IST E. van Utteren. DS & PC A. van Gorkum. IC Design G. Beenker. LEP, HVE T. Doyle. IPA W.J. Lippmann. ESAS A. van der Werf. IT E. Dijkstra. ViPs G. Depovere. DD&T C. Niessen. AV & MS Th. Brouste. PROMMPT J.T.J. v. Eijndhoven. ECLIPSE CPU. Jos.van.Eijndhoven@philips.com. CRB 1992-412

  2. DVP: design problem Nexperia media processors ?

  3. DVP: application domain • High volume consumer electronics products: future TV, home theatre, set-top box, etc. • Media processing: audio, video, graphics, communication

  4. DVP: SoC platform • Nexperia line of media processors for mid- to high-end consumer media processing systems is based on DVP • DVP provides template for System-on-a-Chip • DVP supports families of evolving products • DVP is part of corporate HVE strategy

  5. DVP: system requirements • High degree of flexibility, extendability and scalability (unknown applications, new standards, new hardware blocks) • High level of media processing power (hardware coprocessor support)

  6. DVP: architecture philosophy • High degree of flexibility is achieved by supporting media processing in software • High performance is achieved by providing specialized hardware coprocessors • Problem: How to mix & match hardware based and software based media processing?

  7. DVP: model of computation. The model of computation is Kahn Process Networks. [Figure: processes A, B and C connected by FIFOs; each process reads, executes and writes] The Kahn model allows ‘plug and play’: • Parallel execution of many tasks • Configures different applications by instantiating and connecting tasks • Maintains functional correctness independent of task scheduling issues • TSSA: API to transform C programs into Kahn models
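The scheduling-independence property of the Kahn model can be illustrated with a small single-threaded sketch. It is not taken from the slides: the FIFO depth, the token count, the squaring function in process B and the name `run_network` are all assumptions made for illustration. Each process fires only when its input has data and its output has room, which stands in for blocking reads and writes.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_CAP 4   /* hypothetical FIFO depth */
#define N_ITEMS  8   /* hypothetical number of tokens produced by A */

typedef struct { int buf[FIFO_CAP]; size_t head, count; } Fifo;

static int fifo_full(const Fifo *f)  { return f->count == FIFO_CAP; }
static int fifo_empty(const Fifo *f) { return f->count == 0; }

static void fifo_put(Fifo *f, int v) {
    assert(!fifo_full(f));
    f->buf[(f->head + f->count) % FIFO_CAP] = v;
    f->count++;
}

static int fifo_get(Fifo *f) {
    assert(!fifo_empty(f));
    int v = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return v;
}

typedef struct {
    Fifo ab, bc;          /* channels A->B and B->C */
    int produced;         /* tokens emitted by A so far */
    int out[N_ITEMS];     /* tokens received by C */
    int consumed;
} Net;

/* Each step returns 1 if the process fired, 0 if it would have blocked. */
static int step_a(Net *n) {            /* A: produce 0, 1, 2, ... */
    if (n->produced == N_ITEMS || fifo_full(&n->ab)) return 0;
    fifo_put(&n->ab, n->produced++);
    return 1;
}

static int step_b(Net *n) {            /* B: read, execute (square), write */
    if (fifo_empty(&n->ab) || fifo_full(&n->bc)) return 0;
    int x = fifo_get(&n->ab);
    fifo_put(&n->bc, x * x);
    return 1;
}

static int step_c(Net *n) {            /* C: consume */
    if (fifo_empty(&n->bc)) return 0;
    n->out[n->consumed++] = fifo_get(&n->bc);
    return 1;
}

/* Run until no process can fire, trying processes in the given order
 * (for example "ABC" or "CBA"). Returns the number of tokens C received. */
int run_network(const char *order, int out[N_ITEMS]) {
    Net n = {0};
    int progress = 1;
    while (progress) {
        progress = 0;
        for (const char *p = order; *p; ++p) {
            if (*p == 'A') progress |= step_a(&n);
            if (*p == 'B') progress |= step_b(&n);
            if (*p == 'C') progress |= step_c(&n);
        }
    }
    for (int i = 0; i < N_ITEMS; ++i) out[i] = n.out[i];
    return n.consumed;
}
```

Calling `run_network("ABC", out1)` and `run_network("CBA", out2)` visits the processes in different orders, yet both fill the output with the same sequence of squares, which is the property the slide refers to.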

  8. DVP: model of computation • Application: parallel tasks, streams • Mapping: static • Architecture: programmable [Figure: application task graph mapped onto CPU, coproc1 and coproc2]

  9. DVP: architecture philosophy • Kahn processes (nodes) are mapped onto (co)processors • Communication channels (graph edges) are mapped onto buffers in centralized memory • Scheduling and synchronization (notification and handling of empty or full buffers) are performed by control software • The communication pattern between modules (data flow graph) is freely programmable

  10. DVP: generic architecture • Shared, single address space, memory model (flexible access, transparent programming model) • Physically centralized random access memory (flexible buffer allocation, fits well with stream processing) • Single memory-bus for communication (simple and cost effective)

  11. DVP: example architecture instantiation [Figure: block diagram with SDRAM, VLIW cpu and MIPS cpu (each with I$ and D$), image scaler, video-in, video-out, audio-in, audio-out, serial I/O, I2C I/O, timers and PCI bridge]

  12. DVP: TSSA abstraction layer [Figure: TSSA-Appl1 and TSSA-Appl2 on top of TSSA-OS, running as TM-CPU software alongside traditional coarse-grain TM co-processors] TSSA stream data is buffered in off-chip SDRAM; synchronization uses CPU interrupts.

  13. DVP: TSSA abstraction layer • Hides implementation details: • graph setup • buffer synchronization • Runs on pSOS (and other RTKs) • Provides standard API • Defines standard data formats

  14. Outline • DVP • Eclipse DVP subsystem • Eclipse architecture • Eclipse application programming • Simulator • Status

  15. Eclipse DVP subsystem. Objective: Increase flexibility of DVP systems, while maintaining cost-performance. Customers: • Semiconductors: Consumer Systems (Transfer to TTI) • Consumer Electronics: Domain 2 (BG-TV Brugge) • Research. Products: Mid- to high-end DVP / TSSA systems: DTVs and STBs

  16. Eclipse DVP subsystem: design problem • Increase application flexibility through re-use of medium-grain function blocks, in HW and SW • Keep streaming data on-chip. But? • More bandwidth visible • Limited memory size • High synchronization rate • CPU unfriendly. DVP/TSSA system: • Coarse-grain ‘solid’ function blocks (reuse, HW/SW?) • Stream data buffered in off-chip memory (bandwidth, power?) [Figure: SDRAM, HDVO, condor, MPEG, CPU]

  17. Design problem: new DVP subsystem [Figure: blocks labeled VO, DVD decode, Eclipse, CPU, MPEG2 decode, MPEG2 encode, 1394, CPU and external memory]

  18. Eclipse DVP subsystem: application domain Now, target for 1st instance: • Dual MPEG2 full HD decode (1920 x 1080 @ 60i) • MPEG2 SD transcoding and HD decoding Anticipate: • Range of formats (DV, MJPEG, MPEG4) • 3D-graphics acceleration • Motion-compensated video processing

  19. Application domain: MPEG2 decoding (HD)

  20. Application domain: MPEG2 encoding (SD)

  21. Application domain: MPEG-4 video decoding [Figure: decoder data flow graph. Blocks: MPEG-4 ES, Variable Length Decoding, Inverse Scan, DC & AC Prediction, Inverse Quantization, IDCT, MV Decoder, Motion Comp., Picture Reconst., Context Arithmetic Decoding, Shape MV Prediction, Shape Motion Compensation, Reference Pictures. Edge annotations as extracted, units not given: <220, 0.1, 90, 128, <384, 800, <7]

  22. MPEG-4: system level application partitioning [Figure: layers labeled Network layer, De-multiplex, Decompression (audio object, video object, 3D Gfx object), Scene description, Composition and rendering; partitions labeled Sandra, Eclipse and CPU]

  23. MPEG-4: partitioning Eclipse - SANDRA [Figure: Media CPU with I$ and D$, SDRAM, MMI, VO (SANDRA), VI, and an Eclipse block containing VLD, SRAM, DCT, MBS and MC]

  24. Eclipse DVP subsystem: current TSSA style [Figure: TSSA-Appl1 and TSSA-Appl2 on top of TSSA, running as TM-CPU software alongside traditional coarse-grain TM co-processors] TSSA stream data is buffered in off-chip SDRAM; synchronization uses CPU interrupts.

  25. Eclipse DVP subsystem: Eclipse tasks embedded in TSSA [Figure: TSSA-Appl1 and TSSA-Appl2 on top of TSSA and the Eclipse Driver. Legend: TSSA task on DVP HW; TSSA task in SW; TSSA task on Eclipse; Eclipse task on HW; Eclipse task in SW; TSSA data stream via off-chip memory; Eclipse data stream via on-chip memory]

  26. Eclipse DVP subsystem: scale down Hierarchy in the DVP system: • Computational model which fits neatly inside DVP & TSSA Scale down from SoC to subsystem: • Limited internal distances • High data bandwidth and local storage • Fast inter-task synchronization

  27. Outline • DVP • Eclipse DVP subsystem • Eclipse architecture • Model of computation • Generic architecture • Eclipse application programming • Simulator • Status

  28. Eclipse architecture: model of computation • Application: parallel tasks, streams • Mapping: static • Architecture: programmable, medium grain, multitasking [Figure: application task graph mapped onto CPU, coproc1 and coproc2]

  29. Model of computation: architecture philosophy The Kahn model allows ‘plug and play’: • Parallel execution of many tasks • Application configuration by instantiating and connecting tasks. • Functional correctness independent of task scheduling issues. Eclipse is designed to accomplish this with: • A mixture of HW and SW tasks. • High data rates (GB/s) and medium buffer sizes (KB). • Re-use of co-processors over applications through multi-tasking • Runtime application reconfiguration.

  30. Allow proper balance in HW/SW combination [Figure: energy efficiency (low to high) plotted against application flexibility of given silicon (low to high), positioning function-specific engines, Eclipse and DSP-CPU]

  31. Previous Kahn style architectures in PRLE: CPA, C-Heap, Eclipse [Figure labels as extracted: Explicit synchronization; Shared memory model; Mixed HW/SW; Data driven; HW synchronization; Multitasking coprocs; But? Dynamic applications; CPU in media processing; But? High performance; Variable packet sizes]

  32. Outline • DVP • Eclipse DVP subsystem • Eclipse architecture • Model of computation • Generic architecture • Coprocessor shell interface • Shell communication interface • Architecture instantiation • Eclipse application programming • Simulator • Status

  33. Generic architecture: inter-processor communication • On-chip, dedicated network for inter-processor communication: • Medium grain functions • High bandwidth (up to several GB/s) • Keep data transport on-chip • Use DVP-bus for off-chip communication only

  34. Generic architecture: communication network [Figure: CPU and two coprocessors attached to the communication network]

  35. Generic architecture: memory • Shared, single address space, memory model (flexible access, software programming model) • Centralized wide memory (flexible buffer allocation, fits well with stream processing) • Single wide memory-bus for communication (simple and cost effective)

  36. Generic architecture: shared on-chip memory [Figure: CPU and two coprocessors connected through the communication network to the memory]

  37. Generic architecture: task level interface Partition functionality between application-dependent core and generic support. • Introduce the (co-)processor shell: • Shell is responsible for application configuration, task scheduling, data transport and synchronization • Shell (parameterized) micro-architecture is re-used for each coprocessor instance • Allow future updates of communication network while re-using (co-)processor core design • Implementations in HW or SW

  38. Generic architecture: layering [Figure: three layers. Computation layer: CPU and two coprocessors. Generic support layer: Shell-SW and Shell-HW blocks, reached through the task-level interface. Communication network layer: communication network and memory, reached through the communication interface]

  39. Task level interface: five primitives. Multitasking, synchronization, and data transport: • int GetTask( location, blocked, error, &task_info) • bool GetSpace( port_id, n_bytes) • Read( port_id, offset, n_bytes, &byte_vector) • Write( port_id, offset, n_bytes, &byte_vector) • PutSpace( port_id, n_bytes) GetSpace is used for both get_data and get_room calls. PutSpace is used for both put_data and put_room calls. The processor has the initiative, the shell answers.
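As a rough illustration of how a coprocessor task might use four of these primitives in one processing step, the sketch below pairs them with a trivial software stand-in for the shell. The slides give only the primitive names and argument lists, so the C types, the two fixed ports, the linear (non-wrapping) port memory, the byte-inverting function and the helpers `demo_setup` and `demo_output` are all assumptions. GetTask is left out because its argument semantics are not described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { PORT_IN = 0, PORT_OUT = 1, N_PORTS = 2, PORT_BYTES = 64 };

/* Stand-in shell state (assumption): one linear byte area per port. */
typedef struct {
    unsigned char mem[PORT_BYTES];
    size_t pos;     /* current access point */
    size_t space;   /* data (input) or room (output) beyond the access point */
} Port;

static Port ports[N_PORTS];

/* Inquiry: is a window of n_bytes available on this port? */
bool GetSpace(int port_id, size_t n_bytes) {
    return n_bytes <= ports[port_id].space;
}

/* Random access inside the available space, relative to the access point. */
void Read(int port_id, size_t offset, size_t n_bytes, unsigned char *byte_vector) {
    assert(offset + n_bytes <= ports[port_id].space);
    memcpy(byte_vector, ports[port_id].mem + ports[port_id].pos + offset, n_bytes);
}

void Write(int port_id, size_t offset, size_t n_bytes, const unsigned char *byte_vector) {
    assert(offset + n_bytes <= ports[port_id].space);
    memcpy(ports[port_id].mem + ports[port_id].pos + offset, byte_vector, n_bytes);
}

/* Commit: move the access point ahead by n_bytes. */
void PutSpace(int port_id, size_t n_bytes) {
    assert(n_bytes <= ports[port_id].space);
    ports[port_id].pos += n_bytes;
    ports[port_id].space -= n_bytes;
}

/* One processing step of a hypothetical task: invert a 4-byte packet.
 * Returns false when the step is blocked on data or room. */
bool processing_step(void) {
    enum { PKT = 4 };
    unsigned char buf[PKT];
    if (!GetSpace(PORT_IN, PKT) || !GetSpace(PORT_OUT, PKT))
        return false;                      /* blocked: retry in a later step */
    Read(PORT_IN, 0, PKT, buf);
    for (size_t i = 0; i < PKT; ++i)
        buf[i] = (unsigned char)~buf[i];
    Write(PORT_OUT, 0, PKT, buf);
    PutSpace(PORT_IN, PKT);                /* release the consumed data */
    PutSpace(PORT_OUT, PKT);               /* publish the produced data */
    return true;
}

/* Demo helpers (hypothetical): preload input data and inspect the output. */
void demo_setup(const unsigned char *data, size_t n) {
    assert(n <= PORT_BYTES);
    memset(ports, 0, sizeof ports);
    memcpy(ports[PORT_IN].mem, data, n);
    ports[PORT_IN].space = n;              /* n bytes of data available */
    ports[PORT_OUT].space = PORT_BYTES;    /* the whole area is free room */
}

const unsigned char *demo_output(void) { return ports[PORT_OUT].mem; }
```

The step asks before it touches anything: both GetSpace inquiries must succeed before the first Read, and nothing is committed until the Write is complete, so a blocked step leaves both streams unchanged.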

  40. Task level interface: port IO (Task A) a: Initial situation of ‘data tape’ with current access point. b: Inquiry action provides window on requested space (n_bytes1). c: Read/Write actions on contents (offset). d: Commit action moves access point ahead (n_bytes2).

  41. Task level interface: communication through streams. Kahn model: Task A feeds Task B. Implementation with shared circular buffer: the shell takes care that the access windows have no overlap. [Figure: circular buffer showing empty space, space filled with data, the granted window for writer A and the granted window for reader B]
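A minimal sketch of such a shared circular buffer, seen from the shell side, is shown below. The buffer size, the `Stream` structure and the function names are assumptions, and both access points live in one structure here for brevity, whereas the slides describe a distributed implementation. The two space counters always add up to the buffer size, so the writer and reader windows cannot overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SIZE 8   /* hypothetical FIFO size in bytes */

typedef struct {
    unsigned char mem[BUF_SIZE];
    size_t wr_pos, wr_space;   /* writer access point and free room */
    size_t rd_pos, rd_space;   /* reader access point and valid data */
} Stream;

void stream_init(Stream *s) {
    s->wr_pos = s->rd_pos = 0;
    s->wr_space = BUF_SIZE;    /* buffer starts empty: all room, no data */
    s->rd_space = 0;
}

/* Writer side */
bool writer_get_space(const Stream *s, size_t n) { return n <= s->wr_space; }

void writer_write(Stream *s, size_t offset, unsigned char v) {
    assert(offset < s->wr_space);                  /* stay inside free room */
    s->mem[(s->wr_pos + offset) % BUF_SIZE] = v;   /* wrap-around addressing */
}

void writer_put_space(Stream *s, size_t n) {
    assert(n <= s->wr_space);
    s->wr_pos = (s->wr_pos + n) % BUF_SIZE;
    s->wr_space -= n;
    s->rd_space += n;          /* committed bytes become readable data */
}

/* Reader side */
bool reader_get_space(const Stream *s, size_t n) { return n <= s->rd_space; }

unsigned char reader_read(const Stream *s, size_t offset) {
    assert(offset < s->rd_space);                  /* stay inside valid data */
    return s->mem[(s->rd_pos + offset) % BUF_SIZE];
}

void reader_put_space(Stream *s, size_t n) {
    assert(n <= s->rd_space);
    s->rd_pos = (s->rd_pos + n) % BUF_SIZE;
    s->rd_space -= n;
    s->wr_space += n;          /* released bytes become free room again */
}
```

The tasks never see `BUF_SIZE` or the wrap-around: they address bytes only by offset from their own access point, which matches the slide's point that the shell hides the buffer layout.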

  42. Task level interface: multicast. Forked streams: Task A feeds both Task B and Task C. The task implementations are fixed (HW or SW). Application configuration is a shell responsibility. [Figure: shared circular buffer showing empty space, space filled with data, the granted window for writer A, and the granted windows for readers B and C]

  43. Task level interface: characteristics • Linear (fifo) synchronization order is enforced • Random access read/write inside acquired window through offset argument • Shells operate on unformatted sequences of bytes. Any semantic interpretation is left to the processor • A task is not aware of where its streams connect to, or of other tasks sharing the same processor • The shell maintains the application graph structure • The shell takes care of: fifo size, fifo memory location, wrap-around addressing, caching, cache coherency, bus alignment

  44. Task level interface: multi-tasking • Non-preemptive task scheduling • Coprocessor provides explicit task-switch moments • Task switches separate ‘processing steps’ (Granularity: tens or hundreds of clock cycles) • Shell is responsible for task selection and administration • Coprocessor provides feedback to the shell on task progress: int GetTask( location, blocked, error, &task_info)
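The slides do not describe the scheduling policy behind GetTask, so the following is only one possible reading: a round-robin selector that keeps the current task until it reports a blocked step or exhausts a per-turn budget. The `Scheduler` structure, the simplified `get_task` signature (without the `location` and `error` arguments) and the interpretation of `budget` as a number of processing steps are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_TASKS 3   /* hypothetical number of task slots on one coprocessor */

typedef struct {
    bool enabled;   /* task is configured on this coprocessor */
    int  budget;    /* assumed meaning: processing steps granted per turn */
    int  info;      /* opaque value handed to the coprocessor */
} TaskEntry;

typedef struct {
    TaskEntry task[N_TASKS];
    int current;    /* task chosen by the previous call, -1 initially */
    int remaining;  /* steps left in the current task's turn */
} Scheduler;

/* Called by the coprocessor at an explicit task-switch moment.
 * `blocked` reports that the previous processing step could not proceed.
 * Returns the selected task id, or -1 if no task is runnable. */
int get_task(Scheduler *s, bool blocked, int *task_info) {
    assert(task_info != NULL);
    bool keep = s->current >= 0 && !blocked && s->remaining > 0
                && s->task[s->current].enabled;
    if (!keep) {
        /* Round-robin search, starting after the current task. */
        int start = s->current < 0 ? 0 : s->current + 1;
        int found = -1;
        for (int i = 0; i < N_TASKS; ++i) {
            int id = (start + i) % N_TASKS;
            if (s->task[id].enabled && s->task[id].budget > 0) {
                found = id;
                break;
            }
        }
        if (found < 0) {
            s->current = -1;
            return -1;
        }
        s->current = found;
        s->remaining = s->task[found].budget;
    }
    s->remaining--;                       /* this call grants one step */
    *task_info = s->task[s->current].info;
    return s->current;
}
```

Because the coprocessor calls `get_task` only between processing steps, no task state has to be saved by the shell, which is what makes the non-preemptive scheme cheap.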

  45. Generic architecture: generic support [Figure: three layers. Computation layer: CPU and two coprocessors. Generic support layer: Shell-SW and Shell-HW blocks, reached through the task-level interface. Communication network layer: communication network and memory, reached through the communication interface]

  46. Generic support: the Shell. The shell takes care of: • The application graph structure, supporting run-time reconfiguration • The local memory map and data transport (fifo size, fifo memory location, wrap-around addressing, caching, cache coherency, bus alignment) • Task scheduling and synchronization. The distributed implementation: • Allows fast interaction with local coprocessor • Creates a scalable solution

  47. Generic support: synchronization • PutSpace and GetSpace return after local update or inquiry. • Delay in messaging does not affect functional correctness. [Figure: Coprocessor A calls PutSpace( port, n ); its shell updates space -= n locally and sends the message putspace( gsid, n ) over the communication network; the shell of Coprocessor B then updates space += n. Coprocessor B calls GetSpace( port, m ), which its shell answers by testing m ≤ space.]
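The tolerance to message delay can be sketched with two local space counters and an explicit message queue standing in for the communication network. The structure names, the queue length and the reading of `gsid` as the index of the remote access point are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MSGS 16   /* hypothetical queue length for the demo */

typedef struct { size_t space; } ShellPort;        /* local view of one access point */
typedef struct { int gsid; size_t n; } Message;    /* putspace( gsid, n ) */
typedef struct { Message q[MAX_MSGS]; size_t head, tail; } Network;

/* GetSpace: a purely local inquiry, no network traffic. */
bool get_space(const ShellPort *p, size_t m) {
    return m <= p->space;
}

/* PutSpace: update the local counter, then notify the remote shell.
 * The call returns without waiting for the message to arrive. */
void put_space(ShellPort *local, Network *net, int gsid, size_t n) {
    assert(n <= local->space);
    local->space -= n;
    assert(net->tail < MAX_MSGS);
    net->q[net->tail++] = (Message){ gsid, n };
}

/* Deliver one pending message to the access point it addresses.
 * Returns false when nothing is in flight. */
bool deliver_one(Network *net, ShellPort *ports_by_gsid) {
    if (net->head == net->tail) return false;
    Message msg = net->q[net->head++];
    ports_by_gsid[msg.gsid].space += msg.n;
    return true;
}
```

While a message is in flight, the receiving side simply sees less space than really exists. A GetSpace may be denied a little longer than strictly necessary, but it is never granted too early, so a delay costs time and not correctness.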

  48. Generic support: application configuration. Shell tables are accessible through a PI-bus interface. [Figure: shell between the coprocessor and the communication network, holding a stream table (indexed by Stream_id; fields addr, size, space, gsid, ...) and a task table (indexed by Task_id; fields budget, ..., info, str_id)]
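The two tables could be modelled in C roughly as follows. Only the field names come from the slide; the field widths, table sizes, the number of ports per task and the helper functions are assumptions, and the comments on `gsid` and `budget` are guesses at their meaning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STREAMS    4   /* hypothetical table sizes */
#define MAX_TASKS      2
#define PORTS_PER_TASK 2

typedef struct {
    uint32_t addr;    /* base address of the fifo in shared memory */
    uint32_t size;    /* fifo size in bytes */
    uint32_t space;   /* data or room currently available at this access point */
    uint32_t gsid;    /* assumed: id of the remote access point for putspace messages */
} StreamEntry;

typedef struct {
    uint32_t budget;                   /* assumed: scheduling budget */
    uint32_t info;                     /* opaque parameter passed to the coprocessor */
    int      str_id[PORTS_PER_TASK];   /* stream table row per port, -1 if unconnected */
} TaskEntry;

typedef struct {
    StreamEntry stream[MAX_STREAMS];
    TaskEntry   task[MAX_TASKS];
} ShellTables;

/* Fill one stream table row; an empty fifo is all room and no data. */
void configure_stream(ShellTables *t, int stream_id, uint32_t addr,
                      uint32_t size, uint32_t gsid, int is_writer) {
    assert(stream_id >= 0 && stream_id < MAX_STREAMS);
    StreamEntry *e = &t->stream[stream_id];
    e->addr = addr;
    e->size = size;
    e->gsid = gsid;
    e->space = is_writer ? size : 0;
}

/* Attach a task port to a stream table row. */
void connect_port(ShellTables *t, int task_id, int port_id, int stream_id) {
    assert(task_id >= 0 && task_id < MAX_TASKS);
    assert(port_id >= 0 && port_id < PORTS_PER_TASK);
    t->task[task_id].str_id[port_id] = stream_id;
}

/* Resolve (task, port) to its stream row, as a shell might do when
 * serving a GetSpace call. Returns NULL for an unconnected port. */
StreamEntry *lookup(ShellTables *t, int task_id, int port_id) {
    int sid = t->task[task_id].str_id[port_id];
    return sid < 0 ? NULL : &t->stream[sid];
}
```

Since the task only names a `port_id`, rewriting these rows is enough to reconnect it to a different stream, which is how the application graph can change without touching the task implementation.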

  49. Generic support: data transport caching • Translate byte-oriented coprocessor interface to wide and aligned bus transfers. • Separate caches for read and write. • Direct mapped: two adjacent words per port • Coherency is enforced as a side-effect of GetSpace and PutSpace • Support automatic prefetching and preflushing
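The translation from byte-oriented accesses to aligned bus words, and a two-entry direct-mapped cache per port, might look like the sketch below. The 16-byte bus width, the structure layouts and the function names are assumptions; the slides give only the properties listed above.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_BYTES 16u   /* hypothetical bus word width in bytes */

typedef struct {
    uint32_t first_word_addr;   /* aligned address of the first bus word */
    uint32_t n_words;           /* number of bus words to transfer */
} BusSpan;

/* Aligned bus words that cover the bytes [addr, addr + n_bytes). */
BusSpan covering_words(uint32_t addr, uint32_t n_bytes) {
    assert(n_bytes > 0);
    uint32_t first = addr / BUS_BYTES;
    uint32_t last  = (addr + n_bytes - 1) / BUS_BYTES;
    return (BusSpan){ first * BUS_BYTES, last - first + 1 };
}

/* Direct-mapped cache with two lines, so two adjacent words fit at once. */
typedef struct {
    uint32_t word[2];   /* bus word number held in each line */
    int      valid[2];
} PortCache;

/* Returns 1 on a hit. On a miss, the line is refilled and 0 is returned. */
int cache_access(PortCache *c, uint32_t addr) {
    uint32_t word = addr / BUS_BYTES;
    uint32_t line = word % 2;            /* adjacent words map to different lines */
    if (c->valid[line] && c->word[line] == word) return 1;
    c->word[line] = word;                /* miss: fetch the word over the bus */
    c->valid[line] = 1;
    return 0;
}

/* Invalidate both lines, for example when GetSpace opens a new window. */
void cache_invalidate(PortCache *c) {
    c->valid[0] = c->valid[1] = 0;
}
```

With two lines indexed by the low bit of the word number, a sequential byte stream that straddles a word boundary keeps both words resident. The invalidation hook reflects the slide's remark that coherency is tied to GetSpace and PutSpace, although the exact policy is not given there.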

  50. Generic support: cache coherency
