
ECOMP - an Erlang Processor



Presentation Transcript


  1. ECOMP - an Erlang Processor Robert Tjärnström, Ericsson Radio Peter Lundell, Ericsson Telecom

  2. Outline • Why an Erlang Processor • The Architecture • Run-Time System • Prototype • Results

  3. What Is an Erlang Processor • A (micro) processor dedicated for execution of Erlang. • Executes compiled Erlang code.

  4. Why a Dedicated Erlang Processor • Increased use of Erlang • Eliminate Performance and Power Dissipation Concerns • Low Power Is Important in Embedded Control • Simplify use of Erlang for Embedded Control • Eliminate the Cost of a Real-Time Operating System • Provide run-time functionality

  5. Power Dissipation in Processors • Factors Increasing Power Dissipation • Increasing functionality • Less efficient code • Less efficient languages • Increasing speed requirements • Factors Decreasing Power Dissipation • Lower supply voltages • Scaled-down manufacturing processes • Increased level of integration

  6. Instruction Set Architecture • Optimized for Execution of Erlang Code • Function calls and returns • Argument transfer • List operations • Register file management • Clean register file at the start of each new function • No read/write-back of variables needed

  7. Instruction Set Architecture [Figure: three Tag/Value fields] • Supports processes • Supports local scope • Three sub-instructions in each machine instruction • Sub-instructions for garbage collection
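The Tag/Value fields in the figure indicate that data words carry a type tag next to the value, which is what makes the basic type checking listed on slide 9 cheap in hardware. As a minimal sketch, assuming a hypothetical 32-bit word with a 4-bit tag and a 28-bit value (the slides give neither ECOMP's field widths nor its tag assignments), packing and checking tagged words could look like this:

```python
from enum import IntEnum

# Hypothetical layout: the slides do not specify ECOMP's word format.
TAG_BITS = 4
VALUE_BITS = 28
VALUE_MASK = (1 << VALUE_BITS) - 1


class Tag(IntEnum):
    # Hypothetical tag assignments for a few Erlang term types.
    SMALL_INT = 0
    ATOM = 1
    LIST_PTR = 2
    TUPLE_PTR = 3
    NIL = 4


def make_word(tag: Tag, value: int) -> int:
    """Pack a tag and a non-negative value into one tagged word."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value does not fit in the value field")
    return (int(tag) << VALUE_BITS) | value


def split_word(word: int) -> tuple:
    """Recover (tag, value) from a tagged word."""
    return Tag(word >> VALUE_BITS), word & VALUE_MASK


def is_list(word: int) -> bool:
    """A basic type check only needs to compare the tag field."""
    return (word >> VALUE_BITS) == Tag.LIST_PTR
```

Because the tag sits in fixed bit positions, a type check is a shift and a compare, with no memory access.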

  8. Processor Architecture [Figure: block diagram with Fetch, Decode, Reg-File and Execute stages and Program/Data memory units] • Much in common with conventional architectures • RISC • LIW • Harvard • Pipelined (3-5 stages) • No complex (advanced) features • Not superscalar • No out-of-order (OOO) or speculative execution • No branch prediction (but it will be added)

  9. Processor Architecture • Real-time garbage collection • GC performed concurrently in HW • Currently supports one element size • HW-supported process switching (~20 cycles) • Currently 1 process queue (more may be added) • Clock-cycle limit for each process • (Basic type checking) • (Prepared for multi-threading)
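Slide 9 lists a single process queue, hardware process switches of about 20 cycles, and a clock-cycle limit per process. A minimal Python model of that policy follows, assuming plain round-robin scheduling and a workload expressed as cycles still to execute; both the policy details and the `Process`/`run_round_robin` names are hypothetical, not taken from the ECOMP design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    remaining: int  # cycles of work left (hypothetical workload model)


def run_round_robin(procs, cycle_limit, switch_cost=20):
    """Run processes from one FIFO queue, preempting each after
    `cycle_limit` cycles. Returns (finish order, total cycles used).
    A switch cost is charged on every dispatch, a simplification."""
    queue = deque(procs)
    clock = 0
    finished = []
    while queue:
        p = queue.popleft()
        clock += switch_cost                # switch this process in
        run = min(cycle_limit, p.remaining)
        clock += run
        p.remaining -= run
        if p.remaining > 0:
            queue.append(p)                 # budget used up: back of the queue
        else:
            finished.append(p.pid)
    return finished, clock
```

With three processes needing 50, 120 and 30 cycles and a 40-cycle limit, `run_round_robin` finishes them in the order 3, 1, 2 after 320 cycles: 200 cycles of work plus six 20-cycle switches. The cycle limit plays a role similar to the reduction budget used by software Erlang VMs, keeping any one process from monopolizing the processor.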

  10. Run-Time Functionality • Switch, Spawn, Send, Message-queue handling, Catch/Throw, Time-out • External I/O, Atom handling, Registered processes • Implemented in machine code • Built-ins (e.g., element) • Standard Libraries (e.g., lists, ETS)
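Send and message-queue handling follow Erlang's selective-receive semantics: a send appends to the receiver's queue, and a receive removes the oldest message that matches a pattern while leaving the others queued in order. The sketch below models only those semantics in Python, not ECOMP's machine-code implementation; the `Mailbox` class and its methods are hypothetical, and time-outs are left to the caller.

```python
from collections import deque
from typing import Any, Callable, Optional


class Mailbox:
    """Per-process message queue with Erlang-style selective receive."""

    def __init__(self) -> None:
        self._messages: deque = deque()

    def send(self, msg: Any) -> None:
        # Sending never blocks: the message is appended to the queue.
        self._messages.append(msg)

    def receive(self, matches: Callable[[Any], bool]) -> Optional[Any]:
        """Remove and return the oldest message for which `matches` is true.
        Non-matching messages stay queued in their original order.
        Returns None if nothing matches; a real run-time system would then
        suspend the process until a new message or a time-out arrives."""
        for i, msg in enumerate(self._messages):
            if matches(msg):
                del self._messages[i]  # safe: we return right after deleting
                return msg
        return None

    def __len__(self) -> int:
        return len(self._messages)
```

For example, after sending `("data", 1)`, `("ack", 7)` and `("data", 2)`, a receive for `"ack"` returns `("ack", 7)` and leaves both data messages queued.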

  11. Prototyping • HW model of the processor (developed in Erlang) • VHDL implementation & test bench • FPGA based demonstrator (VHDL-code)

  12. Prototyping II • PCI board with a Xilinx 40150 FPGA and 4 banks of 2 MB SRAM each (8 MB in total) • The board's PCI bridge slows down communication

  13. Prototyping III • A PC running Windows NT hosts the board. • Board driver routines are only available for Windows NT. • Messages between the Erlang host and the Erlang board are passed through a dynamically loadable driver (DLL). • 7 µs per message on average • The Erlang external term format is used for communication between board and host. • I/O processes run on both the board and the host.
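Board and host exchange messages in Erlang's external term format. The Python encoder below handles a few term types to show what crosses the link. The tag values follow the current External Term Format specification (version byte 131); the `Atom` and `term_to_binary` names are illustrative, and the original board may have used older atom encodings such as ATOM_EXT. At an average of 7 µs per message, the link carries roughly 1 / 7 µs ≈ 143,000 messages per second.

```python
import struct

VERSION = 131
SMALL_INTEGER_EXT = 97
INTEGER_EXT = 98
SMALL_TUPLE_EXT = 104
NIL_EXT = 106
LIST_EXT = 108
SMALL_ATOM_UTF8_EXT = 119


class Atom(str):
    """Marks a Python string as an Erlang atom."""


def _encode(term) -> bytes:
    if isinstance(term, bool):  # Erlang booleans are the atoms true/false
        return _encode(Atom("true" if term else "false"))
    if isinstance(term, int):
        if 0 <= term <= 255:
            return bytes([SMALL_INTEGER_EXT, term])
        if -2**31 <= term < 2**31:
            return struct.pack(">Bi", INTEGER_EXT, term)  # 32-bit big-endian
        raise ValueError("big integers are not handled in this sketch")
    if isinstance(term, Atom):
        data = term.encode("utf-8")
        if len(data) > 255:
            raise ValueError("atom too long for SMALL_ATOM_UTF8_EXT")
        return bytes([SMALL_ATOM_UTF8_EXT, len(data)]) + data
    if isinstance(term, tuple):
        if len(term) > 255:
            raise ValueError("tuple too large for SMALL_TUPLE_EXT")
        return bytes([SMALL_TUPLE_EXT, len(term)]) + b"".join(map(_encode, term))
    if isinstance(term, list):
        if not term:
            return bytes([NIL_EXT])
        # Proper list: length, elements, then the [] tail.
        # (Erlang itself would use STRING_EXT for a list of bytes;
        # binary_to_term accepts either encoding.)
        body = b"".join(map(_encode, term))
        return struct.pack(">BI", LIST_EXT, len(term)) + body + bytes([NIL_EXT])
    raise TypeError(f"unsupported term: {term!r}")


def term_to_binary(term) -> bytes:
    """Encode a term, prefixed with the format version byte."""
    return bytes([VERSION]) + _encode(term)
```

For example, the tuple `{ok, 5}` encodes as the nine bytes `131, 104, 2, 119, 2, 111, 107, 97, 5`.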

  14. Performance • About 3-4 lines of machine code per line of Erlang • An approximate speed-up of a factor of 30 • measured in clock cycles • Tested on a larger example • Call Control, 16 KLOC (714 k dump) • Performance increases while power decreases, by more than an order of magnitude

  15. Near-Future Activities • Compiler improvements • Product integrations • Distributed control node, e.g., multi-processor execution • Full-scale version • Multi-threaded processor • Preparation for silicon implementation
