Delve into MicroBlaze processor's benefits, multicore hardware experiences, software porting lessons, and challenges faced at RAMP Summer Retreat June 2006. Learn about initial Internet in a Box project, cluster tests, and struggles with IP core quality in multicore designs. Discover difficulties scaling multicore systems on FPGAs, critical timing issues, and architecture limitations affecting software compatibility. Gain insights into the abuse of BRAM, design goals, and application porting challenges. Explore the complexities of running Java on MicroBlaze and the intricacies of adapting software for different MIPS environments.
Experiencing MicroBlaze Hardware and Software, RAMP Summer Retreat, June 2006
Outline • Introduction and Background • Advantages of MicroBlaze • Hardware Experiences with Multicore • Lesson Learned from Software Porting • Q&A
Introduction and Background • Initial prototyping of the Internet in a Box project • IIAB: a RAMP cluster-based distributed system testbed at O(1000) nodes • MicroBlaze as the first processor and basic building block • What have we done in Internet in a Box version 0? • A small cluster with Xilinx XUP boards • Virtex-II Pro XC2VP30-7, half the size and the same speed grade as the XC2VP70 on BEE2 • 4 MicroBlazes @ 100 MHz per FPGA with heavy workload! (stable) • More detail and demo tomorrow • Experiences with MicroBlaze
MicroBlaze Advantages • Easy to use with EDK • Linux/GCC support (with limitations) • High “performance” softcore processor • Most instructions complete in 1 cycle • Shorter pipeline, higher working frequency (>100 MHz on Virtex-II) • For comparison, LEON3: 7-stage pipeline, 5000 LUTs @ 90 MHz on Virtex-II • FPGA-optimized implementation • Fast carry chain MUX, hardware multiplier • RLOC placement constraints
Poor IP core quality consumes most of the development time. What else can I say here? • Most IP cores are not multicore compatible (e.g. bus arbitration problems) • A long bug list: opb_ddr, mch_opb_ddr, opb_ethernet • Poorly written documentation makes things worse • Open source/commercial IP core bugs vs. open source software bugs • More time to find the problem • More difficult to fix (fewer updates, smaller community)
Scaling difficulty inside a large FPGA (tussle with softcore) • Timing issues become the second time killer! • Is 100 MHz the upper bound? • It took quite a while to get the 4-core design working • The 6-core design appears unstable under heavy load • 16 cores per FPGA on BEE2 might be ambitious! • Shared resources (e.g. memory controller) become the critical components • Routing delay dominates (60%-70%) • Floorplanning highly connected components is hard • Too many fast carry chain style MUXes in IP cores • One level of logic, so faster? No! Without RLOC constraints they make things even worse! • Be careful with signal naming in your RTL code! • OPB0 and OPB1 will be treated as signals from the same bus, which affects the register mapping • Place and route takes very long: O(hours)
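To put the routing-delay bullet in numbers: at 100 MHz the clock period is 10 ns, so if routing takes 60% to 70% of the critical path, only about 3 to 4 ns remain for logic. The sketch below is plain arithmetic on the figures quoted on the slide; the function names are hypothetical and it does not model any real timing analyzer.

```python
def timing_budget(freq_mhz, routing_fraction):
    """Split one clock period into routing and logic time, in nanoseconds.

    freq_mhz:         target clock frequency in MHz (e.g. 100)
    routing_fraction: share of the critical path spent in routing (e.g. 0.6)
    """
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    if not 0.0 <= routing_fraction <= 1.0:
        raise ValueError("routing_fraction must be between 0 and 1")
    period_ns = 1000.0 / freq_mhz           # 1000 ns per microsecond
    routing_ns = period_ns * routing_fraction
    logic_ns = period_ns - routing_ns
    return period_ns, routing_ns, logic_ns


def max_frequency_mhz(critical_path_ns):
    """Highest clock frequency whose period still covers the critical path."""
    if critical_path_ns <= 0:
        raise ValueError("critical path must be positive")
    return 1000.0 / critical_path_ns
```

For example, `timing_budget(100, 0.7)` leaves roughly 3 ns of a 10 ns period for logic, and `max_frequency_mhz(12.5)` shows that a 12.5 ns critical path caps the design at 80 MHz, which is the "lower the frequency" escape hatch in numeric form.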
Timing summary • Adding a new IP core is not only a resource and functionality problem; it is also a timing problem • Embedding physical information into RTL code is preferred • Can’t write the code without timing/placement constraints (argument to other high-level synthesis tools) • Advanced physical synthesis is preferred • EDK is easy to use, but not friendly to physical synthesis software (e.g. Synplify Premier, Precision Physical) • Tool compatibility issues • RTL information is hidden by non-standard netlist files (Xilinx NGC files) • Can’t cross-probe between RTL code and the mapped design • For QoR and full control, EDK is not the best choice • The effort spent on timing tuning exceeds the signal-connection effort that EDK saves • What about RDL? • An easy solution: • Lower the frequency!
Architecture limitations of MicroBlaze • No MMU support • No protection among processes • Can’t run the full version of Linux • No double-precision floating point • No full floating-point library support in libc • No atomic instructions • Hard to implement locks • Non-blocking FIFO instruction problem • No cache coherence support
The abuse of BRAM • Most BRAM is used for cache • Why not use external SRAM? • high power consumption • high chip cost • unbearable place/route time
The Missing 5%... • No protection between processes • A nightmare for software debugging • Lack of fork() • vfork() does not have the same semantics • pthread sometimes works, at the cost of rewriting the application • No shared library support • Applications suffer from jumbo file sizes • “Simple” i3 applications: 25 KB vs. 1.8 MB • Some applications will not run without shared libraries • Ruby interpreter (libdl)
“Auto”config • Makefiles/configuration files cannot recognize the MicroBlaze target • Too much architecture-dependent code in existing applications • Running Java is hard • Reconfigurable hardware confuses common build tools • Some exception handlers are crucial (e.g. unaligned access)
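The unaligned-access bullet can be illustrated with a small model of what such an exception handler has to do: rebuild the requested word from the two aligned words that contain it. This is only a sketch in Python over a byte buffer, not MicroBlaze handler code. The function names are hypothetical, and big-endian byte order and a 4-byte word are assumptions made for the example.

```python
WORD_BYTES = 4  # assumed word size for this sketch


def load_aligned_word(mem, addr):
    """Model of the only load the hardware accepts: a word-aligned 32-bit read."""
    if addr % WORD_BYTES != 0:
        raise ValueError("unaligned access at address %d" % addr)
    if addr < 0 or addr + WORD_BYTES > len(mem):
        raise IndexError("address %d out of range" % addr)
    # Big-endian byte order is an assumption of this example.
    return int.from_bytes(mem[addr:addr + WORD_BYTES], "big")


def load_word_emulated(mem, addr):
    """Emulate a 32-bit load at any byte address using aligned loads only."""
    offset = addr % WORD_BYTES
    base = addr - offset
    if offset == 0:
        return load_aligned_word(mem, base)
    high = load_aligned_word(mem, base)               # word holding the first bytes
    low = load_aligned_word(mem, base + WORD_BYTES)   # word holding the remaining bytes
    combined = (high << 32) | low                     # 8 consecutive bytes as one integer
    shift = (WORD_BYTES - offset) * 8                 # drop the bytes after the wanted word
    return (combined >> shift) & 0xFFFFFFFF
```

With `mem = bytes(range(16))`, a load at address 1 combines the aligned words at addresses 0 and 4 and yields the bytes 01 02 03 04. Software that assumes the CPU does this transparently fails without such a handler, which is one reason the slide calls these handlers crucial.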
50 MIPS vs. 1000 MIPS • Many applications are designed to run on CPUs faster than 1000 MIPS • Porting them is not straightforward • Talking to the real world • i3 pings all fingers every “second” • How to dilute the “second”? • Many places to change in the code • Not done in this project • Time dilation is our future work • Emulate machines faster than 1000 MIPS
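The slide leaves time dilation as future work, so the following is only an illustration of the idea, not the project's design. With a 50 MIPS host emulating a 1000 MIPS target, the dilation factor is 1000 / 50 = 20: one virtual "second" has to last 20 real seconds for the application to execute the same number of instructions between timer ticks. The class and method names are hypothetical.

```python
import time


class DilatedClock:
    """Virtual clock that runs slower than real time by a fixed factor."""

    def __init__(self, target_mips, host_mips, real_clock=time.monotonic):
        if target_mips <= 0 or host_mips <= 0:
            raise ValueError("MIPS ratings must be positive")
        # How many real seconds correspond to one virtual second.
        self.factor = target_mips / host_mips
        self._real_clock = real_clock
        self._start = real_clock()

    def now(self):
        """Virtual seconds elapsed since the clock was created."""
        return (self._real_clock() - self._start) / self.factor

    def real_interval(self, virtual_seconds):
        """Real seconds to wait so that `virtual_seconds` pass on the virtual clock."""
        return virtual_seconds * self.factor
```

A periodic task such as the i3 finger ping would then sleep for `clock.real_interval(1.0)` instead of a hard-coded one second. The slide's point that there are "many places to change in the code" still applies, because every timer and timeout in the application has to go through such a clock.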
Non-technical challenges • Maturity level of tools • The software community for MicroBlaze is too small • Research code is even worse than general software • Portability • Convention