
Altera vs. Xilinx




  1. Altera vs. Xilinx
     Ognjen Šćekić (ogi@cg.yu)
     Prof. Dr. Veljko Milutinović (vm@etf.bg.ac.yu)

  2. Introduction

  3. FPGA vs. ASIC
     • FPGA = Field Programmable Gate Array: flexibility of software + speed of hardware
     • ASIC = Application Specific Integrated Circuit: tailor-made on demand for specific applications

  4. Market Overview
     • Key players: Xilinx, Altera, Lattice, Actel
     • PLD market estimated at $57 billion and rapidly growing
     • The goal is to expand the market:
       • by lowering per-unit cost to attack the low-end market
       • by increasing speed capabilities to attack the high-end market
     Figure 1 - PLD market share

  5. About Xilinx
     • Pronounced "zylinks"
     • Founded in 1984
     • Employs around 2,600 people
     • Claims more than half the world demand for FPGAs
     • Partners with leading semiconductor manufacturers such as IBM Microelectronics, UMC and Seiko
     • Xilinx is the net market leader at the moment

  6. About Altera
     • Founded in 1983
     • Introduced look-up table based architecture in 1992
     • Second-largest FPGA manufacturer
     • Strategic partner is TSMC

  7. Recent FPGA Design Timeline
     • The Virtex and Stratix families are direct competitors, as are Spartan and Cyclone

  8. Key Factors for Comparing FPGAs
     • Fabrication process
     • Logic density
     • Clock management
     • On-chip memory
     • DSP capabilities
     • I/O compatibility
     • Software support & other design services

  9. Fabrication Process
     • A more advanced fabrication process brings higher integration and thus higher density and/or reduced chip size.
     • Currently the most advanced is the 90nm process (previously 0.13μm):
       • first used in Spartan-3, and later in the Virtex-4 FPGA family
       • gave Xilinx a one-year lead over Altera
       • Altera introduced it in 2004 with Cyclone II and Stratix II
     Figure 2 - Cyclone II 90nm structure

  10. Logic Density
      • We need a unit to express the logic capability of an FPGA. Is it possible to define such a unit precisely?
      • Traditionally:
        • Xilinx: LC – Logic Cell
        • Altera: LE – Logic Element
      • 1 LC = 4-input LUT + D-FF + arithmetic/logic/register circuitry
      • 1 LC = 1 LE

  11. Logic Density (2)
      Improved functionality of "new" architectures introduced new terms:
      • ALM – Adaptive Logic Module, for describing the adaptable structure of Altera's Stratix II family
      • CLB – Configurable Logic Block, for describing Xilinx's FPGA families
      • ELC – Equivalent Logic Cell, Xilinx's new unit to better express logic density
      • 1 ELC = 1.125 LC
      • 1 CLB has 8 LCs

  12. Clock Management
      • All parts of a digital circuit need to be synchronized to a desired clock signal.
      • If a circuit is large, complex, and operating at high frequencies, the clock propagation delay and clock skew have a great impact on performance.
      • Therefore, providing a zero-delay clock signal in all parts of an FPGA becomes crucial.
      • The solution is to divide the FPGA into regions that can work at different frequencies, called clock domains.
      Clock management comprises two basic functions:
      • removing clock skew and propagation delay
      • generating new clock signals with different frequencies and/or phases

  13. Removing Clock Skew
      It can be done using:
      • DLLs – Delay-Locked Loops (Xilinx)
      • PLLs – Phase-Locked Loops (Altera)
      Figure 3a - DLL block diagram
      Figure 3b - PLL block diagram
      They both compensate for the delay generated on the routing network inside the FPGA, providing a zero-delay clock signal to different parts of the FPGA.

  14. Delay-Locked Loop
      • The delay line produces a delayed version of the input clock CLKIN.
      • The clock distribution network routes the clock to the FPGA interior and to the feedback CLKFB pin.
      • The control logic samples the input clock and the feedback clock in order to adjust the delay line.
      • The delay line consists of an array of delay elements, typically CMOS voltage-controlled inverters connected in series.
      The DLL works by inserting delay between the input clock and the feedback clock until the two rising edges align, putting the two clocks in phase. When the two clocks are in phase, the DLL "locks". Thus, the DLL output clock compensates for the delay in the clock distribution network.
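
The locking behaviour described on the DLL slide can be sketched as a small Python simulation. This is an illustrative model only, not vendor code: the name `dll_lock`, the 50 ps tap size, and the tap count are assumptions chosen for the example. It steps the delay line until the feedback edge falls within one tap of an input rising edge.

```python
def dll_lock(period_ps, network_delay_ps, step_ps=50, max_taps=1024):
    """Return the delay-line setting (in ps) at which the loop locks.

    Hypothetical model: the delay line is a chain of identical taps, and
    the control logic adds one tap at a time until the feedback clock
    (delay line + distribution network) lines up with an input rising edge.
    """
    for taps in range(max_taps):
        line_delay = taps * step_ps
        total_delay = line_delay + network_delay_ps
        # How far the feedback edge sits past the most recent input edge.
        phase_error = total_delay % period_ps
        if phase_error < step_ps:
            return line_delay  # edges aligned to within one tap: locked
    raise RuntimeError("delay line too short to lock")


# 100 MHz clock (10,000 ps period), 1,300 ps of distribution delay.
print(dll_lock(10_000, 1_300))  # 8700
```

With these numbers the delay line plus the network add up to exactly one clock period, so the distributed clock appears to arrive with zero delay relative to CLKIN.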

  15. Phase-Locked Loop
      • Instead of a delay line, the PLL uses a voltage-controlled oscillator, which generates a clock signal that approximates the input clock CLKIN.
      • Control logic, consisting of a phase detector and filter, adjusts the oscillator frequency and phase to compensate for the clock distribution delay.
      • When the clocks are aligned, the PLL "locks".

  16. PLL vs. DLL
      • PLL
        • Drawback: the oscillator accumulates phase error
        • Advantage: frequency synthesis is easier because of the oscillator
      • DLL
        • Advantage: does not accumulate phase error
        • Drawback: frequency synthesis is more difficult
      Altera uses PLLs and Xilinx uses DLLs.

  17. Clock Generation & Phase Shifting
      Besides clock skew elimination, DLLs (PLLs) are also used for:
      • frequency multiplication and division
      • duty-cycle regulation
      • phase shifting
      Clock managers need to be resistant to temperature/voltage variations.
      Clock manipulation dramatically simplifies the design and improves performance. At the same time it provides many design alternatives.

  18. Embedded Memory
      • Using LUTs as registers does not provide enough space or versatility.
      • Time-dependent applications that perform many computations need dedicated built-in memory.
      The main advantages of embedded (built-in) memory are:
      • short access time
      • high bandwidth
      • great versatility
      It can behave like:
      • RAM
      • ROM
      • Buffer (FIFO, LIFO, etc.)
      • Cache
      • Shift registers
      • etc.
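
As a rough illustration of how one fixed-size memory block can be made to behave like a FIFO buffer, the Python sketch below wraps a plain array with read and write pointers. The class name `RamFifo` and its methods are hypothetical and only model the idea; they do not correspond to any vendor primitive.

```python
class RamFifo:
    """FIFO built on a fixed-size memory array with read/write pointers."""

    def __init__(self, depth):
        self.mem = [0] * depth   # stands in for the embedded RAM block
        self.depth = depth
        self.rd = 0              # read pointer
        self.wr = 0              # write pointer
        self.count = 0           # number of stored words

    def push(self, value):
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.mem[self.wr] = value
        self.wr = (self.wr + 1) % self.depth   # wrap around the RAM
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        value = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return value
```

The same array with a single pointer that moves in both directions would give a LIFO, which is the sense in which one memory block is versatile.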

  19. DSP Capabilities
      DSP – Digital Signal Processing
      • The majority of FPGA applications require some sort of DSP.
      • In order to increase efficiency, DSP computations are executed in parallel - pipelining.
      • Special DSP units have been developed to fully exploit the FPGA's adaptable structure.
      • These units are designed to optimize execution of commonly used DSP algorithms: filtering, encoding/decoding, equalization, modulation, FFT, etc.
      • They usually contain: multipliers (in parallel), accumulators, adders and shift registers
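
To show why multipliers, accumulators, adders and shift registers are the building blocks of a DSP unit, here is a minimal Python sketch of an FIR filter, one of the filtering algorithms mentioned above. The function name `fir_filter` is an assumption for illustration; in hardware the inner loop would run as parallel multipliers feeding an adder, whereas this software model runs it sequentially.

```python
def fir_filter(samples, coeffs):
    """Multiply-accumulate FIR filter over a shift register of samples."""
    taps = [0] * len(coeffs)          # shift register of recent samples
    out = []
    for x in samples:
        taps = [x] + taps[:-1]        # shift the new sample in
        acc = 0                       # accumulator
        for c, t in zip(coeffs, taps):
            acc += c * t              # one multiplier per tap in hardware
        out.append(acc)
    return out


# An impulse at the input reproduces the coefficients at the output.
print(fir_filter([1, 0, 0, 0], [1, 2, 3]))  # [1, 2, 3, 0]
```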

  20. I/O Compatibility
      • As FPGAs continue to grow in size and capacity, more complex systems are designed for them, demanding an increased variety of I/O standards.
      • Furthermore, as system-clock speeds continue to increase, the need for high-performance I/O becomes more important.
      • Modern bus applications, pioneered by the most influential companies, are commonly introduced with a new I/O standard, tailored specifically to the needs of that application.
      • The bus I/O standards provide specifications to other vendors who create products designed to interface with these applications.
      • Each standard often has its own specifications for: current, voltage, I/O buffering and termination techniques.

  21. I/O Compatibility (2)
      • Interfaces are implemented in I/O blocks.
      • I/O blocks are parts of the FPGA architecture positioned peripherally, connected to I/O pins and to internal interconnects.
      • I/O blocks are grouped into banks – a group of neighboring pins which use the same or compatible I/O standard at the same time.

  22. I/O Compatibility (3)
      An I/O block usually contains:
      • programmable I/O buffers: programmable so they can adjust to different I/O standards.
      • D-FFs: used as optional delay elements or registers.
      • pull-up/down resistors: used to assert or de-assert pins that would otherwise float.
      • delay array: provides a programmable delay of I/O signals.
      • keeper circuit: keeps the last state on a bus if all other drivers are in High-Z state.

  23. Software Support
      Development of an FPGA-based hardware system can be divided into the following stages:
      • system design & synthesis
      • design implementation
      • on-chip verification
      Figure 4a - Altera design flow diagram
      Figure 4b - Xilinx design flow diagram

  24. System Design Stage
      Begins with the design entry phase using:
      • HDL – Hardware Description Language (like VHDL or Verilog)
      • schematic editor
      Software solutions offer complete integrated environments for this stage.
      A wide variety of FPGA-ready component libraries are available, ranging from simple processors, peripheral components and controllers down to general logic (gates, counters, decoders, etc.).
      The software supports hierarchical design entry.

  25. System Design Stage (2)
      • Once the hardware design is complete it is synthesized: a process that transforms it from HDL form into a low-level gate form, called RTL – Register Transfer Level description.
      • The system design stage is platform independent.
      • The resulting RTL description of our system can be fitted into any FPGA.
      Figure 5 - HDL and schematic representation of a BCD counter

  26. Design Implementation Stage
      • Commonly called the Place-And-Route stage.
      • Place-And-Route tools take the input RTL netlist for the design and map the logic into the architectural resources of the FPGA.
      • Then, the best location for these blocks is found, based on their interconnections and desired performance.
      • Finally, the interconnects are routed, and pins assigned.

  27. Design Implementation Stage (2)
      • This stage is platform-dependent, since our design is implemented in an actual FPGA architecture.
      • Therefore, place-and-route tools are developed by the FPGA vendors. They are developed to take full advantage of the FPGA architecture, and to provide optimum performance for a given design.
      • Many analysis and simulation tools are provided for this stage.
      The result of this stage is a configuration file which is loaded into the FPGA at startup.

  28. On-Chip Verification Stage
      • This stage is executed once the design has been loaded into the FPGA.
      • It gives the developer the possibility of real-world debugging.
      • Special cables are supplied with FPGA development kits, for connecting FPGAs to a PC or a workstation.
      • This provides a means of reading the contents of internal registers and memory.

  29. Software Support (2)
      • Both Xilinx and Altera offer complete software development kits that guide users through all 3 stages of system design.
        • Altera offers Quartus II
        • Xilinx offers ISE
      • Third-party software tools can be used in the system design stage as well.

  30. "Intellectual Property" Blocks
      • Complete designs of some complex systems, written in HDL by FPGA manufacturers and optimized to run on their FPGAs, e.g. microcontrollers, microprocessors, etc.
      • CPUs:
        • Altera: 32-bit Nios II
        • Xilinx: 32-bit MicroBlaze
      Figure 6 - Block diagram of Altera's 16-bit Nios processor

  31. Volume Production Solutions
      When FPGA-based designs move into volume production, the main issue is cost reduction! Xilinx and Altera have different approaches:
      • Xilinx offers specialized EasyPath FPGAs: Once the clients have developed their system on an FPGA, they send it to Xilinx. After 8 weeks they get back the optimized FPGAs with exactly the same functionality. These optimized FPGAs are 30%-80% less expensive when mass produced; they serve as replacements for structured ASICs and take less time to complete.
      • Altera offers a service called HardCopy: It is a migration path from the FPGA to a structured ASIC. Altera developed ASICs with a fine-grained cell structure (HCells) which perfectly match the logic elements (LEs) of Altera's FPGAs. That way Stratix LEs are mapped to equivalent logic elements in the corresponding HardCopy device. If a Stratix LE is not used in the FPGA design, then it is not mapped to the HardCopy device, yielding a more efficient mapping of the prototyped design.

  32. Overviews & Comparisons

  33. low-end FPGA family

  34. Overview
      • Altera's most recent low-end FPGA family
      • Introduced in 2004, first shipped in February 2005
      • 1.2V core, 90nm process

  35. Packaging
      • Commercial grade and industrial grade devices are offered.

  36. Functional Description
      • Two-dimensional row/column-based architecture to implement custom logic.
      • Column and row interconnects of varying speeds provide signal interconnects between Logic Array Blocks (LABs), embedded memory, and multipliers.
      • Logic array consists of LABs, with 16 logic elements (LEs) in each LAB.

  37. Functional Description (2)
      • Density from 4,608 to 68,416 LEs.
      • Up to four phase-locked loops (PLLs).
      • Global clock network consists of up to 16 global clock lines that drive throughout the entire device.

  38. Functional Description (3)
      • M4K memory blocks are true dual-port memory blocks with 4K bits of memory.
      • Works at up to 260 MHz.
      • These blocks are arranged in columns across the device in between certain LABs.
      • Cyclone II devices offer between 119 and 1,152 Kbits of embedded memory.

  39. Functional Description (4)
      • Each embedded multiplier block can implement either two 9×9-bit multipliers, or one 18×18-bit multiplier.
      • Embedded multipliers are arranged in columns across the device.
      • Up to 250-MHz performance.

  40. Functional Description (5)
      • Each I/O pin is fed by an IOE (Input Output Element) located at the periphery of the device.
      • I/O pins support various single-ended and differential I/O standards.
      • Each IOE contains a bidirectional I/O buffer and three registers for registering input, output, and output-enable signals.

  41. LE Unit
      • 4-input LUT: acts as a function generator for logic functions with 4 variables, or as a 16-bit register.
      • Programmable register: can be configured as a D, T, JK or SR flip-flop. Used optionally.
      • Carry logic
      • A Cyclone II LE can operate in 2 modes:
        • normal mode
        • arithmetic mode
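
The idea of a 4-input LUT acting as a function generator can be sketched in Python: the LUT is just a 16-entry truth table, and "programming" it means filling in those 16 bits. The helper names `make_lut4` and `lut4_eval` and the input-to-index bit ordering are assumptions made for this example, not the device's actual configuration format.

```python
def make_lut4(func):
    """Build a 16-bit truth table for any function of four 1-bit inputs."""
    table = 0
    for idx in range(16):
        a = idx & 1
        b = (idx >> 1) & 1
        c = (idx >> 2) & 1
        d = (idx >> 3) & 1
        if func(a, b, c, d):
            table |= 1 << idx      # store a 1 at this input combination
    return table


def lut4_eval(table, a, b, c, d):
    """Look up the output: the inputs simply form the table address."""
    idx = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> idx) & 1


parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(parity))                    # 0x6996
print(lut4_eval(parity, 1, 0, 0, 0))  # 1
```

Any logic function of four variables fits the same 16 bits, which is why the LUT count is a reasonable proxy for logic capacity.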

  42. LE – Normal Mode
      • Suitable for general logic applications and combinatorial functions.

  43. LE – Arithmetic Mode
      • Implements a 2-bit full adder and basic carry chain
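
What a full adder and carry chain compute can be modelled in a few lines of Python. This is a generic ripple-carry sketch, not a description of the exact LE circuit; the names `full_adder` and `ripple_add` are chosen here for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x, y, width):
    """Add two unsigned numbers by chaining full adders bit by bit."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i           # carry passes on to the next stage
    return result, carry


print(ripple_add(5, 3, 4))   # (8, 0)
print(ripple_add(15, 1, 4))  # (0, 1)
```

The carry passed from one stage to the next is what the dedicated carry chain between neighbouring LEs speeds up in hardware.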

  44. LABs and Interconnects
      • LAB - Logic Array Block: consists of 16 LEs connected with carry and register chains
      • Local Interconnect: transfers signals between LEs in the same LAB
      • Row Interconnect: connects multiple LABs
      • Column Interconnect: connects multiple LABs

  45. Clock Management
      • Clock network features:
        • Up to 16 Global Clock Networks
        • Up to 4 PLLs
        • Dynamic clock source selection, enable and disable
      • Global clock networks spread throughout the entire device.
      • They provide clocks for all resources within the device, such as IOEs, LEs, memory blocks, and embedded multipliers.
      • They are driven by external clock sources (via clock pins), PLL outputs or the logic array signals.
      • Global clock lines can also be used for general purpose control signals.

  46. Clock Management (2)
      • There is one clock control block for each global clock network.
      • They are arranged on the device periphery.
      • Clock control blocks are used to select/enable/disable a global clock network.
      • Multiplexers are used with these clocks to form 6-bit buses to feed LABs and IOEs.

  47. Clock Management (3)
      • PLLs are located at the corners:

  48. Clock Management (4)
      Cyclone II PLLs provide:
      • Clock skew elimination: provides a zero-delay clock signal in every part of the FPGA.
      • Clock multiplication and division: ranges from x(1/128) up to x32.
      • Phase shifting: programmable phase shifts in increments of at least 45°.
      • Programmable duty cycle: generates clock outputs with a variable duty cycle.
      • Manual clock switchover: enables you to switch between two reference input clocks for applications that may require support for clocks with two different frequencies.
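
One way to see where a x(1/128) to x32 range could come from is to model the PLL output as f_in × m / (n × c). The Python sketch below does that. The counter names and their ranges (m and c from 1 to 32, n from 1 to 4) are assumptions chosen so that the ratio spans the range quoted on the slide; the presentation itself does not state them.

```python
from fractions import Fraction

# Assumed counter ranges (not given in the presentation).
M_RANGE = range(1, 33)   # feedback multiplier m
N_RANGE = range(1, 5)    # input pre-divider n
C_RANGE = range(1, 33)   # output post-divider c


def pll_output(f_in_mhz, m, n, c):
    """Output frequency in MHz for the assumed ratio m / (n * c)."""
    if m not in M_RANGE or n not in N_RANGE or c not in C_RANGE:
        raise ValueError("counter value out of range")
    return f_in_mhz * Fraction(m, n * c)


ratios = [Fraction(m, n * c)
          for m in M_RANGE for n in N_RANGE for c in C_RANGE]
print(min(ratios), max(ratios))   # 1/128 32
print(pll_output(50, 4, 1, 2))    # 100
```

Exact fractions are used instead of floats so that ratios such as 1/3 stay exact when comparing candidate settings.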

  49. Embedded Memory
      • Consists of columns of M4K memory blocks:

  50. Embedded Memory (2)
      The M4K blocks support the following features:
      • 4,608 RAM bits (4 Kbits + parity bits, one for each byte)
      • 250-MHz performance
      • True dual-port memory: supports any combination of two-port operations (2 reads, 2 writes, or 1 read and 1 write) at different clock frequencies.
      • Simple dual-port memory: simultaneous reads and writes are supported.
      • Single-port memory: simultaneous reads and writes are not allowed.
      • Shift register
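
The 4,608-bit figure follows directly from 4 Kbits of data plus one parity bit per byte, as this short Python check shows. The block count of 250 in the second call is a hypothetical value, picked only because 250 × 4,608 bits matches the 1,152 Kbits upper figure from slide 38 if a Kbit is taken as 1,000 bits.

```python
DATA_BITS = 4 * 1024            # "4 Kbits" of data per M4K block
PARITY_BITS = DATA_BITS // 8    # one parity bit for each byte
BLOCK_BITS = DATA_BITS + PARITY_BITS


def embedded_memory_bits(num_blocks):
    """Total RAM bits provided by a given number of M4K blocks."""
    return num_blocks * BLOCK_BITS


print(BLOCK_BITS)                 # 4608
print(embedded_memory_bits(250))  # 1152000
```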
