This article discusses the different specialty memories used in graphics accelerators, such as SGRAM, VRAM, and WRAM, and their impact on density, performance, and cost. It also explores the SAM and fill operation support in these memories.
Graphics Hardware: Specialty Memories, Simple Framebuffers • Bob Reese - ECE, MSU • EE 4993 - Computer Graphics Hardware • Spring 98
Introduction • Prof Moorhead has previously given overviews of some graphics accelerator architectures • Simple framebuffers • 2D & 3D accelerator architectures • We will review some of this material and concentrate on memories built especially for use with graphics accelerators.
Memory Issues: Density • Lots of Pixels => Lots of Memory • 1280 x 1024 x 32 bits/pixel => about 5.2 MB • Double Buffering => Double Memory • Two frames of above => about 10.5 MB • Storage needed for things other than pixels, e.g. textures • Voodoo2 card has 12 MB (high-end gaming) • Intergraph Realizm 3D has up to 32 MB • 100 bits per pixel with 2.5 million pixels (1824 x 1368)
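The density figures on this slide are products of width, height, and bits per pixel. A minimal sketch of that arithmetic follows; the function name `framebuffer_bytes` is made up for illustration and is not from the slides.

```python
def framebuffer_bytes(width, height, bits_per_pixel, buffers=1):
    """Bytes needed to hold `buffers` full frames (hypothetical helper)."""
    total_bits = width * height * bits_per_pixel * buffers
    return total_bits // 8

single = framebuffer_bytes(1280, 1024, 32)
double = framebuffer_bytes(1280, 1024, 32, buffers=2)
print(single / 2**20, double / 2**20)   # sizes in MiB: 5.0 10.0

# 100 bits per pixel at 1824 x 1368, as quoted for the Realizm
print(framebuffer_bytes(1824, 1368, 100) / 1e6, "MB")
```

The last figure comes out a little above 31 MB, which is consistent with the "up to 32 MB" quoted on the slide.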
Memory Issues: Performance • Performance in this instance means BANDWIDTH • How fast can I get data in/out? • Bandwidth is affected by bus width, bus rate, and contention for memory • At least two contenders for graphics memory: • CPU: write mostly, random access • Video controller: read only, sequential access
Memory Issues: Cost • Cost is primarily an issue for low to mid range boards intended for consumers • Would like to use standard DRAM for memory because: • it offers the highest density at the lowest cost • the high volume of standard DRAM also means low cost • BUT… will performance suffer because it has only one port? • Newer single port Synchronous DRAMs have enough performance for both 2D & 3D apps
Single Port FrameBuffer • Control arbitrates between video and CPU access • Simplest scheme only allows CPU access during horizontal or vertical retrace • (Diagram labels: CPU, Memory, Control, MUX, RAMDAC, Monitor)
Double Buffering Can Help • When CPU is preparing next frame, video is accessing current frame • (Diagram labels: CPU, Memory FB #1, Memory FB #2, Control, MUX, RAMDAC, Monitor)
A 2D Accelerator with Single Port DRAM • (Diagram labels: CPU, DRAM (single ported))
Specialty Memories for Graphics • SGRAM (Synchronous Graphics RAM) • Single ported Synchronous DRAM with support for fill operations, fast operation • VRAM (Video RAM) • Dual ported DRAM: parallel port for random access and serial output port (locations accessed sequentially) • WRAM (Windowed RAM) • VRAM with better fill operation support • 3D-RAM • ASM (Application Specific Memory) with support for the rasterization portion of the OpenGL pipeline
VRAM Details • SAM - Serial Access Memory • arranged as 256 x 16 registers • connected to 16 serial outputs (SDQ0 - SDQ15) • registers read in sequential order • Entire SAM can be loaded from the DRAM array in one memory cycle • Supports a fill operation in which 8 columns can be written with data from a color register • high and low bytes can be separately masked in each column
Mapping Pixels to Memory • Simplest case: assume a 512 x 512 screen with 16bpp • Memory Location = Pixel # • Each memory location has one pixel. Memory locations are in scanline order. • (Figure: 16 bit planes, Bit 0 to Bit 15, each plane 512 x 512 bits; the 1st plane is Bit 0 of each pixel. Numbers are memory locations: the first row holds 0, 1, 2, … 511 (pixel 0 to pixel 511) and the second row holds 512, 513, 514, … 1023.)
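The one-pixel-per-location mapping can be sketched as follows. The function names are illustrative, and treating each bit plane as "bit n of every pixel" follows the figure description rather than any particular device's documentation.

```python
WIDTH = 512   # pixels per scanline (from the slide)

def pixel_to_location(x, y, width=WIDTH):
    """Memory location of pixel (x, y) when locations are in scanline order."""
    return y * width + x

def location_to_row_col(location, columns=512):
    """Split a location into (row, column) of a 512-column array."""
    return divmod(location, columns)

def plane_bit(pixel_value, plane):
    """Bit stored in bit plane `plane` (0..15) for a 16bpp pixel."""
    return (pixel_value >> plane) & 1

print(pixel_to_location(511, 0), pixel_to_location(0, 1))  # 511 512
print(location_to_row_col(1023))                           # (1, 511)
```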
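Multiple Pixels per Word • If we go to 8bpp, then each memory location contains two pixels, so the same memory holds twice as many pixels • Doubling the pixel count allows, e.g., a 1024 x 512 screen • Note that a 50% increase in both X and Y (512 + 256 = 768, so 768 x 768) is 2.25x the pixels, slightly more than the doubled capacity • (Figure: bits 0 to 7 and bits 8 to 15 of each location hold separate pixels; location 0 holds pixels 0 and 1, and location 511 holds pixels 1022 and 1023. Memory locations are 0, 1, 2, … 511 in the first row and 512, 513, 514, … 1023 in the second.)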
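A small sketch of two 8bpp pixels sharing one 16-bit location. Which pixel sits in the low byte is not recoverable from the slide, so placing the even-numbered pixel in bits 0 to 7 is an assumption, as are the helper names.

```python
def pack_word(even_pixel, odd_pixel):
    """Pack two 8-bit pixels into one 16-bit word.

    Assumption: the even-numbered pixel goes in the low byte.
    """
    return ((odd_pixel & 0xFF) << 8) | (even_pixel & 0xFF)

def locate(pixel_index):
    """Return (memory location, byte half) for a pixel number."""
    return divmod(pixel_index, 2)

def read_pixel(memory, pixel_index):
    """Extract one 8-bit pixel from a list of 16-bit words."""
    location, half = locate(pixel_index)
    return (memory[location] >> (8 * half)) & 0xFF

memory = [pack_word(0x11, 0x22), pack_word(0x33, 0x44)]
print(locate(1023))                  # (511, 1)
print(hex(read_pixel(memory, 2)))    # 0x33
```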
Loading the SAM • Use a 512 x 512 screen with 16bpp • Two 128-bit segments of the selected row are loaded into the SAM • When the SAM is read sequentially, pixels are read in scanline order • (Figure: Row 0 of the array with its columns split into segments 0-127, 128-255, 256-383, and 384-511; the 256 x 16 SAM holds segments 0-127 and 128-255.)
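The load-then-shift behaviour can be modelled roughly as below. This is a toy model, not a description of any specific VRAM part: the `half` parameter that picks which 256 columns of the row are transferred, and all function names, are assumptions made for illustration.

```python
COLUMNS = 512    # columns per DRAM row (from the slide)
SAM_SIZE = 256   # SAM registers (256 x 16)

def load_sam(dram, row, half):
    """Copy one half-row (256 columns) into the SAM in a single transfer."""
    start = half * SAM_SIZE
    return list(dram[row][start:start + SAM_SIZE])

def scan_out(dram, rows):
    """Yield words in the order the serial port would deliver them."""
    for row in range(rows):
        for half in range(COLUMNS // SAM_SIZE):
            sam = load_sam(dram, row, half)   # one transfer cycle
            yield from sam                    # registers read sequentially

# Each location stores its own pixel number, so scan order is easy to check.
dram = [[r * COLUMNS + c for c in range(COLUMNS)] for r in range(2)]
words = list(scan_out(dram, rows=2))
print(words[:3], words[-1])   # [0, 1, 2] 1023
```

Because memory locations are in scanline order, reading the SAM sequentially after each transfer delivers pixels 0, 1, 2, … in the order the display needs them.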
Fill Operation Support • Block fills are a common operation in 2D graphics • 8 columns from the selected row can be filled with the value from the color register (16 bit) in one operation • Starting column address has lower 3 bits = 0 • (512 x 512)/8 => 32768 ops to fill the entire array • (Figure: memory locations 0, 1, 2, … 7, … 511 in the first row and 512, 513, 514, … 1023 in the second; colored locations show the fill operation.)
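The operation count can be checked with a short simulation. The helper names are hypothetical; the only behaviour taken from the slide is that one operation writes 8 columns of one row and that the starting column must be a multiple of 8.

```python
def block_fill(row, start_col, color):
    """Write the color register into 8 consecutive columns of one row."""
    if start_col & 0b111:
        raise ValueError("starting column must have lower 3 bits = 0")
    row[start_col:start_col + 8] = [color] * 8

def fill_array(array, color):
    """Fill every location using block fills; return the number of operations."""
    ops = 0
    for row in array:
        for col in range(0, len(row), 8):
            block_fill(row, col, color)
            ops += 1
    return ops

array = [[0] * 512 for _ in range(512)]
print(fill_array(array, 0xF800))   # 32768
```

Writing the same array one location at a time would take 262144 operations, so the block fill is 8 times cheaper.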
Fill can also do Stenciling Can mask individual bits and entire columns. (only 8 bits shown here for each location)
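A masked fill can be sketched as a block fill with two extra inputs: a bit mask applied within each location and a column mask selecting which of the 8 columns are written. The mask polarity (1 means "write") and the parameter names are assumptions for illustration.

```python
def masked_fill(row, start_col, color, bit_mask=0xFFFF, column_mask=0xFF):
    """Fill up to 8 columns, writing only the enabled bits and columns.

    Assumed polarity: a 1 in either mask means "write".
    """
    if start_col % 8:
        raise ValueError("starting column must have lower 3 bits = 0")
    for i in range(8):
        if column_mask & (1 << i):
            old = row[start_col + i]
            # keep the masked-off bits, replace the enabled bits
            row[start_col + i] = (old & ~bit_mask & 0xFFFF) | (color & bit_mask)

row = [0x1234] * 16
# Write only the low byte, and only into columns 8 and 10.
masked_fill(row, 8, 0xABCD, bit_mask=0x00FF, column_mask=0b00000101)
print([hex(v) for v in row[8:11]])   # ['0x12cd', '0x1234', '0x12cd']
```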
Getting Pixels on the Screen • Need to hook up RAMDAC to convert Pixels to RGB. • RAM (palette Table) + Digital to Analog Converter • Will look at a Brooktree RAMDAC as an example • 8 bit pixel input used to address 256 x 24 lookup table • 15, 16, or 24/32 BPP true color supported (lookup table bypassed). • Dual edge clocking supported to reduce number of load clocks for true color pixels
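The two pixel paths through the RAMDAC can be sketched like this. It is a generic model, not the Brooktree register interface: the 0xRRGGBB packing of each 24-bit table entry and the function names are assumptions, and the 15/16 bpp formats are left out because they depend on the bit assignments.

```python
def make_gray_palette():
    """Build a 256 x 24 lookup table holding a gray ramp."""
    return [(i << 16) | (i << 8) | i for i in range(256)]

def ramdac_output(pixel, palette=None, true_color=False):
    """Return the (R, G, B) values handed to the three DACs.

    Assumption: 24-bit values are packed as 0xRRGGBB.
    """
    if true_color:
        rgb = pixel & 0xFFFFFF          # lookup table bypassed
    else:
        rgb = palette[pixel & 0xFF]     # 8-bit pixel addresses the table
    return (rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF

palette = make_gray_palette()
print(ramdac_output(0x80, palette))               # (128, 128, 128)
print(ramdac_output(0x112233, true_color=True))   # (17, 34, 51)
```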
DAC Bit Assignments • (Figure: table of DAC bit assignments) • These assignments are used in true color mode.
VRAM to DAC • Assume no dual edge clocking • Define Pixel Rate as the RGB update rate • If 16BPP, SAM clk = pixel rate; RAMDAC clk = 2 * pixel rate • If 8BPP, SAM clk = 1/2 pixel rate; RAMDAC clk = pixel rate • If 32BPP, SAM clk = 2 * pixel rate; RAMDAC clk = 4 * pixel rate • (Diagram: VRAM SAM drives a 16-bit output into a MUX; the MUX drives 8 bits into the RAMDAC, which outputs R, G, B.)
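The clock ratios on this slide follow from the port widths: the SAM delivers 16 bits per clock and the RAMDAC accepts 8 bits per clock. A minimal sketch, assuming single-edge clocking as the slide does (the function name is illustrative):

```python
from fractions import Fraction

SAM_WIDTH = 16     # bits per SAM clock (16 serial outputs)
RAMDAC_PORT = 8    # bits per RAMDAC clock (MUX output width)

def clocks_per_pixel(bpp):
    """Return (SAM clk, RAMDAC clk) as multiples of the pixel rate."""
    return Fraction(bpp, SAM_WIDTH), Fraction(bpp, RAMDAC_PORT)

for bpp in (8, 16, 32):
    sam, dac = clocks_per_pixel(bpp)
    print(f"{bpp} bpp: SAM clk = {sam} x pixel rate, RAMDAC clk = {dac} x pixel rate")
```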
Two Frame Buffers • (Diagram: two VRAMs, each with a SAM, a 16-bit serial output, and a CE input; a Frame Select Bit; a MUX with an 8-bit output into the RAMDAC; R, G, B outputs.)
IBM SGRAM 8 Mb (256K x 32)
IBM SGRAM Features • A Synchronous DRAM with a few extra features for graphics • Fill operation support as in VRAM • Two color registers • Pipelined architecture that allows the column address to change every cycle • Precharging one bank while accessing the other bank allows continuous access • Implies that rows would be accessed alternately between banks • 83 MHz, 100 MHz, 133 MHz clock speeds
Sample Configuration • Assume 83 MHz bandwidth • 1024 x 1280 x 32 display • Each SGRAM can hold 256K pixels • 1280K pixels / 256K pixels per chip => 5 SGRAM chips • Assume no double buffering • Assume refresh rate = 72 Hz (13.9 ms per frame) • Each chip is accessed for 13.9/5 = 2.8 ms per frame (1/5 or 20% of my bandwidth is used for screen refresh) • 80% * 83 MHz => 66 MHz bandwidth!!! • Previous generation DRAMs had only 20 MHz bandwidth
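The numbers in this configuration can be reproduced with a few lines. The sketch follows the slide's simplification that screen refresh occupies each chip for exactly its 1/N share of the frame time; the function and parameter names are made up for illustration.

```python
from math import ceil

def sample_config(width=1280, height=1024, pixels_per_chip=256 * 1024,
                  clock_mhz=83, refresh_hz=72):
    """Return (chips, frame time ms, video time per chip ms, free MHz)."""
    chips = ceil(width * height / pixels_per_chip)
    frame_ms = 1000 / refresh_hz
    per_chip_ms = frame_ms / chips            # time video spends on each chip
    free_mhz = (1 - 1 / chips) * clock_mhz    # bandwidth left after refresh
    return chips, frame_ms, per_chip_ms, free_mhz

chips, frame_ms, per_chip_ms, free_mhz = sample_config()
print(chips, round(frame_ms, 1), round(per_chip_ms, 1), round(free_mhz, 1))
```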
More Numbers • What can be done with 66 MHz bandwidth? • What can be done in 4 * 2.8 ms => 11.2 ms? • Assume I want to read each pixel, modify it, and write it back in one screen refresh time. • Assume no pipelining of operations, with the operation done by an attached accelerator; each pixel then takes 3 clocks (1 clk read, 1 clk op, 1 clk write) • 83 MHz => 12 ns • 256K pixels * 12 ns * 3 => about 9.4 ms • Plenty of time!!!! Still have roughly 1.7 ms left over! • Double buffering would help, then I would have: • 10 SGRAM chips, 9 * 2.8 ms => 25.2 ms
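A sketch of the time budget for one chip, using the slide's rounded 12 ns clock period and unpipelined 3-clock read-modify-write. The variable names are illustrative, and the available time is computed from the unrounded 72 Hz frame time rather than the rounded 2.8 ms.

```python
PIXELS_PER_CHIP = 256 * 1024
CLOCK_NS = 12           # 83 MHz rounded to 12 ns, as on the slide
CLOCKS_PER_PIXEL = 3    # 1 clk read, 1 clk op, 1 clk write (no pipelining)
CHIPS = 5
FRAME_MS = 1000 / 72    # 72 Hz refresh

# Time for the accelerator to touch every pixel in one chip.
needed_ms = PIXELS_PER_CHIP * CLOCK_NS * CLOCKS_PER_PIXEL / 1e6

# Time per frame in which video is NOT reading this chip (4 of 5 slots).
available_ms = (CHIPS - 1) * FRAME_MS / CHIPS

print(f"needed {needed_ms:.2f} ms, available {available_ms:.2f} ms, "
      f"spare {available_ms - needed_ms:.2f} ms")
```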
Parallel Pixel Ops • The previous example assumed parallel pixel ops to each SDRAM device • (Diagram: Video plus PixelOp1 to PixelOp5, one per device, attached to SDRAM1 to SDRAM5. Time 1: PixelOp blocked to #1. Time 2: PixelOp blocked to #2.)
Non-Parallel Pixel Ops • If non-parallel ops, then only have 2.8 ms! • obvious advantage of parallel pixel ops • (Diagram: Video and a single PixelOp attached to SDRAM1 to SDRAM5. Time 1: PixelOp and Video access different SDRAMs; SDRAMs 3, 4, 5 are idle. Time 2: PixelOp and Video access move to the next SDRAM; SDRAMs 1, 4, 5 are idle.)
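The difference between the two schemes can be seen by counting, for each chip, the time slots in which a pixel-op unit may work on it. This is a toy schedule under stated assumptions: video reads chip t during slot t, each slot is 2.8 ms, and the single shared unit works one chip ahead of video, as the idle-chip labels in the diagram suggest. Function names are hypothetical.

```python
SLOT_MS = 2.8   # time video spends on one chip per frame
CHIPS = 5

def parallel_slots(chips=CHIPS):
    """One PixelOp unit per chip; it is blocked only while video reads its chip."""
    return [sum(1 for slot in range(chips) if slot != chip)
            for chip in range(chips)]

def shared_slots(chips=CHIPS):
    """One shared PixelOp unit; in slot t it works on the chip after video's."""
    busy = [0] * chips
    for slot in range(chips):
        busy[(slot + 1) % chips] += 1
    return busy

print([round(n * SLOT_MS, 1) for n in parallel_slots()])  # ms per chip, parallel
print([round(n * SLOT_MS, 1) for n in shared_slots()])    # ms per chip, shared
```

With one unit per chip, each chip gets 4 slots of pixel-op time per frame; with a single shared unit, each chip gets only 1 slot.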