Radeon Graphics Architecture

Radeon Graphics Architecture By Paul Zimmons 10/26/2000

Organization • Chip Specs • Memory • TnL Pipeline • Shading Pipeline • Other Features

Overview: What is a Radeon? • Chip announced by ATI in April but available closer to August • Designed by ATI itself • As opposed to ArtX (next gen/Dolphin) • Competes with nVidia • Pathway to DX8

Radeon Chip Specifications • 30 M transistors, 100 mm² • 256-bit = 2 x 128-bit units • 0.18 micron fabrication • Handles up to 128 MB of memory • 200 MHz core and 200 MHz memory • DDR makes that more like 400 MHz memory • 2 rendering pipelines (@ 200 MHz) • 8 hardware lights • 350 MHz DAC

More Chip Specifications • Dual Processor Enabled • 3 textures per clock • 6.4 GB/s memory bandwidth • But how much of this is used? • Programmable TCL (Charisma Engine) • Programmable Pixel Shading (Pixel Tapestry) • Z compression (Hyper Z) • Optimized for 32 bpp rendering

Chip Block Diagram

Memory • 6.4 GB/s? That’s a lot, right? • PCI = 132 MB/s • AGP (@ 66 MHz) = 264 MB/s • 2x = 528 MB/s, 4x = 1056 MB/s • System Memory ~ 800 MB/s • Radeon Graphics Card Memory = 5 GB/s • 366-400 MHz => 2.7 ns -> 2.5 ns • But since it is DDR, 5 ns DDR memory can be used

More Memory • Although PC133 with DDR RAM (266 MHz effective) can now provide 2.1 GB/s (DDR at 100 MHz, 200 MHz effective, gives 1.6 GB/s)
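
The bandwidth figures on the memory slides follow from clock rate times bus width times transfers per clock. A minimal sketch of that arithmetic, assuming a 32-bit PCI bus at 33 MHz, a 64-bit system memory bus, and a 128-bit Radeon memory bus (the bus widths are assumptions chosen to reproduce the quoted numbers, not figures stated on the slides):

```python
def bandwidth_mb_per_s(clock_mhz, bus_bits, transfers_per_clock=1):
    """Peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * (bus_bits // 8) * transfers_per_clock


def cycle_time_ns(effective_mhz):
    """Time per transfer in nanoseconds for a given effective clock."""
    return 1000.0 / effective_mhz


# Assumed bus widths: PCI 32-bit, system DIMM 64-bit, Radeon memory 128-bit.
pci = bandwidth_mb_per_s(33, 32)                 # 132 MB/s
radeon_ddr = bandwidth_mb_per_s(200, 128, 2)     # 6400 MB/s = 6.4 GB/s
pc133_ddr = bandwidth_mb_per_s(133, 64, 2)       # 2128 MB/s, about 2.1 GB/s
pc100_sdr = bandwidth_mb_per_s(100, 64)          # 800 MB/s
```

The same helper gives the cycle times quoted for 366-400 MHz effective memory: `cycle_time_ns(400)` is 2.5 ns and `cycle_time_ns(366)` is roughly 2.7 ns.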

More Memory • Radeon supports AGP 2x, 4x (roughly 0.5, 1 GB/s) • Supports AGP Fast Writes • The system memory is bypassed completely, allowing the CPU to talk directly with the Radeon • Chip must accept data at 4x processor write speed • Improvement depends on triangle throughput (might be worse with fast writes on if there are not enough triangles)

Charisma Engine • Transformation, Lighting, and Clipping • CPU generates OpenGL commands and provides vertex data, etc. • Transform and Light 30 M Tri/s • In reality only a fraction is delivered

Charisma: Vertex Skinning • Animations are defined by a hierarchy of bones modifying a mesh • By default a vertex follows exactly one bone, which causes problems when a joint bends through a large angle • No way to smoothly blend • Introduce 2 world matrices, one for the nearest bone and one for a neighbor

Skinning Continued • Now transforms are applied to the bones but the vertex gets a mixture of both (weighted by user) • Radeon allows up to 4 transforms per vertex (but can’t be changed within a triangle?)
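
The blending described on the skinning slides can be sketched as a weighted sum of the vertex transformed by each bone matrix. This is an illustrative sketch only, not ATI's implementation: the 3x4 matrix layout, the weight normalization, and the names `skin_vertex`, `IDENTITY`, and `SHIFT_X2` are all assumptions.

```python
def transform(matrix, point):
    """Apply a 3x4 affine matrix (rows of [a, b, c, t]) to a 3D point."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)


def skin_vertex(point, bone_matrices, weights):
    """Blend the vertex position produced by up to 4 bone transforms."""
    if len(bone_matrices) != len(weights) or not 1 <= len(weights) <= 4:
        raise ValueError("expected 1 to 4 matrices with matching weights")
    total = sum(weights)
    blended = [0.0, 0.0, 0.0]
    for matrix, weight in zip(bone_matrices, weights):
        moved = transform(matrix, point)
        for i in range(3):
            # Normalize so the weights always sum to 1.
            blended[i] += (weight / total) * moved[i]
    return tuple(blended)


# Hypothetical bones: one leaves the vertex alone, one shifts it 2 units in x.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
SHIFT_X2 = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]

halfway = skin_vertex((1, 0, 0), [IDENTITY, SHIFT_X2], [0.5, 0.5])
```

With equal weights the vertex lands halfway between the two bone results, which is the smooth blend a single-bone vertex cannot give.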

After Transformation -> Setup • Triangle setup takes the x,y,z coordinates of the triangle and fills in the x,y pixels and z value • Setup time is proportional to triangle size • Idle time is proportional to the number of rendering pipelines • 4 pipelines and a 2-pixel triangle => 50% idle
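
The 50% idle figure can be reproduced with a very simple model: assume each clock hands out one pixel slot per pipeline and a triangle cannot share a clock with the next triangle. That model is an assumption made for illustration; real hardware scheduling is more involved.

```python
import math


def pipeline_idle_fraction(pixels, pipelines):
    """Fraction of pixel slots left unused while drawing one triangle."""
    clocks = math.ceil(pixels / pipelines)   # whole clocks needed
    slots = clocks * pipelines               # pixel slots issued in that time
    return (slots - pixels) / slots


small = pipeline_idle_fraction(2, 4)     # 2-pixel triangle on 4 pipelines
large = pipeline_idle_fraction(100, 4)   # large triangle keeps all pipes busy
```

Under this model the 2-pixel triangle wastes half of the 4 available slots, while a 100-pixel triangle wastes none.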

Before going on • Pixel Fill Rate = Graphics core clock speed * # of rendering pipelines • Texel Fill Rate = Graphics core clock speed * # of texture units * filtering samples per clock • 1 unfiltered, 4 bilinear, 8 trilinear • Effective Fill Rate = Graphics core clock speed * # of rendering pipelines * # of textures in 1 cycle
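
The three formulas above translate directly into code. In this sketch the 183 MHz Radeon clock is an assumption, backed out of the 366 Mpixel figure quoted later in the deck for 2 pipelines; the GeForce 2 GTS configuration (200 MHz, 4 pipelines, 2 textures per cycle) is likewise assumed so that the results match the 800 Mpixel and 1600 Mtexel figures quoted there.

```python
def pixel_fill_rate(core_mhz, pipelines):
    """Mpixels/s: core clock times number of rendering pipelines."""
    return core_mhz * pipelines


def texel_fill_rate(core_mhz, texture_units, samples_per_clock):
    """Msamples/s: core clock times texture units times filter samples."""
    return core_mhz * texture_units * samples_per_clock


def effective_fill_rate(core_mhz, pipelines, textures_per_cycle):
    """Mtexels/s: core clock times pipelines times textures per cycle."""
    return core_mhz * pipelines * textures_per_cycle


# Assumed configurations (see the lead-in for where the numbers come from).
radeon_pixels = pixel_fill_rate(183, 2)           # 366
radeon_texels = effective_fill_rate(183, 2, 3)    # 1098, about 1100
gts_pixels = pixel_fill_rate(200, 4)              # 800
gts_texels = effective_fill_rate(200, 4, 2)       # 1600
```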

Last Aside (Comparison)

Setup and Triangle Size • Becoming more of a problem • Small triangles reduce fill rate • Smaller resolutions have less effective fill rate than larger ones • Overdraw also reduces fill rate • Drawing stuff that will be drawn over again • Every visible pixel accesses Z twice • Plus all those reads for non-visible ones • Plus clearing the Z buffer if necessary

Setup and Triangle Size

Z buffer • Big Memory bottleneck • 1600x1200x32 = 7.68 Megs • Read and written at least twice per frame • Say 7.68*2*60 = 921.6 MB/s • The most frequently accessed part of local memory
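
The Z-buffer traffic estimate above is plain arithmetic, sketched here so other resolutions and frame rates can be tried. The function names are made up for this example.

```python
def z_buffer_bytes(width, height, bits_per_pixel):
    """Size of the depth buffer in bytes."""
    return width * height * bits_per_pixel // 8


def z_traffic_mb_per_s(width, height, bits_per_pixel, accesses_per_frame, fps):
    """Depth-buffer traffic in MB/s (1 MB = 10**6 bytes)."""
    size = z_buffer_bytes(width, height, bits_per_pixel)
    return size * accesses_per_frame * fps / 1e6


size_mb = z_buffer_bytes(1600, 1200, 32) / 1e6          # 7.68 MB
traffic = z_traffic_mb_per_s(1600, 1200, 32, 2, 60)     # 921.6 MB/s
```

At 6.4 GB/s peak, that minimum of 921.6 MB/s is already about 14% of the available memory bandwidth before any overdraw is counted.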

Hyper Z • Three methods • Hierarchical Z • Z-compression • Fast Z clear

Hierarchical Z • After Triangle Setup but before rendering • Look up into a coarser representation of a part of the Z buffer • The area is kept in a special cache to avoid unnecessary Z-buffer reads
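
One common way to build such a coarse representation is to store the farthest depth of each screen tile and reject a block of incoming pixels when even its nearest depth is behind that value. The sketch below illustrates the general idea only; the 8x8 tile size, the "smaller z is nearer" convention, and the class name `CoarseZ` are assumptions, and the actual Hyper Z hardware may work differently.

```python
class CoarseZ:
    """Fine depth buffer plus one 'farthest depth' value per tile."""

    def __init__(self, width, height, tile=8, far=1.0):
        self.width, self.height, self.tile = width, height, tile
        self.cols = (width + tile - 1) // tile
        self.rows = (height + tile - 1) // tile
        self.fine = [[far] * width for _ in range(height)]
        self.tile_max = [[far] * self.cols for _ in range(self.rows)]

    def block_may_be_visible(self, tx, ty, block_min_z):
        """Coarse test: no fine Z reads are needed to reject a block."""
        return block_min_z < self.tile_max[ty][tx]

    def write(self, x, y, z):
        """Ordinary depth test, then refresh the tile's farthest depth."""
        if z >= self.fine[y][x]:
            return False
        self.fine[y][x] = z
        tx, ty = x // self.tile, y // self.tile
        x0, y0 = tx * self.tile, ty * self.tile
        x1 = min(x0 + self.tile, self.width)
        y1 = min(y0 + self.tile, self.height)
        self.tile_max[ty][tx] = max(
            self.fine[j][i] for j in range(y0, y1) for i in range(x0, x1)
        )
        return True


hz = CoarseZ(16, 16)
for y in range(8):
    for x in range(8):
        hz.write(x, y, 0.2)   # cover tile (0, 0) with a near surface
```

After the loop, a block at depth 0.5 aimed at tile (0, 0) is rejected by the coarse test alone, while the same block aimed at the untouched tile (1, 0) still has to be rendered.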

Z compression • Lossless compression of Z buffer coordinates • Well really areas of the Z buffer • Well really the Z buffer cache

Fast Z Clear • 50-64 times faster than conventional Z buffer clearing • Has something to do with the cache • Without writing to the Z buffer
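
The slide does not say how the clear avoids writing to the Z buffer. One plausible scheme, shown here purely as an assumption, keeps a "cleared" flag per tile: clearing only resets the flags, and a tile's depth values are filled in lazily on the first write. The class name `TiledZBuffer` and the 8x8 tile size are hypothetical.

```python
class TiledZBuffer:
    """Depth buffer whose clear touches one flag per tile, not every pixel."""

    def __init__(self, width, height, tile=8, far=1.0):
        self.width, self.height = width, height
        self.tile, self.far = tile, far
        self.cols = (width + tile - 1) // tile
        rows = (height + tile - 1) // tile
        self.values = [far] * (width * height)
        self.cleared = [True] * (self.cols * rows)

    def _tile(self, x, y):
        return (y // self.tile) * self.cols + (x // self.tile)

    def clear(self):
        """Reset flags only; the per-pixel depth values are left untouched."""
        self.cleared = [True] * len(self.cleared)

    def read(self, x, y):
        if self.cleared[self._tile(x, y)]:
            return self.far           # stale stored values are ignored
        return self.values[y * self.width + x]

    def write(self, x, y, z):
        t = self._tile(x, y)
        if self.cleared[t]:
            # First write after a clear: materialize the tile lazily.
            x0 = (x // self.tile) * self.tile
            y0 = (y // self.tile) * self.tile
            for j in range(y0, min(y0 + self.tile, self.height)):
                for i in range(x0, min(x0 + self.tile, self.width)):
                    self.values[j * self.width + i] = self.far
            self.cleared[t] = False
        self.values[y * self.width + x] = z


zb = TiledZBuffer(16, 16)
zb.write(3, 3, 0.25)
before_clear = zb.read(3, 3)
zb.clear()
after_clear = zb.read(3, 3)
```

In this sketch a 16x16 buffer is cleared by resetting 4 flags instead of storing 256 depth values.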

How bad is it? • From an experiment on Tom’s Hardware on a GeForce 2 GTS, a standard GeForce 2 GTS (200 MHz) performs as fast as a 100 MHz GeForce 2 GTS with ‘infinite’ memory speed • Similar for ATI but Hyper Z provides relief (about 20% more)

Pixel Tapestry • Exposed in OpenGL with an extended EXT_texture_env_combine • And ATIX_texture_env_dot3

Pixel Tapestry Ops • Dot product per pixel • Diffuse bump mapping • 3 Textures per pixel per clock • 3D Textures • Cube environment mapping • Environment Mapped Bump Mapping • Projective Texturing • Priority Buffers • Shadow Mapping • Range Based Fog

Impact on Fill Rate • Pixel Fill rate is about the same • 366 Mpixel vs. 800 Mpixel GTS • Texel Fill rate • 1100 Mtexel vs. claimed 1600 Mtexel • Because of 3 textures per clock • Hyper Z can push this higher

3 Textures Per Clock • Also provides basic accumulation effects such as soft shadows • Reduces the number of texture memory accesses

Example

3D Textures • Self shadowing BRDF lookups • Volumetric fog/shadows/lighting • General 3D look up table • Also 3D texture compression

Texture Compression • Problem is that lightmaps and sky are low color and low resolution • Hard case for compression • Comparison images: nVidia vs. ATI

Bump Mapping • Several types • Emboss Bump Mapping • Dot Product 3 Bump Mapping (diffuse) • nVidia more sophisticated • DX8 self-shadowing bump mapping • Environment Mapped Bump Mapping • Single level chained texture look up • Du/dv maps (?)

Emboss Bump Mapping • Diffuse map • Half-intensity height field • Inverse half-intensity height field
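
The emboss term can be sketched by adding a half-intensity height field to an inverted, shifted half-intensity copy of itself. The 1D height field, the one-texel shift, and the clamping at the borders are simplifying assumptions made for this example; in practice the shift direction depends on the light direction and the result is combined with the diffuse map.

```python
def emboss(height, shift):
    """Bump term: 0.5 * h(i) + 0.5 * (1 - h(i + shift)), borders clamped."""
    n = len(height)
    out = []
    for i in range(n):
        j = min(max(i + shift, 0), n - 1)
        out.append(0.5 * height[i] + 0.5 * (1.0 - height[j]))
    return out


# A flat surface with one raised plateau in the middle.
ridge = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
shade = emboss(ridge, 1)
```

Flat areas come out at 0.5 (neutral), the edge facing one way darkens to 0.0, and the opposite edge brightens to 1.0.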

Dot Product 3 • Normal map is derived from a height map • Light is represented as a cube map in world space • Diffuse color only
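
A minimal sketch of the per-pixel dot product: derive a normal from neighboring height samples, store it as an RGB color, and take the clamped dot product with the light direction. The central-difference derivation, the 8-bit RGB encoding, and passing the light as a plain vector (rather than looking it up in a cube map) are assumptions made to keep the example short.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def normal_from_height(height, x, y, scale=1.0):
    """Normal from central differences of a 2D height map (borders clamped)."""
    rows, cols = len(height), len(height[0])
    left, right = height[y][max(x - 1, 0)], height[y][min(x + 1, cols - 1)]
    up, down = height[max(y - 1, 0)][x], height[min(y + 1, rows - 1)][x]
    return normalize((-(right - left) * scale, -(down - up) * scale, 1.0))


def encode_rgb(normal):
    """Map each component from [-1, 1] to an 8-bit color channel."""
    return tuple(round((c * 0.5 + 0.5) * 255) for c in normal)


def decode_rgb(rgb):
    """Inverse of encode_rgb, renormalized to absorb quantization error."""
    return normalize(tuple(c / 255 * 2 - 1 for c in rgb))


def dot3_diffuse(normal, light_dir):
    """Diffuse intensity: clamped dot product of normal and light direction."""
    light = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(normal, light)))


flat = [[0.5] * 3 for _ in range(3)]
ramp = [[0.0, 0.5, 1.0] for _ in range(3)]   # height rises along +x
flat_normal = normal_from_height(flat, 1, 1)
ramp_normal = normal_from_height(ramp, 1, 1)
```

A flat height map yields the straight-up normal, which encodes to the familiar light blue (128, 128, 255) of normal maps.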

EMBM • Perturb the eye space reflected ray according to some other map • Grayscale

Projective Texture Mapping • Project 3D geometry into 2D texture map • Like generating screen coordinates into texture space • Uses projective texture matrix • Can work in conjunction with a priority buffer to achieve special effects
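
Projective texturing can be sketched as a 4x4 matrix multiply followed by a divide by the fourth coordinate. The matrix below is a made-up example, assuming a projector at the origin looking down +z with a 90-degree field of view; the names `project_to_texture` and `PROJECTOR` are hypothetical.

```python
def mat_vec4(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def project_to_texture(point, tex_matrix):
    """Return (s/q, t/q) for a 3D point, or None if it is behind the projector."""
    x, y, z = point
    s, t, _r, q = mat_vec4(tex_matrix, (x, y, z, 1.0))
    if q <= 0:
        return None
    return (s / q, t / q)


# Assumed projector: maps x/z and y/z from [-1, 1] into [0, 1] texture space.
PROJECTOR = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

center = project_to_texture((0.0, 0.0, 2.0), PROJECTOR)
edge = project_to_texture((2.0, 0.0, 2.0), PROJECTOR)
behind = project_to_texture((0.0, 0.0, -1.0), PROJECTOR)
```

A point on the projector's axis lands in the middle of the texture, and a point at the edge of the field of view lands on the texture border.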

Priority Buffer • Like Z • Provides a number (starting at 1) to each polygon depending on how close they are to the viewer • Allows for shadow mapping • Can project a light and cast shadows at the same time • Only method that supports easy self shadowing • Radeon is the first consumer hardware with this
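
A toy sketch of shadow testing with a priority buffer: number the polygons by distance (1 is nearest), record the nearest polygon's number per texel as seen from the light, and treat a surface point as lit only if its own number is the one recorded there. Representing polygons as dictionaries with a precomputed set of covered texels, and sorting by distance from the light, are assumptions made to keep the example small.

```python
def assign_priorities(polygons):
    """Give each polygon a number starting at 1, nearest first."""
    ordered = sorted(polygons, key=lambda p: p["distance"])
    return {p["name"]: i + 1 for i, p in enumerate(ordered)}


def build_priority_map(polygons, priorities, size):
    """Per texel, keep the priority of the nearest polygon (0 = empty)."""
    pmap = [0] * size
    for poly in polygons:
        pr = priorities[poly["name"]]
        for texel in poly["texels"]:
            if pmap[texel] == 0 or pr < pmap[texel]:
                pmap[texel] = pr
    return pmap


def is_lit(name, texel, priorities, pmap):
    """A point is lit if its polygon is the nearest one at that texel."""
    return pmap[texel] == priorities[name]


# Hypothetical scene: a small occluder floating above a floor.
scene = [
    {"name": "floor", "distance": 5.0, "texels": {0, 1, 2, 3, 4, 5}},
    {"name": "occluder", "distance": 1.0, "texels": {2, 3}},
]
priorities = assign_priorities(scene)
pmap = build_priority_map(scene, priorities, 6)
```

Because the test compares small integer IDs rather than depth values, it does not depend on depth precision.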

Range Based Fog • Uses Euclidean distance rather than depth
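
The difference between the two fog distances is easy to see numerically. This sketch assumes eye-space coordinates with the viewer at the origin and uses a linear fog factor, (end - d) / (end - start) clamped to [0, 1], as the fog model; the slide itself does not name a fog equation.

```python
import math


def depth_fog_distance(point):
    """Depth-based fog: distance along the view axis only."""
    return abs(point[2])


def range_fog_distance(point):
    """Range-based fog: true Euclidean distance from the eye."""
    x, y, z = point
    return math.sqrt(x * x + y * y + z * z)


def linear_fog_factor(distance, start, end):
    """1.0 means no fog, 0.0 means fully fogged."""
    factor = (end - distance) / (end - start)
    return min(1.0, max(0.0, factor))


off_axis = (3.0, 0.0, 4.0)
by_depth = linear_fog_factor(depth_fog_distance(off_axis), 0.0, 10.0)
by_range = linear_fog_factor(range_fog_distance(off_axis), 0.0, 10.0)
```

For the off-axis point, depth-based fog sees a distance of 4 while range-based fog sees 5, so objects near the screen edges get more fog and no longer change fog level as the camera rotates.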

Anisotropic Filtering • 16 tap vs. 2 tap for nVidia • Makes a noticeable difference • Especially with text

What does this all mean? • More complete/complicated lighting

N-patches • Triangular Cubic Bezier Surfaces • Supply a triangle (with normals) and a subdivision amount • N new points along each edge

N-patches continued • Project each new vertex into the plane defined by the normal
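
The projection step can be sketched as follows: place a new point one third of the way along an edge, then project it onto the plane through the nearer corner that is perpendicular to that corner's normal. The one-third placement and the function names are assumptions made for this example; the slides only state that new points are projected into the plane defined by the normal.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_onto_plane(q, p, n):
    """Project point q onto the plane through p with unit normal n."""
    offset = tuple(qi - pi for qi, pi in zip(q, p))
    d = dot(offset, n)                       # signed distance from the plane
    return tuple(qi - d * ni for qi, ni in zip(q, n))


def edge_control_point(p1, n1, p2):
    """Point 1/3 of the way from p1 to p2, pulled into p1's tangent plane."""
    q = tuple((2.0 * a + b) / 3.0 for a, b in zip(p1, p2))
    return project_onto_plane(q, p1, n1)


# Corner at the origin with an upward normal; the edge climbs toward p2.
control = edge_control_point((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (3.0, 0.0, 3.0))
```

The straight-edge point (1, 0, 1) is pulled down to (1, 0, 0), so the resulting curve leaves the corner tangent to the surface implied by the normal.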

N-patch example

N-patch example 2

Video (Rage Theater) • Not in Radeon itself but usually on the same board • “on-chip motion compensation, run-level decode, de-zigzag and IDCT hardware, acceleration of MPEG-2, 8-bit per-pixel alpha blending of video and graphics, 4x4-tap filtered scaling, hardware subpicture acceleration, per-pixel de-interlacing and the ability to directly drive component video”

Future Radeon • Can have two Radeons on one board • Names • Radeon II, Radeon MAXX, Radeon Pro? • Rumors of 128 MB board (probably with 2 chips)
