GPU Architecture Innovations: Week 260 Summary

Status – Week 260 Victor Moya

Summary • shSim. • GPU design. • Future Work. • Rumors and News. • Imagine.

shSim • Currently working: • Command Processor: reads a text-based trace file (programs, parameters, vertices, commands to the rasterizer). • Shader: simulates an N-thread multithreaded, VS1-capable ‘vertex’ shader with variable latency support. • Rasterizer: OpenGL ‘emulator’; accepts resolution and clip plane changes, receives ‘shaded’ vertices from the shader (only 2 QuadFloats: vertex position + color), and displays the triangles in a GL window.

shSim • Tests: • Single shader with 2/4 threads (plus another 2/4 input buffers). • Fixed latency of 3 cycles. Shader to Rasterizer latency of 4. CommandProcessor to Rasterizer latency of 6. • Simple coordinate change traces (shader.input, shader.input.2). • Ripple vertex shader example from the DX8 & DX9 SDK (ripple.input): • Around 300 triangles (1100 vertices). • Color is calculated from vertex position.
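The fixed latencies used in these tests can be pictured as delay queues between simulator boxes. The following is a minimal sketch, not shSim's actual code: the `Signal` class and `simulate` function are hypothetical names, and treating the "fixed 3 latency cycles" as a single per-vertex shader delay is an assumption made only for illustration.

```python
from collections import deque


class Signal:
    """Fixed-latency link between two simulated boxes (hypothetical helper)."""

    def __init__(self, latency):
        if latency < 1:
            raise ValueError("latency must be at least 1 cycle")
        self.latency = latency
        self._in_flight = deque()  # (arrival_cycle, payload), kept in send order

    def write(self, cycle, payload):
        """Send a payload at `cycle`; it becomes readable `latency` cycles later."""
        self._in_flight.append((cycle + self.latency, payload))

    def read(self, cycle):
        """Return every payload whose arrival cycle has been reached."""
        ready = []
        while self._in_flight and self._in_flight[0][0] <= cycle:
            ready.append(self._in_flight.popleft()[1])
        return ready


def simulate(num_vertices, shader_latency=3, link_latency=4):
    """Issue one vertex per cycle; return (vertex id, cycle it reaches the rasterizer)."""
    shader = Signal(shader_latency)       # assumed: 3-cycle shader delay per vertex
    to_rasterizer = Signal(link_latency)  # Shader to Rasterizer latency of 4
    arrivals = []
    cycle = 0
    while len(arrivals) < num_vertices:
        if cycle < num_vertices:
            shader.write(cycle, cycle)    # vertex id equals its issue cycle
        for vertex in shader.read(cycle):
            to_rasterizer.write(cycle, vertex)
        for vertex in to_rasterizer.read(cycle):
            arrivals.append((vertex, cycle))
        cycle += 1
    return arrivals
```

Under these assumptions a vertex issued at cycle 0 reaches the rasterizer at cycle 7 (3 + 4), and later vertices follow one per cycle because both links are fully pipelined.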

shSim • Ripple.vsh.

shSim • Screenshots from frames rendered by shSim:

GPU Architecture • Based on current GPUs: • NV30 • R300 • Based on other graphics processors: • PS3 • Imagine

GPU Architecture • Based on an API: • DX8 • DX9 • DX10 • OpenGL 1.4 and extensions. • OpenGL 2.0 • Based on an architecture model: • Vector • Scalar • Multithreaded

GPU Specification • Shader Model: • Language: • DX9: • VS2.0/PS2.0. • VS3.0/PS3.0. • OpenGL: • NV_vertex_program_2/NV_fragment_program. • ARB_vertex_program/ARB_fragment_program. • Our own language.

GPU Specification • Shader Architecture: • Architectural model: • Scalar. • SIMD. • Multithreaded. • Vector. • Out-of-order.

GPU Specification • Configuration: • Integer Unit: • Number. • Precision. • SIMD or scalar? • Floating Point Unit: • Number. • Precision. • SIMD or scalar?

GPU Specification • Memory Unit: • Number. • Texture modes. • Filtering modes. • Register Banks: • Number. • Ports. • Size. • Scalar or SIMD?

XBOX (NV2A) Vertex Shader

Future Work • Shader: • Add branch/call/ret instructions. • Add texture instructions (Pixel Shader). • Command Processor: • Define a trace specification: binary, gzipped? • Define an interface with OpenGL (Mesa?) or DX8/DX9 (driver?). • Primitive Assembly: • Implement vertex cache and primitive assembly (only triangles?). • Implement culling and clipping?
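As a rough illustration of the planned vertex cache and primitive assembly stage, the sketch below assumes an indexed triangle list and a FIFO replacement policy. Neither choice comes from the slides, and `VertexCache`, `assemble_triangles` and the `shade` callback are hypothetical names.

```python
from collections import OrderedDict


class VertexCache:
    """Small FIFO cache of already shaded vertices, keyed by vertex index."""

    def __init__(self, size, shade):
        self.size = size
        self.shade = shade            # callback: vertex index -> shaded vertex
        self.entries = OrderedDict()  # insertion order doubles as FIFO order
        self.hits = 0
        self.misses = 0

    def fetch(self, index):
        if index in self.entries:
            self.hits += 1
            return self.entries[index]
        self.misses += 1
        shaded = self.shade(index)
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[index] = shaded
        return shaded


def assemble_triangles(indices, cache):
    """Group an index stream into triangles, dropping an incomplete trailing one."""
    usable = len(indices) - len(indices) % 3
    return [
        tuple(cache.fetch(i) for i in indices[k:k + 3])
        for k in range(0, usable, 3)
    ]
```

For two triangles sharing an edge, such as indices `[0, 1, 2, 2, 1, 3]`, only four vertices are shaded; the two shared ones are served from the cache.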

Future Work • Deferred rendering? • Transformed geometry must be stored in video memory. • Geometry must be sorted: • Tiles. • Front to back. • Rasterization: • Triangle Setup and Fragment Generation. • Any suitable method: Olano & Greer, DDA? • MSAA support?
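The sorting step for deferred rendering can be sketched as bounding-box binning into screen tiles followed by a per-tile depth sort. This is only one possible approach under stated assumptions: 16-pixel tiles, vertices given as screen-space `(x, y, z)` tuples, and smaller `z` meaning closer to the viewer. The function names are hypothetical.

```python
import math


def bin_triangles(triangles, width, height, tile_size=16):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    tiles_x = math.ceil(width / tile_size)
    tiles_y = math.ceil(height / tile_size)
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _y, _z in tri]
        ys = [y for _x, y, _z in tri]
        # Skip triangles whose bounding box lies entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        tx0 = max(math.floor(min(xs) / tile_size), 0)
        tx1 = min(math.floor(max(xs) / tile_size), tiles_x - 1)
        ty0 = max(math.floor(min(ys) / tile_size), 0)
        ty1 = min(math.floor(max(ys) / tile_size), tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins


def sort_front_to_back(bins, triangles):
    """Order each tile's triangle list by nearest vertex depth (smaller z first)."""
    def nearest_z(i):
        return min(z for _x, _y, z in triangles[i])
    return {tile: sorted(ids, key=nearest_z) for tile, ids in bins.items()}
```

Bounding-box binning is conservative: a thin diagonal triangle is listed in tiles it never actually covers, which costs some wasted work but never drops geometry.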

Future Work • Early Z and Hierarchical Z? • Pixel Shader: • Implement unified with the vertex shaders (unified shader architecture)? • Queue/buffering mechanism? Pixels need a lot of buffering because memory/texture latency is very large. • Implement a TMU simulator (filter algorithms, memory access, texture compression, cache).
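To make the Early Z / Hierarchical Z idea concrete, here is a minimal sketch assuming a "less than" depth test, a far plane at 1.0 and square tiles. The `HierarchicalZ` class is hypothetical and says nothing about how a real GPU stores or updates its coarse depth data.

```python
class HierarchicalZ:
    """Depth buffer plus a coarse per-tile maximum depth (smaller z = closer)."""

    def __init__(self, width, height, tile=8, far=1.0):
        self.width, self.height, self.tile = width, height, tile
        self.depth = [[far] * width for _ in range(height)]
        tiles_x = (width + tile - 1) // tile
        tiles_y = (height + tile - 1) // tile
        self.tile_max = [[far] * tiles_x for _ in range(tiles_y)]

    def tile_rejects(self, tx, ty, z_min):
        """Coarse test: True if nothing at depth >= z_min can be visible in the tile."""
        return z_min >= self.tile_max[ty][tx]

    def write(self, x, y, z):
        """Per-fragment early Z test; stores z and returns True if it passes."""
        if z >= self.depth[y][x]:
            return False
        self.depth[y][x] = z
        self._refresh_tile(x // self.tile, y // self.tile)
        return True

    def _refresh_tile(self, tx, ty):
        # Recompute the farthest stored depth over the tile's pixels.
        x0, y0 = tx * self.tile, ty * self.tile
        x1 = min(x0 + self.tile, self.width)
        y1 = min(y0 + self.tile, self.height)
        self.tile_max[ty][tx] = max(
            self.depth[y][x] for y in range(y0, y1) for x in range(x0, x1)
        )
```

The coarse test is safe because every stored depth in a tile is at most the tile maximum, so a block whose nearest depth is not closer than that maximum cannot pass the per-pixel test anywhere in the tile. Rescanning the whole tile on each write is a simplification chosen for clarity.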

Future Work • Fixed fragment operations: • Implement using the shader? • Fog: remove? • Pixel Ownership: remove? • Scissor Test: implement (needed if clipping is not implemented). • Alpha test: same as Z Test. • Z Test and Stencil Test: must be implemented, but could be added to a generic shader unit? • Blending: add to shader? • Dithering: remove. • Logical Op: remove or add to shader. • MSAA Operations: ?
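The open question of moving fixed fragment operations into the shader is easier to discuss with the stages written as plain functions. The sketch below follows the OpenGL ordering (scissor, alpha test, depth test, blending) and leaves out stencil, fog, dithering and logical op. The comparison functions (GREATER for alpha, LESS for depth) and the blend factors (source alpha, one minus source alpha) are assumptions picked for the example, and all function names are hypothetical.

```python
def scissor_test(x, y, box):
    """box is (x0, y0, width, height) in window coordinates."""
    x0, y0, w, h = box
    return x0 <= x < x0 + w and y0 <= y < y0 + h


def alpha_test(color, ref):
    """Assumed comparison: pass when alpha is greater than the reference."""
    return color[3] > ref


def depth_test(z, stored):
    """Assumed comparison: pass when the fragment is closer (smaller z)."""
    return z < stored


def blend(src, dst):
    """Assumed factors: source alpha and one minus source alpha."""
    a = src[3]
    return tuple(a * s + (1.0 - a) * d for s, d in zip(src, dst))


def process_fragment(frag, framebuffer, zbuffer, scissor_box, alpha_ref=0.0):
    """Run one fragment through the tests; return the stage that killed it or 'written'."""
    x, y, z, color = frag
    if not scissor_test(x, y, scissor_box):
        return "scissor"
    if not alpha_test(color, alpha_ref):
        return "alpha"
    if not depth_test(z, zbuffer[y][x]):
        return "depth"
    zbuffer[y][x] = z
    framebuffer[y][x] = blend(color, framebuffer[y][x])
    return "written"
```

Each stage here is a few arithmetic or compare operations, which is the argument for folding them into a generic shader unit; the counterargument is that the depth test and blending read and write the framebuffer, which a shader of this era cannot do.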

Future Work • Framebuffer: • Z compression. • Color compression. • SSAA or MSAA support?

News and Rumors • NV30 architecture: • 4x2 pixel pipes? • 8x zixel pipes (Z Test & Stencil only). • ATI ready to release R350 and RV350 in a couple of weeks. • R350: updated R300 core with additional features (?) and increased clock frequency (375 – 400 MHz). • RV350: value chip based on the R300 core. Maybe 8x1 core, 128-bit bus. Clock frequency 300 – 400 MHz. 75 million transistors.

Imagine • ‘Computer Graphics on a Stream Architecture’, John Douglas Owens, PhD dissertation. • Not read yet either.
