An overview of shSim GPU design, future work plans, shader simulations, and rumors in the tech world. Includes details of shader processors, rasterizers, and potential advancements in pixel shaders.
Status – Week 260 Victor Moya
Summary • shSim. • GPU design. • Future Work. • Rumors and News. • Imagine.
shSim • Currently working: • Command Processor: reads a text-based trace file (programs, parameters, vertices, commands to the rasterizer). • Shader: simulates an N-thread multithreaded, VS1-capable ‘vertex’ shader with variable latency support. • Rasterizer: OpenGL ‘emulator’; accepts resolution and clip plane changes, receives ‘shaded’ vertices from the shader (only 2 QuadFloats: vertex position + color), and displays the triangles in a GL window.
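As a rough illustration of what an N-thread shader with per-instruction latency looks like in a cycle-level simulator, here is a minimal Python sketch. It is not shSim's code: the `ShaderThread` and `run_shader` names, the round-robin policy, and the one-issue-per-cycle limit are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class ShaderThread:
    """One hardware thread, modelled only as a list of instruction latencies."""
    latencies: list
    pc: int = 0          # index of the next instruction to issue
    ready_at: int = 0    # first cycle at which that instruction may issue

    def done(self):
        return self.pc >= len(self.latencies)

def run_shader(threads, max_cycles=10_000):
    """Issue at most one instruction per cycle, round-robin over ready threads.

    Each thread waits for its previous instruction to complete (assumption:
    every instruction depends on the one before it). Returns the cycle at
    which the last instruction completes.
    """
    cycle, last, finish = 0, 0, 0
    n = len(threads)
    while any(not t.done() for t in threads) and cycle < max_cycles:
        for i in range(n):
            idx = (last + 1 + i) % n
            t = threads[idx]
            if not t.done() and t.ready_at <= cycle:
                latency = t.latencies[t.pc]
                t.pc += 1
                t.ready_at = cycle + latency
                finish = max(finish, cycle + latency)
                last = idx
                break
        cycle += 1
    return finish

one = run_shader([ShaderThread([3] * 4)])
two = run_shader([ShaderThread([3] * 4), ShaderThread([3] * 4)])
print(one, two)
```

With a fixed latency of 3 cycles, a second thread fills some of the idle issue slots left by the first, so twice the work finishes only one cycle later in this model.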
shSim • Tests: • Single shader with 2/4 threads (plus another 2/4 input buffers). • Fixed latency of 3 cycles. Shader to Rasterizer latency of 4. CommandProcessor to Rasterizer latency of 6. • Simple coordinate change traces (shader.input, shader.input.2). • Ripple vertex shader example from DX8 & DX9 SDK (ripple.input): • Around 300 triangles (1100 vertices). • Color is calculated from vertex position.
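The fixed latencies between units (4 cycles from Shader to Rasterizer, 6 from CommandProcessor to Rasterizer) can be modelled with a simple delay queue. The sketch below is an assumption about how such a link might look, not the actual shSim mechanism; `LatencyLink` is a hypothetical name.

```python
from collections import deque

class LatencyLink:
    """Connection between two simulator units with a fixed latency in cycles."""

    def __init__(self, latency):
        self.latency = latency
        self.in_flight = deque()  # (arrival_cycle, item), in send order

    def write(self, cycle, item):
        """Producer side: the item becomes visible `latency` cycles later."""
        self.in_flight.append((cycle + self.latency, item))

    def read(self, cycle):
        """Consumer side: return every item that has arrived by `cycle`."""
        arrived = []
        while self.in_flight and self.in_flight[0][0] <= cycle:
            arrived.append(self.in_flight.popleft()[1])
        return arrived

shader_to_rasterizer = LatencyLink(4)
shader_to_rasterizer.write(10, "vertex 0")
early = shader_to_rasterizer.read(13)   # nothing has arrived yet
on_time = shader_to_rasterizer.read(14) # the vertex arrives at cycle 10 + 4
```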
shSim • Ripple.vsh.
shSim • Screenshots from frames rendered by shSim:
GPU Architecture • Based on current GPUs: • NV30 • R300 • Based on other graphics processors: • PS3 • Imagine
GPU Architecture • Based on an API: • DX8 • DX9 • DX10 • OpenGL 1.4 and extensions. • OpenGL 2.0 • Based on an architecture model: • Vector • Scalar • Multithreaded
GPU Specification • Shader Model: • Language: • DX9: • VS2.0/PS2.0. • VS3.0/PS3.0. • OpenGL: • NV_vertex_program_2/NV_fragment_program. • ARB_vertex_program/ARB_fragment_program. • Our own language.
GPU Specification • Shader Architecture: • Architectural model: • Scalar. • SIMD. • Multithreaded. • Vector. • Out-of-order.
GPU Specification • Configuration: • Integer Unit: • Number. • Precision. • SIMD or scalar? • Floating Point Unit: • Number. • Precision. • SIMD or scalar?
GPU Specification • Memory Unit: • Number. • Texture modes. • Filtering modes. • Register Banks: • Number. • Ports. • Size. • Scalar or SIMD?
Future Work • Shader: • Add branch/call/ret instructions. • Add texture instructions (Pixel Shader). • Command Processor: • Define a trace specification: binary, gzipped? • Define an interface with OpenGL (Mesa?) or DX8/DX9 (driver?). • Primitive Assembly: • Implement vertex cache and primitive assembly (only triangles?). • Implement culling and clipping?
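A minimal sketch of the vertex cache plus primitive assembly idea, restricted to indexed triangle lists. The FIFO replacement policy, the cache size, and the `VertexCache` / `assemble_triangles` names are assumptions for illustration; the slides do not fix any of them.

```python
from collections import OrderedDict

class VertexCache:
    """FIFO cache of shaded vertices, keyed by vertex index."""

    def __init__(self, size, shade):
        self.size = size
        self.shade = shade            # callback that runs the vertex shader
        self.entries = OrderedDict()  # insertion order doubles as FIFO order
        self.hits = 0
        self.misses = 0

    def fetch(self, index):
        if index in self.entries:
            self.hits += 1
            return self.entries[index]
        self.misses += 1
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # evict the oldest entry
        shaded = self.shade(index)
        self.entries[index] = shaded
        return shaded

def assemble_triangles(indices, cache):
    """Group an index stream into triangles, three indices at a time."""
    triangles = []
    for i in range(0, len(indices) - 2, 3):
        triangles.append(tuple(cache.fetch(j) for j in indices[i:i + 3]))
    return triangles

# Two triangles sharing an edge: vertices 1 and 2 are shaded only once.
cache = VertexCache(size=4, shade=lambda i: (float(i), 0.0))
tris = assemble_triangles([0, 1, 2, 2, 1, 3], cache)
```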
Future Work • Deferred rendering? • Transformed geometry must be stored in video memory. • Geometry must be sorted: • Tiles. • Front to back. • Rasterization: • Triangle Setup and Fragment Generation. • Any suitable method: Olano & Greer, DDA? • MSAA support?
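For fragment generation, one easy method to prototype is a bounding-box scan with 2D edge functions. This is a simplified screen-space relative of the approaches listed above, not Olano & Greer's homogeneous-coordinate algorithm and not a DDA; the function names and the pixel-center sampling rule are assumptions of this sketch.

```python
import math

def edge(a, b, p):
    """Edge function: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers are covered by the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []            # degenerate triangle, nothing to draw
    if area < 0:
        v1, v2 = v2, v1      # force counter-clockwise winding
    xmin = max(math.floor(min(v0[0], v1[0], v2[0])), 0)
    xmax = min(math.floor(max(v0[0], v1[0], v2[0])), width - 1)
    ymin = max(math.floor(min(v0[1], v1[1], v2[1])), 0)
    ymax = min(math.floor(max(v0[1], v1[1], v2[1])), height - 1)
    fragments = []
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            if (edge(v0, v1, p) >= 0 and edge(v1, v2, p) >= 0
                    and edge(v2, v0, p) >= 0):
                fragments.append((x, y))
    return fragments

frags = rasterize((0, 0), (4, 0), (0, 4), width=4, height=4)
```

The sketch has no fill rule, so a pixel center lying exactly on an edge shared by two triangles would be generated twice; a real rasterizer needs a tie-breaking rule such as top-left.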
Future Work • Early Z and Hierarchical Z? • Pixel Shader: • Unified shader architecture (implemented together with the vertex shaders)? • Queue/buffering mechanism? Pixels need a lot of buffering because memory/texture latency is very large. • Implement a TMU simulator (filter algorithms, memory access, texture compression, cache).
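To make the Early Z / Hierarchical Z item concrete, the sketch below keeps one farthest-depth value per tile so that a whole block of fragments can be rejected before shading. The tile size, the "smaller z is closer" convention, the cleared depth of 1.0, and all names are assumptions; real hardware also updates the tile value more cheaply than the full rescan used here.

```python
class HierarchicalZ:
    """Depth buffer plus one 'farthest z' value per tile."""

    def __init__(self, width, height, tile=8):
        self.width, self.height, self.tile = width, height, tile
        tiles_x = (width + tile - 1) // tile
        tiles_y = (height + tile - 1) // tile
        self.zbuf = [[1.0] * width for _ in range(height)]
        self.tile_max = [[1.0] * tiles_x for _ in range(tiles_y)]

    def tile_rejects(self, tx, ty, block_min_z):
        """True when no fragment of the block can pass a 'less than' Z test."""
        return block_min_z >= self.tile_max[ty][tx]

    def early_z(self, x, y, z):
        """Per-fragment test before shading; updates the buffers on a pass."""
        if z >= self.zbuf[y][x]:
            return False
        self.zbuf[y][x] = z
        tx, ty = x // self.tile, y // self.tile
        x0, y0 = tx * self.tile, ty * self.tile
        # Rescan the tile to refresh its farthest depth (simple, not fast).
        self.tile_max[ty][tx] = max(
            self.zbuf[j][i]
            for j in range(y0, min(y0 + self.tile, self.height))
            for i in range(x0, min(x0 + self.tile, self.width))
        )
        return True

hz = HierarchicalZ(4, 4, tile=2)
for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    hz.early_z(x, y, 0.3)                   # cover tile (0, 0) at depth 0.3
hidden = hz.tile_rejects(0, 0, 0.5)         # block behind the tile: rejected
visible = not hz.tile_rejects(0, 0, 0.2)    # block in front: must be tested
```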
Future Work • Fixed fragment operations: • Implement using the shader? • Fog: remove? • Pixel Ownership: remove? • Scissor Test: implement (needed if clipping is not implemented). • Alpha test: same as Z Test. • Z Test and Stencil Test: must be implemented, but could be added to a generic shader unit? • Blending: add to shader? • Dithering: remove. • Logical Op: remove or add to shader. • MSAA Operations: ?
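The fixed fragment operations that the list proposes to keep can be prototyped as one function applied to each fragment, which also shows what a generic shader unit would have to absorb. This sketch follows the OpenGL ordering (scissor, alpha, depth, blend) and leaves out the stencil test, dithering, and logical ops; the framebuffer layout, the `fragment_ops` name, and the fixed source-alpha blend are assumptions.

```python
def fragment_ops(frag, fb, scissor, alpha_ref=0.0):
    """Apply scissor, alpha test, Z test and blending to one fragment.

    frag    -- (x, y, z, (r, g, b, a))
    fb      -- {"depth": rows of floats, "color": rows of (r, g, b)}
    scissor -- (x0, y0, x1, y1), upper bounds exclusive
    Returns the name of the stage that discarded the fragment, or "written".
    """
    x, y, z, (r, g, b, a) = frag
    x0, y0, x1, y1 = scissor
    if not (x0 <= x < x1 and y0 <= y < y1):
        return "scissor"
    if a <= alpha_ref:                 # alpha test: pass only if a > ref
        return "alpha"
    if z >= fb["depth"][y][x]:         # Z test: pass only if closer
        return "depth"
    fb["depth"][y][x] = z
    dr, dg, db = fb["color"][y][x]
    # Blend: source * alpha + destination * (1 - alpha).
    fb["color"][y][x] = (r * a + dr * (1 - a),
                         g * a + dg * (1 - a),
                         b * a + db * (1 - a))
    return "written"

fb = {"depth": [[1.0, 1.0], [1.0, 1.0]],
      "color": [[(0.0, 0.0, 0.0)] * 2 for _ in range(2)]}
first = fragment_ops((0, 0, 0.5, (1.0, 0.0, 0.0, 0.5)), fb, (0, 0, 2, 2))
behind = fragment_ops((0, 0, 0.9, (0.0, 1.0, 0.0, 1.0)), fb, (0, 0, 2, 2))
clipped = fragment_ops((1, 1, 0.1, (0.0, 1.0, 0.0, 1.0)), fb, (0, 0, 1, 1))
```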
Future Work • Framebuffer: • Z compression. • Color compression. • SSAA or MSAA support?
News and Rumors • NV30 architecture: • 4x2 pixel pipes? • 8x zixel pipes (Z Test & Stencil only). • ATI ready to release R350 and RV350 in a couple of weeks. • R350: updated R300 core with additional features (?) and increased clock frequency (375 – 400 MHz). • RV350: value chip based on the R300 core. Maybe an 8x1 core with a 128-bit bus. Clock frequency 300 – 400 MHz. 75 million transistors.
Imagine • ‘Computer Graphics on a Stream Architecture’, John Douglas Owens, PhD dissertation. • Not read yet either.