This document outlines the objectives and current status of research focused on future GPU technologies for 3D graphics rendering. Key goals include simulating both current and upcoming GPU hardware while completing a PhD. It addresses challenges in selecting simulation targets among existing and next-gen GPUs. The document further delves into designing a hardware 3D graphics pipeline, detailing components such as command processors, vertex and pixel shaders, as well as geometry and rasterization processes.
Status – Week 281 Victor Moya
Objectives • Research in future GPUs for 3D graphics. • Simulate current and future 3D graphic hardware. • Finish (someday) the PhD ;).
Problems • Information. • Choice of the simulation target: • Current GPUs. • Near future GPUs. • Absolutely new GPU designs. • The future is hard to predict. • But GPUs change very fast. • Fierce competition between ATI and NVidia. Matrox and 3DLabs follow (3DLabs can rule workstation market). SIS and VIA as OEM.
Status • Designing a hardware 3D graphics pipeline: • Command processors. • Vertex Shader. • Divide by w, Clip, Culling and Triangle Setup. • Rasterization. • Pixel shaders. • Antialiasing. • Designing the simulator.
Geometry • Vertex operations: • (1) Transform coordinates and normal • Model => World. • World => Eye. • (2) Normalize the length of the normal. • (3) Compute vertex lighting. • (4) Transform texture coordinates. • (5) Transform coordinates to clip coordinates (projection). • (8) Divide coordinates by w. • (9) Apply affine viewport transform (x, y, z).
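The vertex operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are 4x4 row-major nested lists, normalized device coordinates span [-1, 1] on every axis (an OpenGL-style convention; Direct3D differs for z), and all function names are hypothetical.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector (steps 1 and 5)."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def normalize(n):
    """Rescale a 3-component normal to unit length (step 2)."""
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

def perspective_divide(clip):
    """Divide clip coordinates by w to obtain normalized device coordinates (step 8)."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]

def viewport_transform(ndc, width, height):
    """Affine map from NDC in [-1, 1] to window x, y and depth in [0, 1] (step 9)."""
    x, y, z = ndc
    return [(x + 1) * 0.5 * width, (y + 1) * 0.5 * height, (z + 1) * 0.5]
```

Lighting (3) and texture coordinate transforms (4) are left out; they follow the same pattern of per-vertex matrix and vector arithmetic.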
Geometry • Primitive operations: • (6) Primitive assembly • (7) Clipping: • (10) Backface cull: eliminate back-facing triangles. • Primitive generation: new pipeline stage (ATI TruForm).
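One common way to implement the backface cull in step (10) is to test the sign of the triangle's area in window coordinates. The sketch below assumes 2D vertices after the viewport transform and counter-clockwise front faces by default; the function names are hypothetical.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc; positive when abc is counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def is_back_facing(a, b, c, front_is_ccw=True):
    """Return True when the triangle should be culled.

    Zero-area (degenerate) triangles are treated as back-facing in this sketch.
    """
    area = signed_area2(a, b, c)
    return area <= 0 if front_is_ccw else area >= 0
```

The same quantity is usually needed later for triangle setup, so hardware can share the computation between culling and setup.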
Vertex Shader • VS 1.0, 1.1 and 1.2 (current technology) for Direct3D 8 and 8.1. OpenGL extensions: ARB_vertex_program (finally in OpenGL v1.4), NV_vertex_program1_1 (NVidia), EXT_vertex_shader (ATI). • No branching. • Single cycle execution latency (?). • Single issue instruction each cycle. • Simple in order pipeline (?).
Vertex Shader • 16 input registers (read only). • 15 output registers (write only). • 12 temporary registers (read/write). • 96 constant registers (read only or read/write?). • 128 instructions max
Vertex Shader
Opcode  Inputs (scalar or vector)  Output (vector or replicated scalar)  Operation
------  -------------------------  ------------------------------------  --------------------------
ARL     s                          address register                      address register load
MOV     v                          v                                     move
MUL     v,v                        v                                     multiply
ADD     v,v                        v                                     add
MAD     v,v,v                      v                                     multiply and add
RCP     s                          ssss                                  reciprocal
RSQ     s                          ssss                                  reciprocal square root
DP3     v,v                        ssss                                  3-component dot product
DP4     v,v                        ssss                                  4-component dot product
DST     v,v                        v                                     distance vector
MIN     v,v                        v                                     minimum
MAX     v,v                        v                                     maximum
SLT     v,v                        v                                     set on less than
SGE     v,v                        v                                     set on greater equal than
EXP     s                          v                                     exponential base 2
LOG     s                          v                                     logarithm base 2
LIT     v                          v                                     light coefficients
DPH     v,v                        ssss                                  homogeneous dot product
RCC     s                          ssss                                  reciprocal clamped
SUB     v,v                        v                                     subtract
ABS     v                          v                                     absolute value
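To make the "vector" versus "replicated scalar" outputs concrete, here is a minimal Python sketch of a handful of the opcodes. It is not a full or exact model of any specification: registers are 4-element lists, scalar inputs are assumed to arrive in the x component, and `OPS` and `execute` are hypothetical names.

```python
def _dot(a, b, n):
    """n-component dot product, replicated to all four output components (ssss)."""
    s = sum(a[i] * b[i] for i in range(n))
    return [s] * 4

# Each entry maps an opcode to a function over 4-component source operands.
OPS = {
    "MOV": lambda a: list(a),
    "MUL": lambda a, b: [x * y for x, y in zip(a, b)],
    "ADD": lambda a, b: [x + y for x, y in zip(a, b)],
    "SUB": lambda a, b: [x - y for x, y in zip(a, b)],
    "MAD": lambda a, b, c: [x * y + z for x, y, z in zip(a, b, c)],
    "RCP": lambda a: [1.0 / a[0]] * 4,  # scalar input assumed to be in x
    "DP3": lambda a, b: _dot(a, b, 3),
    "DP4": lambda a, b: _dot(a, b, 4),
    "MIN": lambda a, b: [min(x, y) for x, y in zip(a, b)],
    "MAX": lambda a, b: [max(x, y) for x, y in zip(a, b)],
    "SLT": lambda a, b: [1.0 if x < y else 0.0 for x, y in zip(a, b)],
    "SGE": lambda a, b: [1.0 if x >= y else 0.0 for x, y in zip(a, b)],
}

def execute(opcode, *sources):
    """Evaluate one instruction and return its 4-component result."""
    return OPS[opcode](*sources)
```

A position transform is typically four DP4 instructions, one per matrix row, each writing one component of the output position.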
Clipping • Clip geometry primitives with the view frustum (6 planes). • Clip geometry primitives with the user clip planes. • Techniques used: • Guard-Band Clipping. • Homogeneous rasterization avoids clipping in the geometry stage.
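Guard-band clipping can be illustrated with clip-space outcodes. The sketch below assumes OpenGL-style clip coordinates (a vertex is inside when -w <= x, y, z <= w), a hypothetical guard band that extends the x and y limits by a constant factor, and a rasterizer that scissors whatever falls between the viewport and the guard band. User clip planes are not handled.

```python
def outcode(v, scale=1.0):
    """Bit mask of the planes that vertex v = (x, y, z, w) lies outside of.

    scale widens only the x and y limits; near and far are never widened.
    """
    x, y, z, w = v
    code = 0
    if x < -scale * w: code |= 1
    if x > scale * w:  code |= 2
    if y < -scale * w: code |= 4
    if y > scale * w:  code |= 8
    if z < -w:         code |= 16
    if z > w:          code |= 32
    return code

def classify(tri, guard=8.0):
    """Decide what the geometry stage must do with a triangle."""
    view = [outcode(v) for v in tri]
    if view[0] & view[1] & view[2]:
        return "reject"      # all vertices outside the same frustum plane
    band = [outcode(v, guard) for v in tri]
    if band[0] | band[1] | band[2]:
        return "clip"        # crosses the guard band or the near/far planes
    return "rasterize"       # inside the guard band; scissoring handles the rest
```

With a wide guard band, only the rare triangles that cross the band or the near and far planes pay the cost of geometric clipping.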
Homogeneous coordinates • “Triangle Scan Conversion using 2D Homogeneous Coordinates”, Olano and Greer.
Rasterization • Setup (per-triangle). • Sampling (triangle = {fragments}). • Interpolation (interpolate colors and coordinates).
Rasterization • Converts primitives to fragments. • Primitive: point, line, polygon, … • Fragment: transient data structure short x, y; long depth; short r, g, b, a; • Fragment selection. • Parameter Assignment (color, depth ...).
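The setup, sampling and interpolation steps can be sketched with edge functions and barycentric weights. This is one common approach, not necessarily the one the simulator uses. It assumes window-space 2D vertices, samples at pixel centers, applies no fill rule for shared edges, and interpolates linearly without perspective correction. `Fragment`, `edge` and `rasterize` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Fragment:
    x: int
    y: int
    depth: float
    color: tuple  # (r, g, b, a)

def edge(a, b, p):
    """Edge function: twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(verts, depths, colors):
    v0, v1, v2 = verts
    area = edge(v0, v1, v2)              # setup: done once per triangle
    if area == 0:
        return []                        # degenerate triangle
    xs = [v[0] for v in verts]
    ys = [v[1] for v in verts]
    fragments = []
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            p = (x + 0.5, y + 0.5)       # sampling: test the pixel center
            w0 = edge(v1, v2, p) / area  # dividing by area handles both windings
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue
            # interpolation: weight the per-vertex parameters
            depth = w0 * depths[0] + w1 * depths[1] + w2 * depths[2]
            color = tuple(w0 * colors[0][i] + w1 * colors[1][i] + w2 * colors[2][i]
                          for i in range(4))
            fragments.append(Fragment(x, y, depth, color))
    return fragments
```

A hardware design would use fixed-point parameters like the short and long fields listed above; floats are used here only for readability.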
NV_vertex_program2 • ARL (new support for four-component A0 and A1 instead of just A0.x) • ARR (similar to ARL, but rounds instead of truncating before storing the integer result in an address register) • BRA, CAL, RET (branching instructions) • COS, SIN (high-precision trigonometric functions) • FLR, FRC (floor and fraction of floating-point values) • EX2, LG2 (high-precision exponentiation and logarithm functions) • ARA (adds pairs of components of an address register; useful for looping and other operations) • SEQ, SFL, SGT, SLE, SNE, STR (“set on” instructions similar to SLT, SGE) • SSG (“set sign” operation; generates a vector holding –1.0 for negative operand components, 0 for zero-value components, and +1.0 for positive components)
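A few of the new instructions have simple component-wise definitions, sketched below in Python. The sketch ignores NaN handling and precision rules, treats registers as 4-element lists, and uses hypothetical function names; only the behavior described on the slide is modeled.

```python
import math

def flr(v):
    """FLR: component-wise floor."""
    return [float(math.floor(c)) for c in v]

def frc(v):
    """FRC: component-wise fraction, computed as x - floor(x)."""
    return [c - math.floor(c) for c in v]

def ssg(v):
    """SSG: -1.0 for negative, 0.0 for zero, +1.0 for positive components."""
    return [float((c > 0) - (c < 0)) for c in v]

def set_on(a, b, predicate):
    """Shared helper for the 'set on' family: 1.0 where predicate holds, else 0.0."""
    return [1.0 if predicate(x, y) else 0.0 for x, y in zip(a, b)]

def seq(a, b): return set_on(a, b, lambda x, y: x == y)
def sgt(a, b): return set_on(a, b, lambda x, y: x > y)
def sle(a, b): return set_on(a, b, lambda x, y: x <= y)
def sne(a, b): return set_on(a, b, lambda x, y: x != y)
def sfl(a, b): return set_on(a, b, lambda x, y: False)  # always false
def str_(a, b): return set_on(a, b, lambda x, y: True)  # always true
```

Note that `frc` returns a value in [0, 1) even for negative inputs, which is what makes FLR and FRC a consistent pair.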
NV_vertex_program2 Overview • 1. Condition codes • 2. Branching & subroutines • 3. Even faster performance • 4. Nineteen new instructions • 5. New source modifiers • 6. Clip plane support • 7. More registers & instructions
NV_vertex_program2 Resource Limits • 256 vertex program parameters • Up from 96 • 16 temporary registers • Up from 12 • Two 4-component address registers • Up from one single-component address register • 256 static instructions per program • Up from 128 • Given branching, 65536 dynamic instructions can execute before termination to avoid infinite loops
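The dynamic instruction limit is easy to model in a simulator: count every executed instruction and stop when the budget is exhausted. The sketch below uses a deliberately tiny, hypothetical instruction set (`DEC`, `BRA_GT`, `BRA`) that is not part of NV_vertex_program2; only the 65536 limit comes from the slide.

```python
MAX_DYNAMIC_INSTRUCTIONS = 65536

def run(program, counter, limit=MAX_DYNAMIC_INSTRUCTIONS):
    """Execute a list of (opcode, target) pairs.

    Returns (executed, finished): the number of instructions executed and
    whether the program ended on its own (True) or hit the limit (False).
    """
    pc = 0
    executed = 0
    while pc < len(program):
        if executed == limit:
            return executed, False      # forced termination
        opcode, target = program[pc]
        executed += 1
        if opcode == "DEC":             # hypothetical: decrement the loop counter
            counter -= 1
            pc += 1
        elif opcode == "BRA_GT":        # hypothetical: branch while counter > 0
            pc = target if counter > 0 else pc + 1
        elif opcode == "BRA":           # unconditional branch
            pc = target
        else:
            pc += 1
    return executed, True

loop = [("DEC", None), ("BRA_GT", 0)]
forever = [("BRA", 0)]
```

Here `run(loop, 3)` finishes normally, while `run(forever, 0)` is cut off by the limit, so a static program of at most 256 instructions can never hang the pipeline.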
NV_vertex_program2 Source Modifiers • Source operand absolute value • Example: MOV R0, |R1|; • In addition to source negation & swizzling • Example: MAD R0, -|R1|.yzwy, |R2|, -R3.w; • Swizzle, negate, & absolute value operations are “free” source modifiers
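The three source modifiers compose in a fixed order: select components, take the absolute value, then negate. A minimal Python sketch, assuming registers are 4-element lists and that a single-letter swizzle replicates that component to all four lanes (`source` is a hypothetical helper name):

```python
COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}

def source(reg, swizzle="xyzw", absolute=False, negate=False):
    """Apply swizzle, then absolute value, then negation to a source register."""
    if len(swizzle) == 1:
        swizzle = swizzle * 4            # scalar swizzle replicates one component
    value = [reg[COMPONENT[c]] for c in swizzle]
    if absolute:
        value = [abs(c) for c in value]
    if negate:
        value = [-c for c in value]
    return value

# MAD R0, -|R1|.yzwy, |R2|, -R3.w;
R1 = [1.0, -2.0, 3.0, -4.0]
R2 = [-1.0, 1.0, -1.0, 1.0]
R3 = [0.0, 0.0, 0.0, 5.0]
a = source(R1, "yzwy", absolute=True, negate=True)
b = source(R2, absolute=True)
c = source(R3, "w", negate=True)
R0 = [x * y + z for x, y, z in zip(a, b, c)]
```

In hardware these modifiers are described as "free" because they are applied while the operand is being read and add no extra instructions.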
NV_vertex_program2 Condition Codes (1) • Condition code state • 4-component register stores condition code values • Four possible values • LT – less than zero • EQ – equal to zero • GT – greater than zero • UN – unordered, for comparisons involving NaN • Most instructions optionally update condition code state • Indicated with “C” suffix: DP4C, MOVC, etc. • “CC” pseudo-register used to just update condition codes
NV_vertex_program2 Condition Codes (2) • Optional condition code based destination masking • Example: MOV R1.xy(NE.z), R0; • Copy R0 components to R1’s X & Y components except when condition code’s Z component is EQ • Condition code rules: EQ, equal; GE, greater or equal; GT, greater than; LE, less or equal; LT, less than; NE, not equal; FL, false; and TR, true • Note that condition code masking rule can swizzle condition code components
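Condition code updates and masked writes can be modeled as follows. This is a sketch under assumptions: the condition code register is a list of four strings, the NE rule is taken to pass for the unordered value UN (consistent with "except when ... is EQ" above), and `condition_code`, `RULES` and `masked_mov` are hypothetical names.

```python
COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}

def condition_code(value):
    """Classify one result component as LT, EQ, GT or UN (NaN)."""
    if value != value:                   # NaN is the only value unequal to itself
        return "UN"
    if value < 0:
        return "LT"
    return "EQ" if value == 0 else "GT"

# For each rule, the set of condition code values that let the write through.
RULES = {
    "EQ": {"EQ"},
    "GE": {"GT", "EQ"},
    "GT": {"GT"},
    "LE": {"LT", "EQ"},
    "LT": {"LT"},
    "NE": {"LT", "GT", "UN"},            # assumption: unordered counts as not equal
    "FL": set(),
    "TR": {"LT", "EQ", "GT", "UN"},
}

def masked_mov(dst, src, cc, write_mask="xyzw", rule="TR", cc_swizzle="xyzw"):
    """MOV with a write mask and a condition code test, e.g. MOV R1.xy(NE.z), R0."""
    if len(cc_swizzle) == 1:
        cc_swizzle = cc_swizzle * 4      # scalar swizzle tests one CC component
    result = list(dst)
    for name in write_mask:
        i = COMPONENT[name]
        if cc[COMPONENT[cc_swizzle[i]]] in RULES[rule]:
            result[i] = src[i]
    return result
```

For the slide's example, `masked_mov(R1, R0, cc, "xy", "NE", "z")` leaves R1 untouched when `cc[2]` is "EQ" and otherwise copies the X and Y components.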