Optimizing Geometry for Efficient Rendering

Real Time Rendering
Instructor: David Luebke

Immediate Mode

    glBegin(GL_TRIANGLES);
    glVertex3f(…);
    glVertex3f(…);
    glVertex3f(…);
    glEnd();

• Easy, flexible, good match for early hardware
• Slow on current hardware
  • Bus-limited, API-limited
  • Hard to parallelize with strict ordering semantics

Immediate Mode: glVertex()

    695480A0 push esi
    695480A1 mov eax,fs:[00000BF0]
    695480A7 test byte ptr [eax+3F5Fh],10h
    695480AE je 695480E2
    695480B0 lea edx,[eax+3640h]
    695480B6 mov esi,dword ptr [esp+8]
    695480BA mov dword ptr [edx+0Ch],3F800000h
    695480C1 mov ecx,dword ptr [esi]
    695480C3 mov dword ptr [edx],ecx
    695480C5 mov ecx,dword ptr [esi+4]
    695480C8 mov esi,dword ptr [esi+8]
    695480CB mov dword ptr [edx+4],ecx
    695480CE mov ecx,eax
    695480D0 mov dword ptr [edx+8],esi
    695480D3 mov edx,3
    695480D8 call dword ptr [eax+0BFA8h]
    695480DE pop esi
    695480DF ret 4
    695480E2 mov edx,dword ptr [eax+477F0h]
    695480E8 mov esi,dword ptr [esp+8]
    695480EC mov ecx,dword ptr [esi]
    695480EE mov dword ptr [edx],0C2C00h
    695480F4 mov dword ptr [edx+4],ecx
    695480F7 add edx,10h
    695480FA mov ecx,dword ptr [esi+4]
    695480FD mov dword ptr [eax+477F0h],edx
    69548103 mov esi,dword ptr [esi+8]
    69548106 mov dword ptr [edx-8],ecx
    69548109 mov dword ptr [edx-4],esi
    6954810C cmp dword ptr [eax+477F4h],edx
    69548112 ja 69548121
    69548114 xor edx,edx
    69548116 mov ecx,dword ptr [eax+477D0h]
    6954811C call 6950EAE0
    69548121 pop esi
    69548122 ret 4

• How fast can you call glVertex3fv?
  • On my laptop, 15M/sec
• Two conditional branches
• 13(+) memory accesses!
  • Upwards of 800MB/sec to the memory system
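
The 800MB/sec figure can be roughly reproduced from the other two numbers on the slide. The sketch below assumes every memory access moves one 4-byte dword (most operands in the disassembly are dword ptr); the function name and its parameters are hypothetical, introduced only for this illustration.

```c
#include <assert.h>

/* Estimated memory traffic in MB/sec (1 MB = 1e6 bytes here).
 * bytes_per_access = 4 is an assumption: one dword per access. */
double memory_traffic_mb_per_sec(double calls_per_sec,
                                 unsigned accesses_per_call,
                                 unsigned bytes_per_access)
{
    assert(calls_per_sec >= 0.0);
    return calls_per_sec * accesses_per_call * bytes_per_access / 1.0e6;
}
```

With 15M calls/sec, 13 accesses per call and 4 bytes per access, this gives 780 MB/sec, which is in line with "upwards of 800MB/sec" once the "(+)" accesses are counted.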

Triangle Strips
• Ideally, approach 1 vertex/triangle
  • Implication: long strips better than short
• Less T&L, less memory/bus bandwidth, less API function overhead
• Triangle fans: same idea
• Triangle strip subtleties
  • Switching direction
  • Degenerate triangles
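
Both subtleties show up when a strip is expanded back into individual triangles. The following is a minimal sketch, not a driver implementation: it follows the OpenGL GL_TRIANGLE_STRIP ordering (every second triangle has its first two vertices swapped to keep a consistent winding) and drops degenerate triangles, which is what makes it possible to stitch two strips into one by repeating indices. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Expand a triangle strip of n indices into a triangle list.
 * out must have room for 3 * (n - 2) indices.
 * Returns the number of non-degenerate triangles written. */
size_t strip_to_triangles(const unsigned *strip, size_t n, unsigned *out)
{
    size_t count = 0;
    assert(strip != NULL && out != NULL);
    for (size_t i = 0; i + 2 < n; ++i) {
        unsigned a = strip[i], b = strip[i + 1], c = strip[i + 2];
        /* A repeated index gives a zero-area triangle: skip it. */
        if (a == b || b == c || a == c)
            continue;
        /* Odd-numbered triangles flip direction; swap to fix winding. */
        if (i % 2 == 1) {
            unsigned tmp = a;
            a = b;
            b = tmp;
        }
        out[3 * count + 0] = a;
        out[3 * count + 1] = b;
        out[3 * count + 2] = c;
        ++count;
    }
    return count;
}
```

For example, the strips 0 1 2 3 and 4 5 6 7 can be sent as the single strip 0 1 2 3 3 4 4 5 6 7: the four triangles that contain a repeated index are degenerate, so only the four real triangles remain, at a cost of 10 indices for 4 triangles. A single strip of n triangles needs n + 2 vertices, which is why long strips approach 1 vertex/triangle.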

Display Lists
• Display lists were the solution
• Compilation of rendering calls
• Can be re-executed w/ a single call
• Can include other display lists
• Potential savings:
  • Memory layout
  • API calls (less run-time error checking, etc.)
  • Even functional savings (e.g. folding matrices)
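
In OpenGL a display list is recorded between glNewList and glEndList and replayed with glCallList. What the driver does when compiling the list is implementation-specific; the sketch below only illustrates the "folding matrices" idea in plain C, under the assumption of row-major 4x4 matrices stored as 16 floats. Two consecutive transforms are multiplied once, so each vertex is later transformed by a single matrix. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* out = a * b for row-major 4x4 matrices; out must not alias a or b. */
void mat4_mul(float out[16], const float a[16], const float b[16])
{
    assert(out != a && out != b);
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[r * 4 + k] * b[k * 4 + c];
            out[r * 4 + c] = sum;
        }
    }
}

/* out = m * v for a column vector v; out must not alias v. */
void mat4_apply(float out[4], const float m[16], const float v[4])
{
    assert(out != v);
    for (int r = 0; r < 4; ++r) {
        float sum = 0.0f;
        for (int k = 0; k < 4; ++k)
            sum += m[r * 4 + k] * v[k];
        out[r] = sum;
    }
}
```

Folding a translation by (1, 2, 3) followed by a uniform scale by 2 means computing scale * translate once with mat4_mul and then calling mat4_apply with the folded matrix per vertex, instead of applying two matrices to every vertex.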

Vertex Arrays
• List of vertices in an array
  • Separate
  • Interleaved
• Can render from subarrays
• Compiled vertex arrays
• Indexed vertex arrays
  • Bandwidth
  • Vertex cache
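
To make the interleaved layout and the bandwidth argument for indexing concrete, here is a small sketch. The Vertex struct (position, normal, texture coordinate) and both helper functions are hypothetical, chosen only for illustration; the sizes assume a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: all attributes of one vertex are
 * adjacent in memory. sizeof(Vertex) is the stride that would be passed
 * to calls such as glVertexPointer(3, GL_FLOAT, stride, pointer). */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} Vertex;

/* Bytes submitted when every triangle carries its own three vertices. */
size_t unindexed_bytes(size_t triangles, size_t stride)
{
    return triangles * 3 * stride;
}

/* Bytes submitted when each vertex is stored once and triangles refer
 * to vertices through indices of index_size bytes. */
size_t indexed_bytes(size_t vertices, size_t triangles,
                     size_t stride, size_t index_size)
{
    assert(index_size == 1 || index_size == 2 || index_size == 4);
    return vertices * stride + triangles * 3 * index_size;
}
```

For a mesh with 1000 vertices and 2000 triangles, a 32-byte stride and 16-bit indices, the unindexed form is 192000 bytes and the indexed form is 44000 bytes. With separate (non-interleaved) arrays the total size is the same, but each attribute lives in its own array with its own stride.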

Vertex Cache
• What is the optimal way to render this mesh if you have an 8-vertex cache?
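
One way to compare candidate orderings is to simulate the cache and count how many vertices have to be transformed. The sketch below assumes a FIFO replacement policy, which the slide does not specify, and a hypothetical upper bound of 64 entries; the function name is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE 64  /* assumed upper bound for this sketch */

/* Count post-transform cache misses for an index stream, assuming a
 * FIFO cache of cache_size entries (a hit does not refresh an entry). */
size_t fifo_cache_misses(const unsigned *indices, size_t n,
                         size_t cache_size)
{
    unsigned cache[MAX_CACHE];
    size_t filled = 0;  /* number of valid entries */
    size_t next = 0;    /* slot that the next miss overwrites */
    size_t misses = 0;

    assert(cache_size > 0 && cache_size <= MAX_CACHE);
    for (size_t i = 0; i < n; ++i) {
        int hit = 0;
        for (size_t j = 0; j < filled; ++j) {
            if (cache[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            cache[next] = indices[i];
            next = (next + 1) % cache_size;
            if (filled < cache_size)
                ++filled;
            ++misses;
        }
    }
    return misses;
}
```

Dividing the miss count by the number of triangles gives an average number of transformed vertices per triangle; an ordering that lowers this number for an 8-entry cache is the better answer to the question above.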

Efficient Vertex Arrays
• Memory
  • System
  • AGP
  • Video
• VAR & VBO
  • Locking semantics
  • Fence semantics
