Optimizing Geometry for Efficient Rendering
Dive into optimizing geometry for real-time rendering with David Luebke. Discover how immediate mode rendering using OpenGL can impact performance, its limitations on current hardware, and the benefits of triangle strips, fans, and display lists. Learn about vertex arrays, memory layouts, and caching techniques for efficient rendering. Gain insights into minimizing API overhead, enhancing memory usage, and tackling the challenges of modern graphical environments. This lecture is essential for developers seeking to improve rendering efficiency and performance.
Presentation Transcript
Optimizing Geometry for Efficient Rendering
Real Time Rendering
Instructor: David Luebke

Immediate Mode
    glBegin(GL_TRIANGLES);
        glVertex3f(…);
        glVertex3f(…);
        glVertex3f(…);
    glEnd();
• Easy, flexible, good match for early hardware
• Slow on current hardware
• Bus-limited, API-limited
• Hard to parallelize with strict ordering semantics

Immediate Mode: glVertex()
    695480A0 push esi
    695480A1 mov eax,fs:[00000BF0]
    695480A7 test byte ptr [eax+3F5Fh],10h
    695480AE je 695480E2
    695480B0 lea edx,[eax+3640h]
    695480B6 mov esi,dword ptr [esp+8]
    695480BA mov dword ptr [edx+0Ch],3F800000h
    695480C1 mov ecx,dword ptr [esi]
    695480C3 mov dword ptr [edx],ecx
    695480C5 mov ecx,dword ptr [esi+4]
    695480C8 mov esi,dword ptr [esi+8]
    695480CB mov dword ptr [edx+4],ecx
    695480CE mov ecx,eax
    695480D0 mov dword ptr [edx+8],esi
    695480D3 mov edx,3
    695480D8 call dword ptr [eax+0BFA8h]
    695480DE pop esi
    695480DF ret 4
    695480E2 mov edx,dword ptr [eax+477F0h]
    695480E8 mov esi,dword ptr [esp+8]
    695480EC mov ecx,dword ptr [esi]
    695480EE mov dword ptr [edx],0C2C00h
    695480F4 mov dword ptr [edx+4],ecx
    695480F7 add edx,10h
    695480FA mov ecx,dword ptr [esi+4]
    695480FD mov dword ptr [eax+477F0h],edx
    69548103 mov esi,dword ptr [esi+8]
    69548106 mov dword ptr [edx-8],ecx
    69548109 mov dword ptr [edx-4],esi
    6954810C cmp dword ptr [eax+477F4h],edx
    69548112 ja 69548121
    69548114 xor edx,edx
    69548116 mov ecx,dword ptr [eax+477D0h]
    6954811C call 6950EAE0
    69548121 pop esi
    69548122 ret 4
• How fast can you call glVertex3fv?
• On my laptop, 15M/sec
• Two conditional branches
• 13(+) memory accesses!
• Upwards of 800MB/sec to the memory system
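
The bandwidth figure on this slide can be checked with a back-of-envelope estimate. The sketch below assumes every one of the 13 memory accesses moves a single 4-byte dword, which matches the `dword ptr` operands in the listing but ignores whatever the called routines touch (the "(+)" on the slide). The function name and parameters are illustrative, not part of OpenGL.

```c
#include <assert.h>

/* Rough memory traffic, in MB/sec (1 MB = 1e6 bytes), generated by
 * calling a per-vertex API function at a given rate.
 * Assumption: every access moves the same number of bytes. */
double vertex_call_traffic_mb(double calls_per_sec,
                              int accesses_per_call,
                              int bytes_per_access)
{
    assert(calls_per_sec >= 0.0);
    assert(accesses_per_call >= 0 && bytes_per_access >= 0);
    return calls_per_sec * accesses_per_call * bytes_per_access / 1e6;
}
```

With the slide's numbers, `vertex_call_traffic_mb(15e6, 13, 4)` gives 780 MB/sec, so a few extra accesses per call are enough to pass 800 MB/sec.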

Triangle Strips
• Ideally, approach 1 vertex/triangle
• Implication: long strips better than short
• Less T&L, less memory/bus bandwidth, less API function overhead
• Triangle fans: same idea
• Triangle strip subtleties
  • Switching direction
  • Degenerate triangles
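
To make the "1 vertex/triangle" limit and the degenerate-triangle trick concrete, here is a minimal sketch. It assumes a regular grid of `cols` x `rows` vertices stored row by row, and it stitches one strip per row pair into a single long strip by repeating two indices at each row boundary. The function names are hypothetical, and which winding counts as front-facing depends on how the grid is laid out in space.

```c
#include <assert.h>
#include <stddef.h>

/* Vertices sent for one strip of n triangles: the first triangle
 * costs 3 vertices, every later one costs 1, so the ratio
 * (n + 2) / n approaches 1 as the strip gets longer. */
size_t strip_vertex_count(size_t triangles)
{
    return triangles ? triangles + 2 : 0;
}

/* Number of indices build_grid_strip() writes. */
size_t grid_strip_index_count(unsigned cols, unsigned rows)
{
    assert(cols >= 2 && rows >= 2);
    return (size_t)(rows - 1) * 2 * cols + (size_t)(rows - 2) * 2;
}

/* Write one triangle strip covering the whole grid into out[].
 * Between rows, the last index of the previous row and the first
 * index of the next row are repeated. This creates zero-area
 * (degenerate) triangles that draw nothing, and because an even
 * number of indices is added the winding parity is preserved. */
size_t build_grid_strip(unsigned cols, unsigned rows, unsigned *out)
{
    size_t n = 0;
    assert(cols >= 2 && rows >= 2);
    for (unsigned r = 0; r + 1 < rows; ++r) {
        if (r > 0) {
            out[n] = out[n - 1];      /* repeat last index of previous row */
            ++n;
            out[n++] = r * cols;      /* repeat first index of this row */
        }
        for (unsigned c = 0; c < cols; ++c) {
            out[n++] = r * cols + c;          /* vertex on row r */
            out[n++] = (r + 1) * cols + c;    /* vertex on row r + 1 */
        }
    }
    return n;
}
```

For a 3 x 3 grid this writes 14 indices (0 3 1 4 2 5 5 3 3 6 4 7 5 8) for 8 visible triangles, against 24 indices for the same mesh drawn as independent triangles.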

Display Lists
• Display lists were the solution
• Compilation of rendering calls
• Can be re-executed w/ a single call
• Can include other display lists
• Potential savings:
  • Memory layout
  • API calls (less run-time error checking etc)
  • Even functional savings (e.g. folding matrices)
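
"Folding matrices" can be illustrated without any OpenGL calls. The sketch below assumes the idea is that two consecutive matrix operations recorded in a list are multiplied together once, when the list is compiled, so that replaying the list applies a single matrix. It is an illustration of that idea only, not a description of how any particular driver compiles display lists; `mat4_mul` and `mat4_translation` are hypothetical helpers using OpenGL's column-major layout.

```c
#include <assert.h>

/* out = a * b for 4x4 column-major matrices: element (row, col)
 * lives at index col * 4 + row. out must not alias a or b. */
void mat4_mul(const float a[16], const float b[16], float out[16])
{
    assert(out != a && out != b);
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    }
}

/* Build a translation matrix: identity with the offset stored in
 * the last column (indices 12, 13, 14). */
void mat4_translation(float x, float y, float z, float out[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = (i % 5 == 0) ? 1.0f : 0.0f;   /* diagonal: 0, 5, 10, 15 */
    out[12] = x;
    out[13] = y;
    out[14] = z;
}
```

Folding a translation by (1, 2, 3) with a translation by (4, 5, 6) yields one matrix that translates by (5, 7, 9), so the per-execution cost drops from two matrix multiplies to one.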

Vertex Arrays
• List of vertices in an array
  • Separate
  • Interleaved
• Can render from subarrays
• Compiled vertex arrays
• Indexed vertex arrays
  • Bandwidth
  • Vertex cache
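
The difference between separate and interleaved arrays comes down to stride and offset. The sketch below shows one possible interleaved layout; the `InterleavedVertex` struct, its field choice, and the `attrib_address` helper are assumptions made for illustration. The byte counts in the comments assume 4-byte floats and no struct padding, which holds on common platforms but is not guaranteed by C.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex: all attributes of a vertex sit together,
 * so fetching a vertex touches one contiguous block of memory.
 * With separate arrays, each attribute would instead live in its
 * own tightly packed array. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} InterleavedVertex;

/* Address of one attribute of vertex i, given the array base, the
 * stride between consecutive vertices, and the attribute's byte
 * offset within a vertex. The stride and base + offset are the
 * values one would hand to glVertexPointer, glNormalPointer, and
 * glTexCoordPointer as their stride and pointer arguments. */
const void *attrib_address(const void *base, size_t stride,
                           size_t offset, size_t i)
{
    assert(base != NULL);
    return (const char *)base + i * stride + offset;
}
```

Under the stated assumptions the stride is `sizeof(InterleavedVertex)`, 32 bytes, and the attribute offsets are 0, 12, and 24 bytes. For separate arrays the same helper applies with offset 0 and a stride equal to the size of one attribute.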

Vertex Cache
• What is the optimal way to render this mesh if you have an 8-vertex cache?
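
One way to compare candidate orderings for a question like this is to simulate the cache and count misses. The sketch below assumes a FIFO replacement policy (a hit does not refresh an entry), which is a simplifying assumption rather than something the slide specifies, and it works on an index list for indexed triangles. `MAX_CACHE` and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE 64

/* Count how many indices miss a FIFO vertex cache of the given
 * size. Each miss stands for one vertex that must be fetched and
 * transformed again. */
size_t count_cache_misses(const unsigned *indices, size_t n,
                          size_t cache_size)
{
    unsigned cache[MAX_CACHE];
    size_t filled = 0;   /* number of valid entries */
    size_t next = 0;     /* slot holding the oldest entry once full */
    size_t misses = 0;

    assert(cache_size >= 1 && cache_size <= MAX_CACHE);
    for (size_t i = 0; i < n; ++i) {
        int hit = 0;
        for (size_t j = 0; j < filled; ++j) {
            if (cache[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            ++misses;
            cache[next] = indices[i];          /* overwrite oldest entry */
            next = (next + 1) % cache_size;
            if (filled < cache_size)
                ++filled;
        }
    }
    return misses;
}

/* Average cache misses per triangle for an indexed triangle list
 * (three indices per triangle). Lower is better. */
double misses_per_triangle(const unsigned *indices, size_t n,
                           size_t cache_size)
{
    assert(n >= 3 && n % 3 == 0);
    return (double)count_cache_misses(indices, n, cache_size)
         / (double)(n / 3);
}
```

Running different index orderings of the same mesh through `misses_per_triangle` with a cache size of 8 gives a direct way to rank them.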

Efficient Vertex Arrays
• Memory
  • System
  • AGP
  • Video
• VAR & VBO
  • Locking semantics
  • Fence semantics