LOD Case Study & Application

LOD Case Study & Application Robert Huebner Nihilistic Software innerloop@nihilistic.com

Speaker Bio • President and Director of Technology for Nihilistic Software • Currently working on “Starcraft: Ghost” for Blizzard Entertainment • Previous credits include Vampire: The Masquerade, Jedi Knight: Dark Forces 2, Descent • International Game Developers Association (IGDA) Board Member • www.igda.org • Game Developers Conference (GDC) Advisory Board

Purpose of Talk • Review some of the topics and ideas presented earlier in the course • Try to explain what worked for us, and what didn’t • This talk is a “case study in progress” for our current Gamecube and XBOX work • Still tweaking and changing some LOD schemes

Starcraft: Ghost (needs LOD too!)

Goal of LOD • Back on Pre-3D-hardware PCs, we would spend a LOT of CPU to avoid drawing a few triangles • The cost of rendering was much higher • We were willing to spend significant CPU to eliminate a single triangle • Systems like ROAM, view-dependent LOD • Current hardware renders fast, so we only spend CPU if we can discard a lot of triangles • Or if it saves us state changes, texture fetches, memory bandwidth, or other costly processing

General Block Diagram • [Block diagram with components: RAM, CPU, FIFO, GPU, Frame buffer, Vertex Unit, Pixel Unit, Texture Mem]

Data Flow Management • Managing data flow and bandwidth is critical to performance • Each platform has a different architecture • So our choice of LOD differs for each platform • Each main data path can utilize different LOD techniques to increase throughput • We try to do this without wasting CPU or memory resources, which are also scarce

Where Do We Use LOD? • [Block diagram with components: RAM, CPU, FIFO, GPU, Framebuffer, Vertex Unit, Pixel Unit, Texture Mem]

Classes of Game LOD • The design of most console systems is dominated by three data paths: • The RAM->GPU path and GPU throughput are managed with geometric LOD • The GPU->Framebuffer path is managed via shader LOD • The Texture->GPU path is managed with MIP-mapping and shader LOD

Games Vs. Research • The biggest problems we run into when adapting academic LOD systems for game use are: • Dealing with additional properties of meshes • Vertex normals, texture, UV coordinates, etc. • Avoiding the need for general-purpose processing at the vertex level • Maintaining data in a format that our hardware can process directly

Runtime Selection • In our engine, all LOD processing for a given object is driven by a single value • The LOD value is stored both as a float (0.0 to 1.0) and as a discrete BYTE (1..X) • Each sub-system that wants to do LOD can use either version of the LOD metric to control behavior

Runtime Selection • The LOD metric is stored for each object or “sector” (world section) • Based on many factors (highest to lowest weight) • Estimated screen space (size / distance) • Overall performance or estimated triangle counts for scene (scene metric) • Current player control mode (interact or cutscene, combat or stealth) • “Importance” of the object (active AI vs. inactive AI) • Viewing angle for terrain blocks
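A minimal sketch of how such a single LOD value might be computed and exposed in both float and discrete form. The function names, the weights, and the convention that 1.0 (and level 1) means full detail are all assumptions for illustration; the talk only states the list of factors and their relative ordering. The terrain viewing-angle factor is left out.

```python
def lod_factor(size, distance, scene_load, in_cutscene, importance,
               weights=(0.5, 0.25, 0.15, 0.10)):
    """Return a detail factor in [0.0, 1.0]; 1.0 means full detail (assumed)."""
    w_screen, w_scene, w_mode, w_importance = weights
    # Estimated screen space: size / distance, clamped to [0, 1].
    screen = min(1.0, size / max(distance, 1e-6))
    # A heavily loaded scene (scene_load near 1.0) pushes detail down.
    scene = 1.0 - min(1.0, max(0.0, scene_load))
    # Hypothetical mode term: cutscenes get more detail than gameplay.
    mode = 1.0 if in_cutscene else 0.5
    imp = min(1.0, max(0.0, importance))
    value = (w_screen * screen + w_scene * scene
             + w_mode * mode + w_importance * imp)
    return min(1.0, max(0.0, value))


def discrete_level(factor, num_levels):
    """Map the float factor to a level in 1..num_levels (1 = most detailed)."""
    level = 1 + int((1.0 - factor) * num_levels)
    return min(num_levels, max(1, level))
```

Each sub-system can then read whichever form suits it: a geometric LOD system might call `discrete_level(f, 4)`, while a blend effect might use the float directly.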

Geometric LOD • Geometric LOD is the most interesting & complex topic for games • There are three main goals we try to achieve with geometric LOD: • Send less data to the GPU to avoid exceeding its throughput • Utilize less bus bandwidth moving data into the graphics unit • Try to achieve a constant average triangle size to balance load between vertex and pixel units

Compiled Models • Most game engines are constructed to load “compiled” models • Vertex data is adjusted to match native format • Triangles are batched to minimize state changes and fit within hardware limits • Optimum strips are constructed • DisplayLists/Pushbuffers are compiled • Compiled models are highly platform-specific

Basic LOD Choices • Based on platform specifics, we select a simple half-edge collapse operation as the basis of our LOD • Minimizes memory use, vertex data remains unchanged • Minimizes dynamically changing vertex data, which minimizes bandwidth & FIFO space • Allows us to address problems with property discontinuities

Calculating LOD • We perform all our LOD computation off-line during model compilation • We offer the artists a choice of LOD metric to use when computing automatic LOD levels • We chose an LOD scheme that is based on half-edge collapse operations only • Less memory, more static data set • The LOD is constructed based on edge score • Each edge in the model is given a score based on its length, curvature, or other factors • Vertices are also given scores to control which endpoint is preserved during the edge collapse

Calculating LOD • We begin by building an augmented “collapse vertex” structure for the model • Links to neighbor verts (edges) • Links to associated faces • Link and score of “least cost” edge • Identification of “border” or “seam” verts • Links to “paired” verts • Links to the actual “render” vertices • This process happens after vertices are split due to texture/normal/UV changes • This means one collapse vertex can be linked to multiple “export” vertices

Calculating LOD • We add game-specific restrictions to LOD • Either adjust the vertex score, exempt it entirely, or link its removal to that of another vertex • Texture or UV mapping “seams” due to composited textures • Vertex normal discontinuities (hard edge) • Unpaired edges • Artist influence (blind vertex data in Maya) • We also use domain-specific knowledge to adjust scoring algorithm • Terrain blocks use z (height) differential as main score factor • Shadow/collision LOD ignores texture/UV seams

Calculating LOD • Once we have a full set of edge scores, we select the least cost edge and remove its least cost vertex • Half-edge collapse to the higher-cost endpoint • Record the operation in fields in our underlying data • Remove degenerate triangles • Re-compute all edge costs in neighboring triangles • Repeat until only non-collapsible edges remain
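The loop above can be sketched as follows. This is a simplified illustration, not the engine's scorer: the cost is edge length only, ties are broken by iteration order, and every edge is re-scored on each step rather than just the neighbors. The name `build_collapse_data` and the `locked` parameter (standing in for exempt border or seam vertices) are hypothetical.

```python
import math


def build_collapse_data(positions, triangles, locked=()):
    """Greedy half-edge collapse.

    Returns (collapse_order, collapse_to): collapse_order[v] is the step at
    which vertex v is removed (None if it is never removed), and
    collapse_to[v] is the surviving endpoint it was merged into.
    """
    tris = [tuple(t) for t in triangles]
    collapse_order = [None] * len(positions)
    collapse_to = [None] * len(positions)
    locked = set(locked)
    step = 0
    while True:
        # Find the cheapest half-edge (src -> dst) whose source may be removed.
        best = None
        for tri in tris:
            for k in range(3):
                u, v = tri[k], tri[(k + 1) % 3]
                for src, dst in ((u, v), (v, u)):
                    if src in locked:
                        continue
                    cost = math.dist(positions[src], positions[dst])
                    if best is None or cost < best[0]:
                        best = (cost, src, dst)
        if best is None:
            break  # only non-collapsible edges (or no triangles) remain
        _, src, dst = best
        # Record the operation.
        collapse_order[src] = step
        collapse_to[src] = dst
        step += 1
        # Re-index triangles and drop the ones that became degenerate.
        remaining = []
        for tri in tris:
            moved = tuple(dst if idx == src else idx for idx in tri)
            if len(set(moved)) == 3:
                remaining.append(moved)
        tris = remaining
    return collapse_order, collapse_to
```

For a unit square split into two triangles with one diagonal locked, the two free corners are removed in turn and each points at a locked neighbor.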

Note on quality • Our reduction and scoring system is simple, but accuracy suffers • Because of this, we have found that the last 10% or so of the collapse operations are judged by artists as being unsatisfactory • We allow the export process to specify some control over the quality • Limit on the maximum cost collapse that will be executed (default excludes about 10% of operations) • Object-specific tweaks to the computed LOD factor

Calculating LOD • The results of this operation are two new data fields in our renderable vertex structure • The “collapseOrder” field gives the ordering of the collapse operation • The “collapseTo” field is the destination vertex for the edge collapse operation that removes this vertex from the mesh • Using these fields, we can export the LOD in various ways in the final compilation • Since the LOD metrics are all export-side, we can adopt improvements periodically without affecting run-time data • Just re-export to get benefits of better reduction

Discrete LOD • Discrete LOD is still the workhorse of game mesh LOD • Each level can undergo heavy pre-processing for strip-ordering or displaylist creation • Artists can hand-tune the reduction for visual accuracy • Can optionally replace both vertices and index lists, or just indices to save memory • We represent discrete LOD by loading multiple sets of face index lists, or separate “index buffers” • Vertex data is unchanged

Exporting Discrete LOD • We can use our computed data to export any number of discrete LOD steps • Pick a desired number of vertices for the LOD level • Calculate how many collapse operations will reach this level • Build an indexed ordering for the mesh • For any vertex with a “collapseOrder” value lower than the # of operations, replace its index with its “collapseTo” index • Repeat until a vertex is reached that has a higher collapseOrder field • Process each index ordering for strips & cache coherency, create packets, etc.
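A rough sketch of the index remapping step, assuming `collapse_order[v]` is `None` for vertices that are never removed and that indices form a plain triangle list. The function name is hypothetical, and the later strip and cache-coherency processing is not shown.

```python
def export_discrete_lod(indices, collapse_order, collapse_to, num_collapses):
    """Build the index list for the LOD reached after num_collapses operations."""
    def resolve(v):
        # Follow collapseTo links until we reach a vertex that survives
        # at this level (its collapseOrder is not below the threshold).
        while collapse_order[v] is not None and collapse_order[v] < num_collapses:
            v = collapse_to[v]
        return v

    out = []
    for i in range(0, len(indices), 3):
        a, b, c = (resolve(v) for v in indices[i:i + 3])
        # Skip triangles that collapsed to a line or a point.
        if a != b and b != c and a != c:
            out.extend((a, b, c))
    return out
```

The vertex data is untouched; only the index list shrinks, which matches the "multiple index buffers over one vertex buffer" representation described earlier.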

Discrete Blended LOD • To minimize “popping” that occurs during the LOD switch, we can use image-space blending • When an object needs to change between discrete LOD levels, it is queued for blending • During blending, the object is actually rendered twice, at both LOD levels, and the alpha values are cross-faded • In practice, we find this is useful for larger objects or terrain blocks, but not useful for typical models

Continuous LOD • Continuous LOD can be an effective extension to discrete-LOD for games • Reductions with greater granularity can avoid visible “popping” • It can also save memory compared to storing a high number of discrete levels • Our continuous implementation is based mainly on half-edge collapse • This is the best way to keep our data static

CLOD Implementation • To implement run-time CLOD, what we’re effectively doing is moving our off-line creation of discrete LOD index lists to the run-time engine • To save memory, we re-order vertices in order of their “collapseOrder” field • We export a separate parallel array to contain the “collapseTo” index for each vertex

CLOD Runtime • At run-time, we select a desired number of vertices and repeat the recursive collapse process • Each index is replaced with its collapseTo until a value less than the desired size is reached • For efficiency, we re-order our original index list in reverse-collapse order • This allows us to stop when the first degenerate triangle is detected during the collapse process • The result is a new indexing of the mesh with the precise number of vertices requested • Result is cached in our model instance data
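A sketch of the run-time re-indexing under some stated assumptions: vertices have been re-ordered so that the first vertex to collapse has the highest index, vertices that never collapse point to themselves in `collapse_to`, and triangles are sorted so that the ones that degenerate last come first. The function name is hypothetical.

```python
def clod_indices(indices, collapse_to, target_verts):
    """Re-index a triangle list so it references only the first target_verts vertices."""
    out = []
    for i in range(0, len(indices), 3):
        tri = []
        for v in indices[i:i + 3]:
            # Follow collapseTo until the index falls inside the kept range.
            # The self-link check guards against non-collapsible vertices.
            while v >= target_verts and collapse_to[v] != v:
                v = collapse_to[v]
            tri.append(v)
        if tri[0] == tri[1] or tri[1] == tri[2] or tri[0] == tri[2]:
            # Triangles are in reverse-collapse order, so every later
            # triangle is degenerate too and we can stop here.
            break
        out.extend(tri)
    return out
```

In an engine the `collapse_to` array would likely be a packed 16-bit array (the 2 bytes per vertex mentioned in the talk), and the returned list would be cached per model instance until the target count changes.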

CLOD Advantages • This method maps moderately well to console needs • The vertex data remains static and indexable • Re-indexing can be cached over multiple frames to amortize costs • Minimal storage costs above cost of storing basic model data • 2 bytes per vert fixed-cost • Can actually be more memory-efficient than discrete LOD, but not by a lot

CLOD Disadvantages • The biggest challenge with CLOD is to optimize the index ordering • Normally we perform intense, off-line strip generation to achieve this • With an index list that could change every frame, we aren’t able to spend time generating strips • We can still “compile” displaylists, etc. but at some additional cost • Skip strips and similar techniques of partial-strip buffering can help address these concerns • Exploit the fact that most of the model remains unchanged after each step

Non-Geometric LOD

Vertex Shader LOD • Vertex “shader” refers to the processing path required to set up each vertex in the scene • Newer PC and console hardware allow for extremely complex vertex operations including transformation, blending, and lighting • The throughput of the GPU in verts/sec varies by orders of magnitude depending on the processing required • Un-textured, un-lit = 30M V/s • Dual-texture, 4 Lights = 9M V/s

Lighting LOD • One of the most costly parts of vertex processing is lighting calculation • Generally the cost increases linearly with the number of active lights. • All games do basic operations like selecting the X brightest nearby lights for each mesh • The number of lights X can be increased/decreased based on LOD metrics
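A small sketch of light selection with an LOD-driven budget. The inverse-square brightness estimate, the dictionary layout of a light, and the convention that `lod` runs from 0.0 (least detail) to 1.0 (full detail) are assumptions made for illustration.

```python
import heapq


def select_lights(lights, mesh_pos, lod, max_lights=4):
    """Return the brightest lights for a mesh, with the count scaled by lod."""
    # Scale the light budget X by the LOD factor, rounding to nearest.
    budget = max(0, min(max_lights, int(lod * max_lights + 0.5)))

    def brightness(light):
        # Rough apparent brightness at the mesh: intensity with
        # inverse-square falloff (the +1 avoids division by zero).
        d2 = sum((a - b) ** 2 for a, b in zip(light["pos"], mesh_pos))
        return light["intensity"] / (1.0 + d2)

    return heapq.nlargest(budget, lights, key=brightness)
```

With a budget of zero the mesh falls through to whatever ambient or pre-lit path the renderer provides.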

Pre-lighting • Because lighting is so expensive, a common optimization is to pre-calculate lights when possible • A non-moving (or rarely moving) object can have the lighting contribution from all nearby, non-moving lights calculated offline & stored in a per-vertex color channel • As long as certain conditions hold, the object is rendered with a 0-light path • If additional moving lights come into range, the hardware allows us to add the dynamic and pre-calculated colors together • If the object moves, it can revert to real-time lighting

Lighting LOD • At lower LOD levels, we can use simpler lighting equations • Use a static envmap (spherical or cubic) and normal-based texture projection to approximate diffuse lighting • Switch to purely ambient lighting or directional lighting at low LOD • At lower LOD levels, shadow generation is reduced or disabled • Remove self-shadowing, remove accurate projected shadow volumes or textures

Projected Lighting • A common technique in current games is to use texture projection to simulate complex lighting scenarios • Generally this requires an additional rendering pass on affected meshes • At lower LOD, we attempt to replace a projected light with a similar point or spotlight • Match color & size to approximate the texture effect • We also begin to exclude smaller objects from projection • Light will affect walls, but not characters

Vertex Shader LOD • After lighting, the next most costly operation is skinning or blending the vertex • Can be performed by fixed-function matrix-palette blending, or programmable vertex shader • Our goal with LOD is to use the existing model data but to simplify the vertex processing math • We create N versions of all active game vertex processing functions • All accept the same input data • Selection is driven at run-time by the shared “LOD Factor” • Essentially it’s discrete vertex LOD

Model Coordinate System • We store vertex position and normal data in “model space” • This enables us to select between several types of vertex processing when needed • If we ignore all bone associations and render with a single transform, we get the “at-rest” model pose • If we store bone influences in sorted order, we can blend only against the first bone to get less-accurate skinning
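A CPU-side sketch of the idea, assuming each vertex carries `(bone_index, weight)` pairs sorted by descending weight and each bone matrix is a 3x4 row-major affine transform from the at-rest model-space position to the posed position. The names and the weight renormalization are illustrative choices; real hardware paths would be separate shader variants.

```python
def transform(m, p):
    """Apply a 3x4 row-major affine matrix to a 3D point."""
    return tuple(r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3] for r in m)


def skin_vertex(position, influences, bone_matrices, max_influences):
    """Blend a model-space position against at most max_influences bones."""
    if max_influences == 0 or not influences:
        # Ignore all bone associations: the vertex keeps its at-rest pose.
        return tuple(position)
    used = influences[:max_influences]  # strongest influences come first
    total = sum(w for _, w in used)     # renormalize the weights we keep
    out = [0.0, 0.0, 0.0]
    for bone, weight in used:
        q = transform(bone_matrices[bone], position)
        for axis in range(3):
            out[axis] += q[axis] * (weight / total)
    return tuple(out)
```

Because every variant reads the same vertex data, switching `max_influences` from the full count down to 1 or 0 needs no extra storage.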

Skeleton LOD • The number of bones in a model skeleton can also affect performance • Our vertex shader offers a fixed number of matrices that can be loaded into hardware registers simultaneously • This limits the number of faces we can render before re-loading these registers (batch size) • We can replace a vertex->bone binding with that bone’s parent to eliminate “leaf” bones • Their geometry will behave as if the removed bones are fused in their at-rest pose • This needs to be done off-line because it affects how we split the model into render groups
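The leaf-bone removal can be sketched as an off-line remapping pass. The skeleton is assumed to be a parent-index array with -1 for the root; the function name and the `passes` parameter (how many layers of leaves to strip) are hypothetical.

```python
def remove_leaf_bones(parents, vertex_bones, passes=1):
    """Rebind vertices from leaf bones to their parents.

    parents[b] is the parent bone index of bone b, or -1 for the root.
    Returns a new list of per-vertex bone indices.
    """
    remap = list(range(len(parents)))
    alive = [True] * len(parents)
    for _ in range(passes):
        # A bone is a leaf if no surviving bone names it as parent.
        has_child = [False] * len(parents)
        for bone, parent in enumerate(parents):
            if alive[bone] and parent >= 0:
                has_child[parent] = True
        for bone, parent in enumerate(parents):
            if alive[bone] and not has_child[bone] and parent >= 0:
                alive[bone] = False
                remap[bone] = parent

    def resolve(bone):
        # Follow the remap chain up to the nearest surviving ancestor.
        while remap[bone] != bone:
            bone = remap[bone]
        return bone

    return [resolve(b) for b in vertex_bones]
```

After remapping, the model would be re-split into render groups, since fewer distinct bones per batch changes how many faces fit within the matrix register limit.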

Other Vertex LOD • At lower LOD, we replace accurate reflected-normal vectors with camera-space normal vectors • Requires less CPU assistance on some platforms • We can often reduce the accuracy of skinning/blending for normal vectors before we do the same for position vectors • Effects of inaccurate normals are far less obvious

Pixel Shader LOD • Pixel shader LOD simply means having multiple implementations of each raster-level visual effect • Alternate versions would achieve a similar visual result with fewer render passes, texture stages, or texture fetches • Disabling multi-pass techniques is particularly effective because it benefits geometric LOD as well • Reducing texture stages or fetches increases pixel fill-rate • Generally implemented simply as multiple code paths selectable according to LOD metrics • Light mapped walls can revert to vertex-lit • Bumpmaps, Envmaps are blended out

Imposters • The most extreme form of geometric LOD is replacing a complex object with an imposter • The imposter can be a flat, textured quad • Or it can be a simple geometric shell • The goal is to approximate the shape & color of the original object at great distances • Some game objects are always rendered as imposters • Particles, explosions, bullets, foliage

Billboard Imposter • The billboard imposter replaces a complex shape with a flat textured quad • Can be rotated to face the camera in 1, 2 or 3 axes, depending on object symmetry • The texture can contain multiple frames to represent different angles or animation frames • The engine can blend between frames to improve fidelity, or use 3D volume textures to perform hardware blending • Typically billboard imposters use masked (1-bit alpha) texture images so the actual quad outline is not visible • “Z sprites” can provide imposters that z-buffer more accurately, particularly useful in clusters of objects
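As an illustration of multi-angle frames with blending, the sketch below picks the two nearest pre-rendered frames and a blend weight from a viewing angle. It assumes frames are spaced evenly around a single axis, with frame 0 at angle 0; the function name is hypothetical.

```python
import math


def billboard_frames(view_angle, num_frames):
    """Return (frame_a, frame_b, blend) for a viewing angle in radians."""
    step = 2.0 * math.pi / num_frames
    # Position of the angle measured in frames, wrapped into [0, num_frames).
    t = (view_angle % (2.0 * math.pi)) / step
    frame_a = int(t) % num_frames
    frame_b = (frame_a + 1) % num_frames
    blend = t - int(t)  # 0.0 = all frame_a, 1.0 = all frame_b
    return frame_a, frame_b, blend
```

The blend weight could drive a two-texture cross-fade, or serve as the third coordinate when the frames are stored as slices of a volume texture.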

Dynamic Texture Imposter • Render-to-texture is a common & reasonably efficient console pipeline • Non-dynamic texture imposters use valuable texture memory • Gives better simulation of animation, lighting, and movement of the replaced objects • We allocate a pool of textures for dynamic imposters at startup and re-use them when necessary • A large crowd scene might re-use each imposter many times
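One possible shape for such a texture pool is sketched below, with least-recently-used eviction. The eviction policy, the class name, and the integer texture ids are assumptions; the talk only says the textures are allocated at startup and re-used when necessary.

```python
from collections import OrderedDict


class ImposterPool:
    """Fixed pool of render-target textures shared by dynamic imposters."""

    def __init__(self, texture_ids):
        self.free = list(texture_ids)
        self.in_use = OrderedDict()  # object key -> texture id, oldest first

    def acquire(self, key):
        """Return (texture_id, needs_render) for the given object key."""
        if key in self.in_use:
            # Already has an imposter: mark it as recently used.
            self.in_use.move_to_end(key)
            return self.in_use[key], False
        if self.free:
            tex = self.free.pop()
        else:
            # Pool exhausted: steal the least recently used texture.
            _, tex = self.in_use.popitem(last=False)
        self.in_use[key] = tex
        return tex, True
```

When `needs_render` is true the caller would render the object into the texture before drawing the imposter quad; otherwise the cached image is re-used.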

Geometric Imposter • A Geometric imposter uses a rigid 3D model in place of a complex articulated 3D model • The “rigid mesh” vertex shader is usually several times faster than skinned/blended • The imposter can use simpler shaders, fewer textures, and larger render batches • Geometric imposters look better when viewed from multiple angles (object rotating or camera panning) • Can take up less memory than multi-frame texture imposters, and can render nearly as quickly

Terrain LOD • Terrain LOD is often handled specially • Mainly because the terrain is very large compared to the viewer (player) • Our terrain is not stored as a heightfield, so we can do more arbitrary shapes • We break the terrain into separate blocks according to a 2D grid overlay

Terrain LOD • Each block has discrete LOD levels pre-computed and compiled into display lists • At run-time, an LOD factor is computed for each block • Based on distance, viewing angle, viewer height • Vertices that lie along the boundaries between blocks are not subject to removal • This avoids opening gaps and allows each block to LOD independently • Image-space blending can help hide switches
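A minimal sketch of how boundary vertices might be flagged before reduction so that neighboring blocks keep matching edges. It assumes blocks are axis-aligned rectangles in the x/y plane with z as height; the function name and tolerance are hypothetical.

```python
def boundary_vertices(positions, block_min, block_max, eps=1e-6):
    """Return the set of vertex indices lying on the block's 2D boundary."""
    locked = set()
    for index, (x, y, _z) in enumerate(positions):
        on_x_edge = abs(x - block_min[0]) < eps or abs(x - block_max[0]) < eps
        on_y_edge = abs(y - block_min[1]) < eps or abs(y - block_max[1]) < eps
        if on_x_edge or on_y_edge:
            locked.add(index)
    return locked
```

The resulting set would be passed to the reduction step as vertices exempt from collapse, which lets each block switch LOD levels without opening gaps against its neighbors.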

Image Processing Techniques • Z-Fade • Gameplay elements that are only of player interest at close range can be alpha blended out at increasing z-distance • Powerups, small detail models, ground cover foliage, atmosphere objects, etc. • Depth of Field effects • If the game utilizes a depth-of-field effect to blur distant objects, the game can use far more aggressive distance LOD schemes
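Z-fade can be sketched as a simple distance-to-alpha ramp. The linear falloff and the `fade_start` / `fade_end` parameters are assumptions; an engine might use a different curve or tie the distances to the shared LOD factor.

```python
def z_fade_alpha(distance, fade_start, fade_end):
    """Return the alpha for an object at the given z-distance (1.0 = opaque)."""
    if distance <= fade_start:
        return 1.0
    if distance >= fade_end:
        return 0.0  # fully faded; the object can be skipped entirely
    return (fade_end - distance) / (fade_end - fade_start)
```

Objects whose alpha reaches 0.0 need not be submitted at all, which is where the actual saving comes from.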

Non-Visual LOD • Creating a special LOD geometry for shadow projection • Could use more aggressive methods beyond half-edge collapse to generate silhouettes • Because shadows don’t have texture/lighting concerns, we can be more aggressive in choosing algorithms • Automatic Collision geometry • Currently we create collision geometry using simple volume shapes, or convex hull algorithms • More demanding games could use some of the volume-based LOD reductions to create better-fit collision geometry

Future Directions • Subdivision & curved surfaces • If future platforms increase RAM sizes and are fast enough to render 1-tri-per-pixel, it’s unclear if subdiv is needed • However, artists are adopting this rapidly for cutscene work, so data-sharing is an appealing benefit • Subdivision with hardware support that was effectively “free” would definitely find an audience • Otherwise, we expect that next-generation projects will continue to encode more data into textures and use programmable shaders to simulate details

Future Directions • Vertex processing hardware is becoming more general-purpose • Will allow more meaningful per-vertex processing for LOD schemes • Possibly more emphasis on view-dependent schemes
