Making the Pieces Fit Together

Jonathan Blow, Game Developers Conference Reception, October 21, 2002, Seoul, Korea

3D Techniques I Will Discuss • Level-of-Detail Management (LOD) • Triangle Strip Generation • Vertex Cache Optimization • Normal Map Generation • Ordered Rendering (sorted output geometry)

How I will discuss them • You can read about these techniques on the internet: hardware vendor sites, programmer hobbyist sites. There is a lot of hype. • Most of this stuff is not written by people actually making ambitious games (they’re busy!). • Most of it is ill-advised. • I want to provide a hype-free, skeptical review.

Lecture in Three Parts • Part 1: A Sense of Perspective • What is 3D rendering for games, today? • Part 2: The Techniques • Explained by a Skeptic… • Part 3: Making Games • How to use 3D techniques without going out of business or building a horrible game.

Part 1: A Sense of Perspective

3D rendering for games is a complicated subject • Partially because we have accomplished a lot • Recent demos, and a few games, are graphically very impressive • (but the demos look much better than the games – why is that??) • The games all draw worlds by projecting a bunch of triangles onto the screen.

Primary Rendering Paradigm • Projecting triangles – but very fancy triangles • Texture maps, normal maps, complex lighting • Alternative representations exist • NURBS, N-Patches, subdivision surfaces • These are used in preprocesses, translated into triangles for the realtime pipeline.

Why have triangles dominated? They are simple and robust.

Suppose we’re inventing realtime rendering from scratch • Project every point of a solid object to the screen, use a depth buffer • We waste a lot of resources drawing everything inside the solid, which will inevitably be hidden! • Cull out interior points (same result) • Now we have a bunch of solid 2D shells to draw, but each still has a large number of points • We want a more compressed way to represent 2D subsets of 3D

Introducing the Triangle • The triangle is the simplest way to denote a closed region of 2D space.

Start with a point • We have a 0-dimensional space: the point P

Define one more point • Suddenly we have a 1D space! P + t(Q-P) • That is a lot bigger than 0D.

Add a third point • Now we have a 2D space! P + t(Q-P) + s(R-P) • In a way, the concept of a triangle is the same as the concept of two dimensions.

The linearity of the triangle is tremendously useful! • Easy to: • Interpolate • Clip • Test for intersection • Build a bounding volume • Linear equations are the most basic and well-understood kind (see, for example, linearizing differential equations!) • If you are doing something unconventional, the triangle probably won’t get in your way.
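To make the linearity concrete, here is a minimal sketch of the parametric form P + t(Q-P) + s(R-P) from the earlier slides. The function names are hypothetical, and the inside test (t >= 0, s >= 0, t + s <= 1) is the standard barycentric condition, which the slides do not state explicitly.

```python
def triangle_point(p, q, r, t, s):
    """Evaluate P + t(Q-P) + s(R-P) for 3D points given as tuples."""
    return tuple(pi + t * (qi - pi) + s * (ri - pi)
                 for pi, qi, ri in zip(p, q, r))


def inside_triangle(t, s):
    """The parameters (t, s) land inside the triangle when they are
    non-negative and sum to at most 1."""
    return t >= 0 and s >= 0 and t + s <= 1


def interpolate(attr_p, attr_q, attr_r, t, s):
    """Any per-vertex scalar (a color channel, a UV coordinate)
    interpolates with exactly the same linear formula."""
    return attr_p + t * (attr_q - attr_p) + s * (attr_r - attr_p)
```

The same two parameters drive both position and attributes, which is why interpolation across a triangle is cheap.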

Higher-order surfaces cause more problems. • Clipping a curved surface is annoying. • Bounding volumes are also annoying. • The offset of a Bezier surface is not a Bezier surface • So what happens if spline parameters are your base representation, and you need to offset? (Figure: the green surface is a spline; the red one is not.)

Among linear polygons, triangles are the simplest. • Quads can be noncoplanar (vertex lighting will fail!) • Pipeline must handle primitives with varying numbers of vertices • Games had brief dalliances with quads / n-gons around 1996, but nobody uses them any more to represent general geometry.

In Summary • The impressiveness of our current graphics techniques depends on us being able to draw a lot of triangles.

Question: “So how do I draw a lot of triangles?” (Answer: “very carefully.”)

Part 2: The Techniques

Rendering Techniques that people like to hear about… but first: • There are two basic kinds of 3D techniques • #1: We would think about them if we had infinitely fast hardware (e.g. projective transform, BRDF) • #2: The kind we only care about because hardware is slow • Type #2 usually introduces complications, and we need to manage those complications

Drawing a lot of triangles: Reduce Data Size • Fancy Triangles = big vertices (60 bytes each) • XYZ position (12 bytes) • Texture UV coordinates (8 bytes) • RGBA color (4 bytes) • Tangent frame (36 bytes; maybe smaller) • 180 bytes per triangle if you just list vertices! (5000 triangles = 900 kbytes) • This makes the hardware run slowly
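The byte counts above can be checked with a small sketch. The packed layout below is an assumption chosen to match the slide's numbers (32-bit floats for position, UV, and a 3x3 tangent frame; four bytes for color). Real vertex formats vary.

```python
import struct

# Hypothetical packed layout matching the slide's byte counts.
# "<" means little-endian with no alignment padding:
#   3f = XYZ position, 2f = UV, 4B = RGBA color, 9f = 3x3 tangent frame
VERTEX_FORMAT = "<3f2f4B9f"
VERTEX_BYTES = struct.calcsize(VERTEX_FORMAT)


def unindexed_mesh_bytes(num_triangles, vertex_bytes=VERTEX_BYTES):
    """Size of a mesh stored as a plain list of 3 vertices per triangle."""
    return 3 * vertex_bytes * num_triangles
```

With this layout, `VERTEX_BYTES` is 60 and `unindexed_mesh_bytes(5000)` is 900000, matching the figures on the slide.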

Indexed Triangle List • A mesh has a lot of shared vertices • Put the vertices into an array • The triangles are described by indices into this array • Shrinks total amount of data • With k bytes per vertex and i bytes per index: F = 2V; unindexed size S0 = 3kF; indexed size S1 = kV + 3iF; savings S0 - S1 = V(5k - 6i) • Bonus: Separates topology from position data • Example: vertices 0 to 4 with triangles (0, 1, 2), (1, 2, 3), (1, 3, 4)
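A minimal sketch of building an indexed list and comparing sizes follows. The function names are hypothetical, and the reading of k as bytes per vertex and i as bytes per index is an assumption taken from how the symbols are used in the size formulas.

```python
def build_indexed_list(triangle_vertices):
    """Turn a flat list of vertices (3 per triangle) into a
    deduplicated vertex array plus an index list."""
    vertices = []
    lookup = {}          # vertex -> position in the vertex array
    indices = []
    for v in triangle_vertices:
        if v not in lookup:
            lookup[v] = len(vertices)
            vertices.append(v)
        indices.append(lookup[v])
    return vertices, indices


def list_sizes(num_vertices, k, i):
    """Return (S0, S1): unindexed and indexed sizes in bytes,
    using the slide's approximation F = 2V."""
    faces = 2 * num_vertices
    s0 = 3 * k * faces
    s1 = k * num_vertices + 3 * i * faces
    return s0, s1
```

For example, with V = 2500, k = 60 and i = 2, the saving S0 - S1 works out to V(5k - 6i) = 720000 bytes.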

Triangles in a mesh share not only vertices, but edges too • Example: triangles (0, 1, 2), (1, 2, 3), (3, 1, 4) share the edges 1-2 and 1-3

Triangle Strips • We can compress a list of indexed triangles by forming “strips” that run along the shared edges. • Example: the triangles 012, 123, 234, 345, 456 become the strip 012, 3, 4, 5, 6
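As a sketch of how a strip expands back into triangles, each new index forms a triangle with the two indices before it. The function name and the `fix_winding` flag are hypothetical. Real hardware also flips the winding of every second triangle, which the slide's listing omits, so the flag defaults to off.

```python
def strip_to_triangles(strip, fix_winding=False):
    """Expand a triangle strip into a list of index triples.

    With fix_winding=True, every odd triangle has its first two
    indices swapped so all triangles face the same way.
    """
    triangles = []
    for n in range(len(strip) - 2):
        a, b, c = strip[n], strip[n + 1], strip[n + 2]
        if fix_winding and n % 2 == 1:
            a, b = b, a
        triangles.append((a, b, c))
    return triangles
```

A strip of N indices yields N - 2 triangles, which is where the "3 for the first, 1 for each thereafter" cost comes from.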

Cost analysis of triangle strips is often somewhat wrong • 3 indices for the 1st triangle, 1 for each thereafter • Incomplete because there also needs to be a way to delimit strips • Example: 3 strips (01234, 567, 89241) give the index buffer 0123456789241. But where do they start and end?

Delimiting Triangle Strips • Example: 3 strips (01234, 567, 89241) • Explicitly add numbers to describe strip length: index buffer 5012343567589241 • DirectX8-style separate API calls (impact on CPU usage, AND adds numbers behind the scenes): index buffer 0123456789241, drawn with DrawIndexedPrimitive 0, 5; DrawIndexedPrimitive 5, 3; DrawIndexedPrimitive 8, 5; output stream 5012343567589241 • Strips start out worse than lists, and have to catch up… the longer the strip, the better you catch up
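Both delimiting schemes can be sketched in a few lines. The function names are hypothetical; the data is the three-strip example from the slide.

```python
def pack_strips(strips):
    """Scheme 1: prefix each strip with its length in one buffer."""
    buffer = []
    for strip in strips:
        buffer.append(len(strip))
        buffer.extend(strip)
    return buffer


def unpack_strips(buffer):
    """Recover the strips from a length-prefixed buffer."""
    strips = []
    pos = 0
    while pos < len(buffer):
        length = buffer[pos]
        strips.append(buffer[pos + 1:pos + 1 + length])
        pos += 1 + length
    return strips


def draw_ranges(strips):
    """Scheme 2: keep a plain index buffer and issue one
    (start, count) draw call per strip."""
    ranges = []
    start = 0
    for strip in strips:
        ranges.append((start, len(strip)))
        start += len(strip)
    return ranges
```

Either way, each strip costs one extra number: stored in the buffer in the first scheme, passed through the API in the second.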

Because triangle strips are limited, we need to add swaps • Without a swap: strip 012, 3, 4, 5, 6 gives triangles 012, 123, 234, 345, 456 • With a swap: strip 012, 3, 2, 4, 5, 6 gives triangles 012, 123, 232, 324, 245, 456 (232 is degenerate)
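The cost of a swap can be seen by expanding the swapped strip and separating out the degenerate triangles, meaning those that repeat an index and so cover no area. This is a small sketch with a hypothetical function name.

```python
def expand_with_swaps(strip):
    """Return (all_triangles, real_triangles) for a strip.

    A swap repeats an index, which produces a degenerate triangle
    such as (2, 3, 2). It occupies a slot in the stream but draws
    nothing.
    """
    all_triangles = [tuple(strip[n:n + 3]) for n in range(len(strip) - 2)]
    real_triangles = [t for t in all_triangles if len(set(t)) == 3]
    return all_triangles, real_triangles
```

In the slide's example, the swapped strip spends 8 indices to draw 5 real triangles, one index more than the unswapped strip of the same length would need.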

Triangle Strip Efficiency • Depends on strip length, which depends on your data • It takes a complicated algorithm to make good strips. (Figure: one stripping uses 4 strips, 40 indices; another uses 10 strips, 52 indices; no swaps yet.)

Triangle Strip Skepticism • In a full game, performance numbers don’t necessarily validate triangle strips… we’ll see why • Strips complicate the implementation • Even with perfect stripping, you only reduce index data (minority of total data) from 6iV to 2iV + 2i. You won’t have perfect stripping. • Degenerate triangles can cost you.

If you want to make a strip algorithm… • Most papers give you the basic idea, but are not very good in the end • Old SGI source code • STRIPE papers • You really want a non-greedy algorithm • Heuristics based on strip length and cache • Tunneling operator

Vertex Cache and Vertex Shader • We want to cache vertex memory for fast access… • Vertex Shader is a small hardware program that runs for each vertex • Compute lighting, transform, skinning, etc • Hardware caches the results of the evaluated vertex shader • A cache miss means running the shader again • (More expensive than traditional CPU cache miss!) (Figure: memory, shader, vertex cache.)

You want to order vertices by cache efficiency • Mostly use vertices you used recently • But this conflicts with triangle strip efficiency! (Figure: you can’t even do the red path in one triangle strip without inserting a teleport, which is very expensive!)

Vertex cache effects can be dominant • Multi-pass rendering: you skin the guy multiple times, so shader is expensive! • Or do you skin on the CPU? • Now you begin to have a lot of optimization choices; these can determine who’s dominant • The “right answer” depends on your game and target platform
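One way to compare index orderings is to simulate the cache and count misses. The sketch below assumes a simple FIFO cache of a given size. Actual hardware replacement policies and cache sizes differ, so treat the numbers as relative, not absolute.

```python
from collections import deque


def count_cache_misses(indices, cache_size):
    """Count vertex shader runs for an index stream, assuming a
    FIFO post-transform cache (an assumption, not a hardware spec)."""
    cache = deque(maxlen=cache_size)   # oldest entry drops off when full
    misses = 0
    for index in indices:
        if index not in cache:
            misses += 1                # miss: the shader runs again
            cache.append(index)
    return misses
```

Dividing the miss count by the number of triangles gives an average miss rate per triangle, which can be used to score one ordering against another.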

How do we resolve the conflict between strips and cache? • Maybe you write a triangle stripper that tries to deal with the vertex cache • Complicated to write, degraded results on both sides; Nvidia’s does this • Maybe you ignore vertex caching • Might be okay if your shaders are cheap • Maybe you ignore triangle strips, and just use triangle lists

Quirks of some architectures make strips better • Nvidia triangle setup (Xbox, etc) • Nvidia push buffer bottleneck also makes strips more effective.

Now… we need some kind of LOD • Because even perfect triangle strips / cache hits still draw way too many triangles… we need to go from O(n) to O(log n). • Several types of LOD available: • Dynamic (view-dependent): FORGET IT • Static mesh switching (simple) • Progressive mesh (best algorithm: VIPM)
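Static mesh switching is the simplest of these to sketch: keep an array of pre-built meshes and pick one by distance. The function name and the threshold values are hypothetical; a real game would tune them per object.

```python
from bisect import bisect_right


def select_lod(distance, switch_distances):
    """Return the index of the mesh to draw.

    switch_distances is sorted ascending; level 0 is the
    highest-resolution mesh, used closer than the first threshold.
    """
    return bisect_right(switch_distances, distance)
```

For example, with thresholds `[10, 30, 90]`, an object at distance 50 draws mesh level 2.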

View Independent Progressive Mesh • Collapse vertices due to base-plane error metric. • Generate one sequence of collapses that takes us from high-res to low-res. • Popping in VIPM is subtle, which is good. • VIPM draws fewer triangles than static switching, since we usually push static switching away in Z to avoid popping.

Problem with VIPM • VIPM slides a window across the index buffer, doing fix-ups. • Need to sort vertices by LOD collapse order • This conflicts with strip / cache sorting • You can’t do all three at once (though you can do stripped VIPM or cache-sorted VIPM) (Figure: index buffer and fix-up records.)

Sorting Score Card (more items will be added here) • Triangle strip efficiency order • Vertex cache order • LOD collapse order (if VIPM)

Normal Map Generation • Approximate huge amounts of geometry by per-texel normals • Generate the maps by crunching a high-res mesh down onto a low-res one… • When rendering, transform texture normal by iterated tangent frame, and you get the normal of the high-res model (almost) • Object or tangent space?
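The rendering step for a tangent-space map can be sketched as follows: decode the texel into a vector, then express it in object space using the tangent frame at that point. The function names are hypothetical, and the 8-bit [0, 255] to [-1, 1] encoding is a common convention assumed here, not something the slides specify.

```python
def decode_texel(rgb):
    """Map 8-bit channels in [0, 255] to a vector in [-1, 1]."""
    return tuple(c / 255.0 * 2.0 - 1.0 for c in rgb)


def tangent_to_object(normal_ts, tangent, bitangent, normal):
    """Transform a tangent-space normal by the tangent frame.

    The frame's three axes act as the columns of a 3x3 matrix:
    result = x * tangent + y * bitangent + z * normal.
    """
    x, y, z = normal_ts
    return tuple(x * t + y * b + z * n
                 for t, b, n in zip(tangent, bitangent, normal))
```

A texel pointing straight along tangent-space Z simply reproduces the low-res surface normal; anything else tilts it toward the detail of the high-res model.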

Normal Map Generation influences LOD choice • With static switching, you just have an array of meshes • With VIPM, you are forced to use object-space normal maps, which probably don’t compress as well as tangent-space maps. • Normal mapping to a high-res model makes static mesh switching look better (much less popping… most popping was due to light)

More Sorting • To render quickly, we want to sort by render state (multiple materials on the same object means we break that object into several passes, decreasing triangle strip and vertex cache effectiveness) • To render quickly, we want to draw front-to-back (fast z-fail) • To render transparent things correctly, we need to draw those back-to-front (break these into a separate pass, decrease stripping and cache effectiveness) • We are robbing ourselves of the benefits we got earlier… so hopefully we didn’t pay very much for them (more on this later)
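The competing sort orders can be combined in one pass over the draw list, as in this sketch. The record fields (`name`, `shader`, `depth`, `translucent`) are hypothetical, and sorting opaque draws by shader first and depth second is one possible priority, not the only reasonable one.

```python
def order_draw_calls(draws):
    """Order draw records for rendering.

    Opaque draws: grouped by shader, then front-to-back within
    each group. Translucent draws come last, back-to-front.
    """
    opaque = [d for d in draws if not d["translucent"]]
    translucent = [d for d in draws if d["translucent"]]
    opaque.sort(key=lambda d: (d["shader"], d["depth"]))
    translucent.sort(key=lambda d: d["depth"], reverse=True)
    return opaque + translucent
```

Note that nothing in this ordering knows about strips, vertex cache order, or LOD collapse order, which is exactly the conflict the slide describes.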

Sorting Score Card • Triangle strip efficiency order • Vertex cache order • LOD collapse order (if PM) • Sort by shader • Front-to-back (opaque things) • Back-to-front (translucent things)

How do you LOD a guy with multiple materials? • Each material is usually implemented by one pixel / vertex shader pair • Can only combine triangles so much (can’t cross material boundary) • Can’t combine textures into one (lose lighting effects) • Everybody just kind of punts… this is an important problem to solve for the future.

Part 3: Making Games

Trade-Offs • As computer scientists and engineers we are accustomed to the idea of engineering trade-offs (time for space, etc) • Must consider code complexity to be a FINITE RESOURCE that can be traded with time, space, etc.

Complexity as Resource • Every extra line of code or ‘if’ statement must be maintained through the life of the project and must interact with new features • IMPORTANT: most new features are not orthogonal; they will FIGHT with your existing code. • You only have so much complexity to spend over the course of your project; too much and your project will fail.

Cultural Problem • At least in America, many programmers try to prove themselves by doing complicated, impressive-sounding things. • Try to make 3D engine that is the “next big cool thing” • The successful paths of the past have been things that are NOT complicated (triangles are simple!) • Successful paths of the future will probably also be the simpler ones. So…

A Thought • If your engine / algorithms seem very complicated…. • …. they are unlikely to be on a path that history will make successful • They will NOT be the next big thing

Good Art and Levels are more important than a good engine • If you are adding engine features that make it more difficult to create levels / content (without making the content a lot richer), this is probably a mistake (Example shown: Max Payne.)

Cost-Benefit Analysis • Don’t forget to account for opportunity cost … every minute you spend working on A is a minute not working on B • You need to be an economist, deciding how to get the most net worth out of the resources you have to spend. • YOU need to do it, not just the managers • It is a multiscale (fractal) phenomenon
