Assets, Dynamics and Behavior Computation for virtual worlds and computer games
Sheldon Brown, UCSD Site Director, CHMPR
Daniel Tracy, Programmer, UCSD Experimental Game Lab
Kristen Kho, Programmer, UCSD Experimental Game Lab
Todd Margolis, CRCA Technical Director
Media environments are becoming less singularly “authored” and more responsive to conditions of operation and social interaction. • Higher fidelity of HCI environments increases the need for quality in underlying assets • Media environments are social spaces, and are less predictable as to how they are used • Repositories of rich data require these developments to enhance understanding. This applies across almost all fields of knowledge building and discovery. • We develop our methods of expression within the domains of culture. The invention of new forms of culture in areas such as computer gaming will impact how we conduct science, engineering and social interactions of all types.
Interactive media is a good candidate for taking advantage of multicore computing. GPUs are an example of one area in which multicore approaches have reaped enormous benefits. Other problems in the field aren’t as homogeneous as graphics and require new approaches in the overall application structure to utilize processing power while still delivering interactive operations. Whatever cool things we want to do all have to be done within 1/30 of a second. The creation of assets from databases and algorithms, along with the behaviors of complex systems, currently limits the possibilities of interactive graphic experiences.
Procedural asset pipeline for virtual worlds, games and other forms of digital media. Data, either taken from real-world sources via sensors, mined from databases, or generated, is transformed by a variety of algorithmic stages.
Asset creation becomes increasingly responsive to the application. Techniques developed from other fields have applicability to transforming data. For instance, 2D computer vision techniques apply to 3D spatial data.
Kristen Kho Programmer, UCSD Experimental Game Lab
Case Implementation: SC Road Generation • Algorithms • Using the Cell Processor • Performance • Conclusion and Future Work
Assets can also come from the purely algorithmic, with no initial data seed.
SC Road Generation • L-system of Archimedes spirals • Algorithm • Choose starting point on existing road • Generate a spiral curve (“challenger” road) • Test for intersections with existing roads • If no intersections, add to list of existing roads • Repeat n times or until all starting points have been tried
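The steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Scalable City implementation: the spiral parameters (`growth`, `turns`, `samples`), the polyline sampling, the rule of skipping the challenger's first segment (which touches its parent road by construction), and all function names are hypothetical choices made for the example.

```python
import math
import random

def spiral(origin, heading, growth=2.0, turns=1.5, samples=48):
    """Polyline approximating the Archimedean spiral r = growth * theta."""
    ox, oy = origin
    points = []
    for i in range(samples + 1):
        theta = turns * 2.0 * math.pi * i / samples
        r = growth * theta
        points.append((ox + r * math.cos(theta + heading),
                       oy + r * math.sin(theta + heading)))
    return points

def segments_intersect(a, b, c, d, eps=1e-12):
    """True if segment ab crosses segment cd (parallel pairs count as misses)."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = d[0] - c[0], d[1] - c[1]
    wx, wy = c[0] - a[0], c[1] - a[1]
    det = vx * uy - ux * vy
    if abs(det) < eps:
        return False
    s = (vx * wy - wx * vy) / det
    t = (ux * wy - wx * uy) / det
    return 0.0 <= s <= 1.0 and 0.0 <= t <= 1.0

def curves_intersect(challenger, road):
    # Assumption: the challenger's first segment starts on its parent road,
    # so it is skipped to avoid rejecting every candidate.
    for i in range(1, len(challenger) - 1):
        for j in range(len(road) - 1):
            if segments_intersect(challenger[i], challenger[i + 1],
                                  road[j], road[j + 1]):
                return True
    return False

def generate_roads(seed_road, attempts, rng):
    roads = [seed_road]
    starts = list(seed_road)              # candidate starting points
    tried = 0
    while starts and tried < attempts:    # n times, or until starts run out
        origin = starts.pop(rng.randrange(len(starts)))
        tried += 1
        challenger = spiral(origin, heading=rng.uniform(0.0, 2.0 * math.pi))
        if not any(curves_intersect(challenger, road) for road in roads):
            roads.append(challenger)
            starts.extend(challenger[1:])  # accepted roads offer new starts
    return roads

# Usage: grow roads from a straight seed road with a fixed random seed.
seed_road = [(float(x), 0.0) for x in range(0, 50, 5)]
roads = generate_roads(seed_road, attempts=20, rng=random.Random(0))
```

The inner intersection test dominates the cost as the road list grows, which is the part the following slides offload to the SPUs.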
Motivation • Problem • Initial implementation used Maya/MEL • Very slow, usable as a preprocess only • Take advantage of data parallelism and Cell processors • Generate road system in real time during SC runtime
Cell Processor Review • Power Processing Unit (PPU) • Synergistic Processing Unit (SPU) • Direct Memory Access (DMA)
Porting to the Cell • Road intersection testing • Function-offload programming model • Code • PPU: manages the SPU threads, sums up results • SPUs: intersection testing • CAFÉ (Cell Architecture Framework and Extensions): a set of libraries that build upon the Cell SDK libraries
Pseudocode

PPU Loop:
    Generate challenger curve
    Send challenger to SPUs
    Receive SPU intersection counts
    If there were no intersections
        Insert challenger into list
    Repeat

SPU Loop:
    Receive challenger curve
    Update sublist of existing curves
    intersectionCount = 0
    For every curve in sublist:
        If challenger intersects curve
            Increment intersectionCount
    Send intersectionCount to PPU
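The function-offload structure of the pseudocode can be imitated on an ordinary machine with a thread pool. The sketch below is an analogy only: Python threads stand in for SPUs, the `Worker` class and `run_ppu_loop` are hypothetical names, and circles `(x, y, radius)` are used as toy "curves" so the example stays short. It does not use the Cell SDK or CAFÉ.

```python
from concurrent.futures import ThreadPoolExecutor

class Worker:
    """Stand-in for one SPU: owns a sublist of the accepted curves."""
    def __init__(self, intersects):
        self.sublist = []
        self.intersects = intersects

    def count_intersections(self, challenger):
        return sum(1 for curve in self.sublist
                   if self.intersects(challenger, curve))

def run_ppu_loop(challengers, intersects, num_workers=4):
    """Stand-in for the PPU: broadcast each challenger, sum the counts."""
    workers = [Worker(intersects) for _ in range(num_workers)]
    accepted = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for challenger in challengers:
            counts = list(pool.map(
                lambda w: w.count_intersections(challenger), workers))
            if sum(counts) == 0:
                # Round-robin keeps the sublists balanced across workers.
                workers[len(accepted) % num_workers].sublist.append(challenger)
                accepted.append(challenger)
    return accepted

def circles_overlap(a, b):
    """Toy intersection predicate on (x, y, radius) tuples."""
    (ax, ay, ar), (bx, by, br) = a, b
    return (ax - bx) ** 2 + (ay - by) ** 2 <= (ar + br) ** 2

candidates = [(0, 0, 1), (1, 0, 1), (5, 0, 1), (5.5, 0, 1), (10, 0, 1)]
accepted = run_ppu_loop(candidates, circles_overlap)
```

Each worker only ever scans its own sublist, so the per-challenger cost drops roughly in proportion to the number of workers, at the price of one broadcast and one gather per challenger.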
Line Segment Intersection
• Using a parametric representation, the line segment ab can be written as a convex combination involving a real parameter s:
    p(s) = (1 - s) a + s b,  for 0 ≤ s ≤ 1
• Similarly for cd we may introduce a parameter t:
    q(t) = (1 - t) c + t d,  for 0 ≤ t ≤ 1
• An intersection occurs if and only if we can find s and t in the desired ranges such that p(s) = q(t). Thus we get the two equations:
    (1 - s) a_x + s b_x = (1 - t) c_x + t d_x
    (1 - s) a_y + s b_y = (1 - t) c_y + t d_y
• The coordinates of the points are all known, so it is just a simple exercise in linear algebra to solve for s and t.
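Rearranging the two equations gives the 2x2 linear system s(b - a) - t(d - c) = c - a, which can be solved with Cramer's rule. The following is a small sketch of that calculation; the function names and the tolerance `eps` are assumptions for the example, and parallel or collinear segments (zero determinant) are simply reported as having no unique solution.

```python
def solve_s_t(a, b, c, d, eps=1e-12):
    """Solve p(s) = q(t) for (s, t); return None if the lines are parallel."""
    ux, uy = b[0] - a[0], b[1] - a[1]   # b - a
    vx, vy = d[0] - c[0], d[1] - c[1]   # d - c
    wx, wy = c[0] - a[0], c[1] - a[1]   # c - a
    det = vx * uy - ux * vy             # determinant of [[ux, -vx], [uy, -vy]]
    if abs(det) < eps:
        return None
    s = (vx * wy - wx * vy) / det
    t = (ux * wy - wx * uy) / det
    return s, t

def segments_intersect(a, b, c, d):
    """True when both parameters fall inside [0, 1]."""
    solution = solve_s_t(a, b, c, d)
    if solution is None:
        return False
    s, t = solution
    return 0.0 <= s <= 1.0 and 0.0 <= t <= 1.0

# The diagonals of a 2x2 square cross at their midpoints.
midpoint_params = solve_s_t((0, 0), (2, 2), (0, 2), (2, 0))
```

Here `midpoint_params` is `(0.5, 0.5)`, which places the crossing at (1, 1) on both segments.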
SC/Cell Communication • Scalable City (Win32) and a Cell Blade communicate on the local network via lightweight TCP/IP server/client programs • Request for new road system is initiated by SC (the client) • Cell (the server) sends back the road data after work has completed
Networking Protocol
Client (Win32): • Connect to Server • Send parameters • Send border curve • Receive road hierarchy • Receive road image • Disconnect from Server
Server (Cell): • Open Client connection • Receive parameters • Receive border curve • Run road generation program • Send road hierarchy • Send road image • Close Client connection
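The message order of this protocol can be sketched with Python's standard `socket` module. The slides do not describe the wire format, so everything about it here is an assumption made for illustration: a 4-byte length prefix per message, JSON for parameters, border curve and hierarchy, raw bytes for the image, and the names `send_msg`, `recv_msg`, `serve_once`, `request_roads` and `loopback_demo`.

```python
import json
import socket
import struct
import threading

def send_msg(sock, payload):
    """Send one message as a 4-byte big-endian length followed by the bytes."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)

def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def serve_once(listener, generate):
    """Server side: handle a single client request, then close."""
    conn, _ = listener.accept()
    with conn:
        params = json.loads(recv_msg(conn))
        border = json.loads(recv_msg(conn))
        hierarchy, image = generate(params, border)   # road generation step
        send_msg(conn, json.dumps(hierarchy).encode())
        send_msg(conn, image)

def request_roads(address, params, border):
    """Client side: send the inputs, then wait for both results."""
    with socket.create_connection(address) as sock:
        send_msg(sock, json.dumps(params).encode())
        send_msg(sock, json.dumps(border).encode())
        hierarchy = json.loads(recv_msg(sock))
        image = recv_msg(sock)
    return hierarchy, image

def loopback_demo():
    """Run one request against a toy generator on localhost."""
    def toy_generate(params, border):
        hierarchy = {"roads": params["count"], "border_points": len(border)}
        return hierarchy, b"\x00\x01"   # placeholder bytes, not a real image
    with socket.create_server(("127.0.0.1", 0)) as listener:
        server = threading.Thread(target=serve_once,
                                  args=(listener, toy_generate))
        server.start()
        result = request_roads(listener.getsockname(),
                               {"count": 3}, [[0, 0], [1, 0], [1, 1]])
        server.join()
    return result
```

Length-prefixed framing is one common way to keep message boundaries on a TCP stream; because the client blocks on the first receive, it naturally waits until the server has finished its work.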
Optimizations • Single-precision floating point • 10x faster than double precision • Loss of precision requires normalization to prevent errors
Optimizations • Loop Unrolling + Vector Operations • Can test 4 intersections at a time • Bounding circles • Spirals fit roughly within a circle • Can skip tests if bounding circles do not intersect • NUMA & MPI • Slower due to communication latency
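The bounding-circle rejection can be sketched as below. This is an illustrative version, not the code used on the Cell: the circle is centered on the centroid of the sampled points (an assumption, and not the minimal enclosing circle), and the function names are hypothetical.

```python
import math

def bounding_circle(points):
    """Circle centered on the centroid that encloses every sampled point."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radius = max(math.hypot(x - cx, y - cy) for x, y in points)
    return cx, cy, radius

def circles_overlap(c1, c2):
    """Compare squared distances to avoid a square root per test."""
    dx, dy = c1[0] - c2[0], c1[1] - c2[1]
    radius_sum = c1[2] + c2[2]
    return dx * dx + dy * dy <= radius_sum * radius_sum

def needs_full_test(curve_a, curve_b):
    """False means the curves cannot intersect and the segment tests can be skipped."""
    return circles_overlap(bounding_circle(curve_a), bounding_circle(curve_b))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
far_square = [(10, 10), (12, 10), (12, 12), (10, 12)]
near_square = [(1, 1), (3, 1), (3, 3), (1, 3)]
```

In practice the circle for each accepted road would be computed once and cached, so the rejection costs a handful of multiplications per pair.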
Performance Table 1. Total Execution Time for Road Generation on the Cell
Performance Table 2. Comparison of Final technique vs. various optimization approaches
Cell-based computation of a similar pipeline takes about 1 second per blade per landscape.
The Cell processor has met our needs and expectations for real-time and near-real-time asset development, and allows roads to be brought into the program interactively. Future Work • Offload the generation of other asset classes onto the multicore compute servers • Terrain generation from satellite images • House & tree placement/scattering • Dynamics, Physics, Animation, AI
Creating these assets outside the main application pipeline assures continued real-time behavior, but how do we further integrate these operations? How do we come up with effective approaches for dynamically balancing multicore computing across assets, dynamics and behavior? Where do we employ high-level tools, low-level algorithm re-engineering and the capabilities of heterogeneous computing devices?
A comparison of multi-threading with Intel Threading Building Blocks (TBB), Boost threads, and serial processing in different areas of interactive graphics applications. All of the multi-threading solutions yield a significant improvement over the serial implementation. The TBB solutions yielded very similar results, and the Boost threading solution was slightly slower on average.
Areas of the application that have execution dependencies necessitate larger-scale re-engineering to achieve multi-threading improvements. See report for more details. The serial implementation is the quickest in this case. Automatic partitioning in TBB seriously degrades performance, while the Boost and regular TBB implementations are comparable, with TBB yielding slightly better performance on average.
By putting physics on a separate thread, application performance gains range from a 100% improvement when there is little physics activity to no improvement with high levels of physics. Generally we experience about a 30% improvement. Bottleneck loops in the non-thread-safe physics library require locks around library calls, degrading performance. A re-design of the physics library with data-level parallelism is required. See report for more details.
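The arrangement described here, a physics thread whose library calls are guarded by a lock shared with the main loop, can be sketched as follows. `ToyPhysics` is a hypothetical stand-in for the real physics library, and the sleep is a placeholder for rendering work; the sketch shows the structure only and says nothing about the measured speedups.

```python
import threading
import time

class ToyPhysics:
    """Hypothetical stand-in for a physics library that is not thread safe."""
    def __init__(self):
        self.position = 0.0
        self.velocity = 1.0

    def step(self, dt):
        self.position += self.velocity * dt

def run_with_physics_thread(steps, dt):
    world = ToyPhysics()
    lock = threading.Lock()        # guards every call into the library
    finished = threading.Event()
    frames = []

    def physics_loop():
        for _ in range(steps):
            with lock:             # contention here is the bottleneck
                world.step(dt)
        finished.set()

    worker = threading.Thread(target=physics_loop)
    worker.start()
    while not finished.is_set():   # stand-in for the render loop
        with lock:
            frames.append(world.position)   # read a consistent state
        time.sleep(0.001)          # placeholder for rendering outside the lock
    worker.join()
    return world.position, frames

final_position, frames = run_with_physics_thread(steps=200, dt=0.5)
```

The more time each thread spends inside the lock, the closer the program gets to serial behavior, which matches the observation that gains vanish under heavy physics load.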
All objects in the real world are potentially dynamic. But most are not active all the time. Or their activity occurs at scales that differ by orders of magnitude: Planets to People to Peanuts to Proteins. This is a lot of stuff to keep track of. The most common approach is to make special categories of objects that are subject to dynamic interaction in a virtual world. However, some new approaches can create a virtual world that is significantly more complex. See report for more details.