1 / 37

Performance Tools

Performance Tools. Jeff Kiel Manager, Developer Performance Tools. Performance Tools Agenda. Overview of GPU pipeline and Unified Shader NVIDIA PerfKit 5.0: Driver & GPU Performance Data Instrumented Driver: GPU & driver performance information, GLExpert runtime debugging

kipling
Télécharger la présentation

Performance Tools

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Performance Tools Jeff Kiel Manager, Developer Performance Tools

  2. Performance Tools Agenda • Overview of GPU pipeline and Unified Shader • NVIDIA PerfKit 5.0: Driver & GPU Performance Data • Instrumented Driver: GPU & driver performance information, GLExpert runtime debugging • PerfSDK: Performance data integrated into your application • PerfHUD: The Direct3D GPU Performance Accelerator • gDEBugger: OpenGL performance analysis and debugging • ShaderPerf: Shader program performance

  3. Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Geometry Shader Pixel Shader GPU Pipelined Architecture (Logical View) GPU Rasterizer CPU Blending Vertex Assembly Texture Framebuffer

  4. Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Geometry Shader Pixel Shader GPU Pipelined Architecture (Logical View) GPU Rasterizer CPU Blending Vertex Assembly Texture Framebuffer

  5. Common Graphics/GPU Problems • New, increasingly complex GPU hardware • GPU is a black box • Unified shaders changes everything • Increasing engine and scene complexity • Artists don’t always understand how rendering engines work • CPU tuning insufficient (multiple processors, multi-cores) • Turn around time for debugging and tuning shaders too long • Hard to debug API/pipeline setup issues

  6. Unified Shader Tuning • No longer “pixel shader bound” • GPU balances workload • Now just “shader unit bound” • Check workload distribution for optimization opportunities • Typical optimizations may not work • Classic: move calculations from pixels to vertices • If #vertices ~= #pixels, no improvement

  7. NVIDIA PerfKit 5: The Solution! • Instrumented Driver • GLExpert • PerfHUD • PerfSDK • PerfAPI • Sample Code • Helper Classes • Documentation • Tools • NVIDIA Plug-In for • Microsoft PIX for Windows • gDEBugger 3.1 • DevCPL • Platforms (x32 & x64) • Windows XP & Vista • Linux Update Soon!

  8. PerfKit Instrumented Driver • GLExpert functionality • GPU and Driver Performance Counters • OpenGL and Direct3D • Data exported via NVIDIA API and PDH • Simplified Experiments (SimExp) • Collect GPU and driver data, retain performance • Track intra-frame events & statistics • Gather and collate at end of frame • Performance overhead 1-2%

  9. GLExpert: What is it? • Helps eliminate driver/CPU performance issues • OpenGL portion of the Instrumented Driver • Output to stdout or debugger • Different groups/levels of information detail • Controlled using environment variables in Linux, DevCPL tab in Windows

  10. GLExpert: What is it? • Information provided • GL Errors: print when raised • Software Fallbacks: indicate when the driver is in fall back • GPU Programs: errors during compile or link • VBOs: show where they reside, mapping details • FBOs: print reasons for unsupported configuration • Future Enhancements • Extensive SLI support • Quadro 5600/GeForce 8 Series extensions • More detailed pipeline setup messages, buffer object support, fallback information, and more

  11. PerfKit: Performance Counter Types • SW/Driver Counters: PerfAPI, PDH • Raw GPU Counters: PerfAPI, PDH • Simplified Experiments: PerfAPI • Instrumented GPUs • GeForce 7800 GTX • GeForce 6800 Ultra & GT • GeForce 6600 • Quadro FX 5600 & 4500 • GeForce 8800 GTX, 8600 GT • GeForce 7950/7900 GTX & GT

  12. OpenGL/Direct3D Driver Counters • General • FPS • ms per frame • Driver • Driver frame time (total time spent in driver) • Driver sleep time (waiting for GPU) • Detailed wait timers (kernel, locks, rendering, etc.) • Counts • Batches, vertices, primitives • Triangles and instanced triangles • Memory • Total used • Render targets, textures, buffers

  13. Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Vertex Shader Pixel Shader Geometry Shader GPU Counters gpu_idle Vertex Assembly Frame Buffer (RAM Memory) vertex_attribute_count shader_busy vertex, geometry, pixel ratios Texture Unit (Filtering) culled_primitive_count primitive_count triangle_count vertex_count Raster / ZCull shaded_pixel_count Raster Operations rop_busy

  14. How do I use PerfKit counters? • PerfAPI: Easy integration of PerfKit • Real time performance monitoring using GPU and driver counters, round robin sampling • Simplified Experiments for single frame analysis • PDH: Performance Data Helper for Windows • Driver data, GPU counters, and OS information • Exposed via Perfmon • Good for rapid prototyping • PerfSDK: Sample code and helper classes

  15. PerfAPI: Real Time // Somewhere in setup NVPMAddCounterByName(“vertex_shader_busy”); NVPMAddCounterByName (“pixel_shader_busy”); NVPMAddCounterByName (“shader_waits_for_texture”); NVPMAddCounterByName (“gpu_idle”); // In your rendering loop, sample using names NVPMSample(NULL, &nNumSamples); NVPMGetCounterValueByName(“vertex_shader_busy”, 0, &nVSEvents, &nVSCycles); NVPMGetCounterValueByName(“pixel_shader_busy”, 0, &nPSEvents, &nPSCycles); NVPMGetCounterValueByName(“shader_waits_for_texture”, 0, &nTexEvents, &nTexCycles); NVPMGetCounterValueByName(“gpu_idle”, 0, &nIdleEvents, &nIdleCycles);

  16. PerfAPI: SimExp NVPMAddCounter(“GPU Bottleneck”); NVPMAllocObjects(50); // Set up the experiment, get pass count NVPMBeginExperiment(&nNumPasses); for(int ii = 0; ii < nNumPasses; ++ii) { // Scene setup/clear backbuffer NVPMBeginPass(ii); NVPMBeginObject(0); // Draw calls associated with object 0 and flush NVPMEndObject(0); ... NVPMEndPass(ii); // End scene/present/swap buffers } // End experiment and retrieve bottleneck NVPMEndExperiment(); NVPMGetCounterValueByName(“GPU Bottleneck”, 0, &nGPUBneck, &nGPUCycles);

  17. PerfHUD: Direct3D debugging and tuning • One click bottleneck determination • Graphs and debugging tools overlaid on your application • 4 screens for targeted analysis • Performance Dashboard • Debug Console • Frame Debugger • Frame Profiler • Drag and drop application on PerfHUD icon

  18. New! PerfHUD 5.0! • Interactive model • Shader Edit and Continue • Render state Modification • Configurable Graphs • Many more features and usability improvements • New technologies • Windows Vista & DirectX 10 • Quadro 5600 and GeForce 8800 with Unified Shader Architecture

  19. Demo: PerfHUD Company of Heroes used with permission from THQ and Relic Entertainment

  20. Demo: Performance Dashboard Company of Heroes used with permission from THQ and Relic Entertainment

  21. Demo: Performance Dashboard Company of Heroes used with permission from THQ and Relic Entertainment

  22. Demo: Performance Dashboard Company of Heroes used with permission from THQ and Relic Entertainment

  23. Demo: Frame Debugger Company of Heroes used with permission from THQ and Relic Entertainment

  24. Demo: Advanced Frame Debugger

  25. Demo: Frame Profiler Company of Heroes used with permission from THQ and Relic Entertainment

  26. Frame Profiler One Touch Performance Analysis • PerfHUD uses PerfSDK • Multiple passes on the scene, sample over 40 performance counters • Need to render THE SAME FRAME until all the counters are read • Must use time-based animation • Do use QueryPerformanceCounter() or timeGetTime() • Don’t use RDTSC or throttle frame rate

  27. Associated Tools: NVIDIA Plug-In for Microsoft PIX for Windows

  28. Graphic Remedy’s gDEBugger OpenGL and OpenGL ES Debugger and Profiler • Shorten development time • Improve application quality • Optimize performance • NVIDIA PerfKit and GLExpert integrated • Now supports Linux! • Windows XP & Vista, x32 & x64 • Discounted academic licenses available • NVIDIA booth Thursday morning http://www.gremedy.com

  29. PerfGraph • Open source tool for graphing performance counters • Supports PerfKit GPU/Driver signals • System performance information (CPU utilization, memory, etc.) • Cross platform Windows & Linux

  30. Project Status • PerfKit 5.0 available now: http://developer.nvidia.com/perfkit • PerfHUD 5.0 • ForceWare Release 100 Driver • GeForce 8800 support • Windows XP & Vista, 32 and 64 bit • Linux 32 and 64 bit PerfSDK/GLExpert in development • PerfGraph: www.sourceforge.org\perfgraph • Instrumented GPUs • GeForce 7800 GTX • GeForce 6800 Ultra & GT • GeForce 6600 • Quadro FX 5600 & 4500 • GeForce 8800 Series • GeForce 7950/7900 GTX & GT Feedback and Support: http://developer.nvidia.com/forums

  31. v2f BumpReflectVS(a2v IN, uniform float4x4 WorldViewProj, uniform float4x4 World, uniform float4x4 ViewIT) { v2f OUT; // Position in screen space. OUT.Position = mul(IN.Position, WorldViewProj); // pass texture coordinates for fetching the normal map OUT.TexCoord.xyz = IN.TexCoord; OUT.TexCoord.w = 1.0; // compute the 4x4 tranform from tangent space to object space float3x3 TangentToObjSpace; // first rows are the tangent and binormal scaled by the bump scale TangentToObjSpace[0] = float3(IN.Tangent.x, IN.Binormal.x, IN.Normal.x); TangentToObjSpace[1] = float3(IN.Tangent.y, IN.Binormal.y, IN.Normal.y); TangentToObjSpace[2] = float3(IN.Tangent.z, IN.Binormal.z, IN.Normal.z); OUT.TexCoord1.x = dot(World[0].xyz, TangentToObjSpace[0]); OUT.TexCoord1.y = dot(World[1].xyz, TangentToObjSpace[0]); OUT.TexCoord1.z = dot(World[2].xyz, TangentToObjSpace[0]); OUT.TexCoord2.x = dot(World[0].xyz, TangentToObjSpace[1]); OUT.TexCoord2.y = dot(World[1].xyz, TangentToObjSpace[1]); OUT.TexCoord2.z = dot(World[2].xyz, TangentToObjSpace[1]); OUT.TexCoord3.x = dot(World[0].xyz, TangentToObjSpace[2]); OUT.TexCoord3.y = dot(World[1].xyz, TangentToObjSpace[2]); OUT.TexCoord3.z = dot(World[2].xyz, TangentToObjSpace[2]); float4 worldPos = mul(IN.Position, World); // compute the eye vector (going from shaded point to eye) in cube space float4 eyeVector = worldPos - ViewIT[3]; // view inv. transpose contains eye position in world space in last row. OUT.TexCoord1.w = eyeVector.x; OUT.TexCoord2.w = eyeVector.y; OUT.TexCoord3.w = eyeVector.z; return OUT; } ///////////////// pixel shader ////////////////// float4 BumpReflectPS(v2f IN, uniform sampler2D NormalMap, uniform samplerCUBE EnvironmentMap, uniform float BumpScale) : COLOR { // fetch the bump normal from the normal map float3 normal = tex2D(NormalMap, IN.TexCoord.xy).xyz * 2.0 - 1.0; normal = normalize(float3(normal.x * BumpScale, normal.y * BumpScale, normal.z)); // transform the bump normal into cube space // then use the transformed normal and eye vector to compute a reflection vector // used to fetch the cube map // (we multiply by 2 only to increase brightness) float3 eyevec = float3(IN.TexCoord1.w, IN.TexCoord2.w, IN.TexCoord3.w); float3 worldNorm; worldNorm.x = dot(IN.TexCoord1.xyz,normal); worldNorm.y = dot(IN.TexCoord2.xyz,normal); worldNorm.z = dot(IN.TexCoord3.xyz,normal); float3 lookup = reflect(eyevec, worldNorm); return texCUBE(EnvironmentMap, lookup); } ShaderPerf 2.0 • Inputs: • GLSL, Cg, HLSL • PS1.x,PS2.x,PS3.x • VS1.x,VS2.x, VS3.x • !!FP1.0 • !!ARBfp1.0 • Outputs: • Resulting assembly code • # of cycles • # of temporary registers • Pixel/vertex throughput • Test all fp16 and all fp32 ShaderPerf • GPU Arch: • Quadro FX series • GeForce 8X00, 7X00 • GeForce 6X00 & FX

  32. ShaderPerf: In your pipeline • Test current performance • Compare with shader cycle budgets • Test optimization opportunities • Not just Tex/ALU balance: cycles & throughput • Automated regression analysis • Integrated in FX Composer 2.0 • Artists/TDs code expensive shaders • Achieve optimum performance

  33. ShaderPerf 2.0 Alpha • Supports Direct3D/HLSL • GeForce 7, 6, and FX series GPUs • ForceWare Release 162 Unified Compiler • Improved vertex performance simulation and throughput calculation with branching • Multiple drivers from one ShaderPerf • Smaller footprint • New programmatic interface

  34. ShaderPerf 2.0: Beta • Full support for Cg and GLSL, vertex and fragment programs • Support for Quadro 5600 & GeForce 8 series GPUs • Geometry shaders and geometry throughput • Fragment program differencing

  35. Questions? • Stop by our booth for a hands on demo! • Online: http://developer.nvidia.com/PerfKit http://developer.nvidia.com/PerfHUD http://developer.nvidia.com/ShaderPerf Feedback and Support: http://developer.nvidia.com/forums

  36. The NVIDIA Developer Toolkit Content Creation Software Development Performance Documentation PerfKit 5 FX Composer 2 SDK 10 Conference Presentations mental mill Artist Edition PerfHUD 5 Cg Toolkit GPU Programming Guide PerfSDK Texture Tools 2 Videos NVSG GLExpert NV PIX Plug-in Melody Books gDEBugger ShaderPerf 2

  37. GPU Gems 3 Available Now! • SIGGRAPH Bookstore • Major Book Retailers • Includes chapters from • Adobe Systems • Apple • Crytek • Cornell University • Electronic Arts • Havok • Juniper Networks • Microsoft • SEGA • …and many more

More Related