630 likes | 805 Vues
GPU Acceleration in Registration. Danny Ruijters 26 April 2007. Outline. The GPU Rigid 3D-3D Registration Elastic Registration Conclusions. The GPU. The graphics card. Raserization of primitives Texture mapping Colour interpolation. The GPU. Graphics Processing Unit
E N D
GPU Acceleration in Registration Danny Ruijters 26 April 2007
Outline • The GPU • Rigid 3D-3D Registration • Elastic Registration • Conclusions
The graphics card • Raserization of primitives • Texture mapping • Colour interpolation
The GPU • Graphics Processing Unit • Programmable processor in the graphics rendering pipeline • Parallel execution (SIMD like)
video memory on-chip cache memory vertex shading (T&L) pre-TnL cache geometry system memory commands post-TnL cache CPU triangle setup rasterization texture cache fragment shading and raster operations textures frame buffer Graphics rendering pipeline
video memory on-chip cache memory transfer limited vertex shading (T&L) pre-TnL cache transform limited geometry system memory commands post-TnL cache setup limited CPU triangle setup texture limited raster limited rasterization CPU limited fragment shader limited fragment shading and raster operations texture cache textures frame buffer frame buffer limited Bottlenecks
128 processing units Local cache Shared memory
Performance • Parallelism & pipelining (up to 16 parallel pipelines) • Vector processor • Moore’s Law: CPU: 2* performance per 18 months • GPU: 2* performance per 6 months
Textures & buffers • 1D, 2D, 3D textures • 2D output buffers (frame buffer, accumulation buffer, stencil buffer, p-buffer) • 8, 10, 12, 16 bit integers, 16, 32 bit floating point • 1 (intensity), 2 (luminance-alpha), 3 (RGB), 4 (RGBA) components per pixel
Historic overview GPU • RenderMan (1988, pre-history) • Intel MMX (SIMD, 1997, pre-history) • Register combiners (nVidia, 1999, bronze age) • Vender specific APIs (2001, iron age) • Generic assembly-like language (2002, middle-ages) • Different high-level languages (2003, industrial age) • CUDA: general purpose C-like language (2007, modern age)
Register combiners (1999, bronse age) // Stage 0 // spare0.rgb = gradient dot ViewDir, spare1.rgb = -(gradient dot ViewDir) glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_A_NV,GL_TEXTURE0_ARB,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_B_NV,GL_CONSTANT_COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_C_NV,GL_TEXTURE0_ARB,GL_EXPAND_NEGATE_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_D_NV,GL_CONSTANT_COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerOutputNV(GL_COMBINER0_NV,GL_RGB,GL_SPARE0_NV,GL_SPARE1_NV,GL_DISCARD_NV,GL_NONE,GL_NONE,GL_TRUE,GL_TRUE,GL_FALSE);
GL_ARB_fragment_program (2002) !!ARBfp1.0 ATTRIB coord = fragment.texcoord[0]; ATTRIB color = fragment.color; OUTPUT out = result.color; TEMP texel; TEMP lookup; TEX texel, coord, texture[0], 3D; TEX lookup, texel, texture[1], 1D; MUL out, lookup, color; END
GLSlang (2003) uniform vec3 ViewDir; void main (void) { floatvalue; vec3 gradient; gradient = texture3(0, gl_TexCoord0) * 2.0 - 1.0; value = 1.0 - abs(dot(gradient, ViewDir)); value *= 1.3 * dot(gradient, gradient); value = clamp(value, 0.0, 1.0); gl_FragColor = vec4(value); }
CUDA (2007) • Compute Unified Device Architecture • General purpose C-like language • nVidia only • Very recently released
3DRA – XperCT Registration 1 Pre-operative
3DRA – XperCT Registration 2 Post-operative: verification of the embolization
Mutual information F. Maes et al., "Multimodality Image Registration by Maximization of Mutual Information,“ IEEE Transactions on Medical Imaging 16(2), pp. 187-198, April 1997
Resampling Joint histogram: increment(g,g)
Elastic deformation • Parameterized deformation: • B-spline deformation:
GPU linear interpolation • Hardwired: linear interpolation is much faster than separate lookups
= GPU Cubic Interpolation • Compose cubic interpolation from weighted sum of linear interpolations: C. Sigg, M. Hadwiger, “Fast Third-Order Texture Filtering”, GPU Gems 2
= Outline of proof
GPU Cubic Interpolation • 2D: 4 linear-interpolated lookups, instead of 16 direct lookups • 3D: 8 linear-interpolated lookups, instead of 64 direct lookups
Optimization • Many parameters: huge parameter space • Solution: use derivatives like Jacobian, Hessian • Examples: Gradient Descent, Quasi-Newton, Levenberg-Marquardt
GPU Elastic Registration Iteration • Generate deformed image on GPU & store to texture • Calculate Similarity Measure & First-Order Derivative on GPU • Texture with reference image • Texture with deformed image
First-Order Derivative of Sim. Measure J. Kybic, M. Unser, “Fast Parametric Elastic Image Registration”
Derivative of the Deformed Image • Sobel operator to calculate gradients:
Derivative of the Control Points • Constant • B-spline: separatable kernel of fixed size
GPU Elastic Registration • 40 images: Quasi Newton: 16 seconds • Gradient Descent: 63 seconds • 8 * 8 Control Points: rest motion • Multi-resolution deformation field, with reduced parameters (discussed with Dirk Loeckx)