FAST MAP PROJECTION ON CUDA

Presentation Transcript


  1. FAST MAP PROJECTION ON CUDA Yanwei Zhao Institute of Computing Technology Chinese Academy of Sciences July 29, 2011

  2. Outline

  3. Outline

  4. Map Projection • Establishes the relationship between two different coordinate systems. • Geographical coordinates → planar Cartesian map coordinate system • Involves complicated, time-consuming arithmetic operations. • A fast answer with the desired accuracy is preferred over a slow exact answer. • It needs to be accelerated for interactive GIS scenarios.

  5. GPGPU (General-Purpose computing on Graphics Processing Units) • GPGPU is a young area of research. • Advantages of the GPU: • Flexibility • Processing power • Low cost • GPGPU uses the GPU in applications other than 3D graphics • The GPU accelerates the critical path of the application

  6. CUDA (Compute Unified Device Architecture) • NVIDIA's parallel computing architecture • C-based programming language and development toolkit • Advantages: • Programmers can focus on the important issues rather than on an unfamiliar language • No need for graphics APIs to write efficient parallel code
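
To illustrate the "C-based, no graphics API" point, here is a minimal sketch (not from the talk; kernel name and sizes are arbitrary) of a complete CUDA program: ordinary C plus a kernel qualifier and a launch expression.

```cuda
// Minimal illustration: plain C code plus __global__ and a <<<...>>> launch.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        data[i] += 1.0f;                            // one element per thread
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    addOne<<<(n + 255) / 256, 256>>>(d_data, n);    // grid of blocks, 256 threads each
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```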

  7. The characteristics of Map Projection • A huge number of coordinates to handle • Complex arithmetic operations • The requirement of a real-time response

  8. Our proposal • Use the new CUDA technology on the GPU • Take the Universal Transverse Mercator (UTM) projection as an example • Performance: • Improvement of up to 6x to 8x (including transfer time) • Speedup of 70x to 90x (excluding transfer time)
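
For context, the sketch below shows the kind of per-coordinate arithmetic being parallelized. It uses the simpler spherical transverse Mercator formulas rather than the ellipsoidal series a production UTM implementation (and presumably the authors' code) uses, so treat it purely as an illustration.

```cuda
// Illustrative only: spherical transverse Mercator forward projection.
// Real UTM uses ellipsoidal series expansions, but the shape of the work per
// coordinate (a handful of trig and log operations) is similar.
#include <math.h>

#define DEG2RAD 0.017453292519943295   // pi / 180
#define R_EARTH 6371000.0              // mean Earth radius (m), spherical model
#define K0      0.9996                 // UTM scale factor on the central meridian
#define FALSE_E 500000.0               // UTM false easting (m)

__host__ __device__ void tm_forward(double lonDeg, double latDeg, double lon0Deg,
                                    double *x, double *y)
{
    double lat  = latDeg * DEG2RAD;
    double dlon = (lonDeg - lon0Deg) * DEG2RAD;

    double B = cos(lat) * sin(dlon);                                  // spherical form
    *x = FALSE_E + K0 * R_EARTH * 0.5 * log((1.0 + B) / (1.0 - B));   // easting
    *y = K0 * R_EARTH * atan2(tan(lat), cos(dlon));                   // northing (N hemisphere)
}
```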

  9. Outline

  10. Algorithm framework • Striped partitioning • Matrix distribution

  11. Striped partitioning • Define the numbers of blocks and threads: Block_num, Thread_num • CUDA built-in variables: gridDim, blockDim • Number of geographic features: fn • Each block handles fn/gridDim.x features
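
A sketch of what this launch configuration could look like in code (buffer names are placeholders; the kernel project_striped is sketched after slides 12-13 below):

```cuda
// Hypothetical launch for the striped scheme; the values match the slides.
const int Block_num  = 64;    // number of thread blocks  -> gridDim.x inside the kernel
const int Thread_num = 512;   // threads per block        -> blockDim.x inside the kernel

// fn geographic features in total; each block handles fn / gridDim.x of them.
project_striped<<<Block_num, Thread_num>>>(d_lon, d_lat, d_x, d_y,
                                           d_featStart, d_featLen, fn, lon0);
```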

  12. Striped partitioning • For the outer loop (blocks over features): • Block → Feature[i], where i = blockIdx.x * (fn / gridDim.x) (1) • Block → next Feature[k], where k = i + fn / gridDim.x (2) • For the inner loop (threads over coordinates): • Thread → coord[j], where j = threadIdx.x • Thread → next coord[k], where k = j + Thread_num

  13. Striped partitioning • For the outer loop (blocks over features): • Block → Feature[i], where i = blockIdx.x * (fn / gridDim.x) • Block → next Feature[k], where k = i + fn / gridDim.x • For the inner loop (threads over coordinates): • Thread → coord[j], where j = threadIdx.x (1) • Thread → next coord[k], where k = j + Thread_num (2)
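
One plausible reading of these formulas as a kernel: each block takes the contiguous range of fn / gridDim.x features starting at i, and its threads stride over each feature's coordinates. The data layout, buffer names, and the earlier tm_forward sketch are assumptions, not the authors' code.

```cuda
// Sketch of the striped-partitioning kernel described on slides 12-13.
// featStart[f]/featLen[f] give the coordinate range of feature f (assumed layout).
__global__ void project_striped(const double *lon, const double *lat,
                                double *x, double *y,
                                const int *featStart, const int *featLen,
                                int fn, double lon0)
{
    int perBlock = fn / gridDim.x;            // features per block
    int first    = blockIdx.x * perBlock;     // i = blockIdx.x * (fn / gridDim.x)

    // Outer loop: this block's features (remainder features ignored for brevity).
    for (int f = first; f < first + perBlock && f < fn; ++f) {
        int base = featStart[f];
        int len  = featLen[f];
        // Inner loop: j = threadIdx.x, then j += Thread_num (= blockDim.x).
        for (int j = threadIdx.x; j < len; j += blockDim.x) {
            tm_forward(lon[base + j], lat[base + j], lon0,
                       &x[base + j], &y[base + j]);
        }
    }
}
```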

  14. Matrix distribution • Define the numbers of blocks and threads: grid(br, bc), block(tr, tc) • Each block handles k features, where k is given by equation (1) • Feature[i] is given by equations (2) and (3)

  15. Matrix distribution • Each block handles s coordinates, where s is given by equation (1) • coord[j]:
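
The equations labeled (1)-(3) on slides 14-15 were shown as images and did not survive in this transcript, so the sketch below is only a generic way to map a 2D grid of 2D blocks onto features and coordinates, not a reconstruction of the authors' exact formulas.

```cuda
// Hypothetical matrix-distribution kernel: a grid(br, bc) of block(tr, tc)
// blocks; blocks are linearized over features, threads over coordinates.
__global__ void project_matrix(const double *lon, const double *lat,
                               double *x, double *y,
                               const int *featStart, const int *featLen,
                               int fn, double lon0)
{
    int nBlocks  = gridDim.x * gridDim.y;                   // br * bc
    int blockId  = blockIdx.y * gridDim.x + blockIdx.x;     // linear block index
    int nThreads = blockDim.x * blockDim.y;                 // tr * tc
    int threadId = threadIdx.y * blockDim.x + threadIdx.x;  // linear thread index

    int k   = (fn + nBlocks - 1) / nBlocks;                 // features per block, rounded up
    int end = blockId * k + k;
    if (end > fn) end = fn;

    for (int f = blockId * k; f < end; ++f) {               // this block's features
        int base = featStart[f];
        for (int j = threadId; j < featLen[f]; j += nThreads) {
            tm_forward(lon[base + j], lat[base + j], lon0,
                       &x[base + j], &y[base + j]);
        }
    }
}
```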

  16. Outline

  17. Experiment Environment • Hardware: • CPU: Intel Core 2 Duo E8500 at 3.18 GHz with 2 GB of main memory • GPU: NVIDIA GeForce 9800 GTX+ graphics card with 512 MB of memory, 128 CUDA cores, and 16 multiprocessors • Software: • Microsoft Windows XP Pro SP2 • Microsoft Visual Studio 2005 • NVIDIA driver 2.2, CUDA SDK 2.2, and CUDA Toolkit 2.2

  18. The degree of data parallelism • Total CPU time = initialization and file-reading time + serial projection time

  19. The degree of data parallelism • Total CPU time = initialization and file-reading time + serial projection time • Map projection can achieve more than 90 percent parallelism.
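
As an aside (Amdahl's law is not cited in the talk), a parallel fraction above 90 percent is consistent with the overall 6x to 8x speedup reported later, since the serial part bounds the total gain:

```latex
% Amdahl's law: overall speedup S for parallel fraction p accelerated by factor s.
S(p, s) = \frac{1}{(1 - p) + p/s}, \qquad
S(0.9, 80) = \frac{1}{0.1 + 0.9/80} \approx 9 .
```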

  20. Comparing with CPU • Block_num = 64, Thread_num = 512

  21. Comparing with CPU • Total time = map projection time + data transfer time
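
A minimal sketch (not shown in the talk, and assuming the buffers and kernel from the earlier sketches) of how the two components could be separated with CUDA events:

```cuda
// Hypothetical timing harness: measures the kernel and the host<->device
// transfers separately, matching the "kernel only" vs. "total" comparison below.
cudaEvent_t t0, t1, t2, t3;
cudaEventCreate(&t0); cudaEventCreate(&t1);
cudaEventCreate(&t2); cudaEventCreate(&t3);

cudaEventRecord(t0);
cudaMemcpy(d_lon, h_lon, bytes, cudaMemcpyHostToDevice);   // upload input coordinates
cudaMemcpy(d_lat, h_lat, bytes, cudaMemcpyHostToDevice);

cudaEventRecord(t1);
project_striped<<<Block_num, Thread_num>>>(d_lon, d_lat, d_x, d_y,
                                           d_featStart, d_featLen, fn, lon0);
cudaEventRecord(t2);

cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);       // download projected coordinates
cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
cudaEventRecord(t3);
cudaEventSynchronize(t3);

float msKernel, msTotal;
cudaEventElapsedTime(&msKernel, t1, t2);   // map projection time only
cudaEventElapsedTime(&msTotal,  t0, t3);   // projection + data transfer time
```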

  22. Comparing with CPU • If the total time is considered, the performance gain is 6x to 8x.

  23. Comparing with CPU • If only the map projection time is compared, we obtain 70x to 90x speedups.

  24. The performance of different task assignments • Striped partitioning: Block_num = 64, Thread_num = 512 • Matrix distribution: dim_grid(32, 32) = 32*32 blocks, dim_block(256, 256) = 256*256 threads

  25. The performance of different task assignments • Striped partitioning: Block_num = 64, Thread_num = 512 • Matrix distribution: dim_grid(32, 32) = 32*32 blocks, dim_block(256, 256) = 256*256 threads • Striped: 6x to 8x • Matrix: 4x to 6x

  26. The performance of different task assignments (chart: Matrix vs. Striped)

  27. The performance of different task assignments • Striped: all threads in the block access consecutive memory. • Matrix: it can only ensure that each row of threads in the block handles consecutive data.
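
For background (the talk does not show this code), the generic contrast the slide appeals to between coalesced and scattered global-memory access:

```cuda
// Coalesced pattern (what the striped inner loop achieves): thread t of a warp
// reads element t, so the warp's 32 loads fall on consecutive addresses and are
// served by a few wide memory transactions.
__global__ void coalesced_copy(const double *in, double *out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < n) out[t] = in[t];
}

// Strided pattern (what a 2D assignment can degenerate into when only each row
// of threads covers contiguous data): the warp's loads are scattered, so each
// one may require its own memory transaction.
__global__ void strided_copy(const double *in, double *out, int n, int stride)
{
    long long idx = (long long)(blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (idx < n) out[idx] = in[idx];
}
```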

  28. Outline

  29. Conclusion and Future Work • Implemented a fast map projection method on CUDA-enabled GPUs • High speedup compared with the CPU-based method • The power of modern GPUs can considerably speed up other areas of geoscience: • DEM-based spatial interpolation • raster-based spatial analysis • Future work: • GPU implementations of other GIS applications

  30. Thank you! Q &amp; A Yanwei Zhao, Institute of Computing Technology Contact: zhaoyanwei@ict.ac.cn
