Efficient Massively Parallel LDPC Decoding on Multicore Architectures
This presentation discusses the implementation of Low-Density Parity-Check (LDPC) decoding algorithms on various multicore architectures, including x86 processors, GPUs, and the CELL Broadband Engine. It introduces the belief propagation method, data structures, and parallel computing models necessary for maximizing throughput. The execution of computing kernels is parallelized using OpenMP and CUDA to enhance performance. Experimental results will be shared, demonstrating the effectiveness of these approaches across different platforms to improve decoding efficiency in communication systems.
Massively Parallel LDPC Decoding on Multicore Architectures Presented by: fakewen
Authors • Gabriel Falcao • Leonel Sousa • Vitor Silva
Outline • Introduction • BELIEF PROPAGATION • DATA STRUCTURES AND PARALLEL COMPUTING MODELS • PARALLELIZING THE KERNELS EXECUTION • EXPERIMENTAL RESULTS
Introduction • LDPC decoding on multicore architectures • LDPC decoders were developed on recent multicores, such as off-the-shelf general-purpose x86 processors, Graphics Processing Units (GPUs), and the CELL Broadband Engine (CELL/B.E.).
BELIEF PROPAGATION • Belief propagation, also known as the sum-product algorithm (SPA), is an iterative algorithm for computing joint probabilities on graphs.
LDPC Decoding • LDPC decoding exploits the probabilistic relationships between nodes imposed by the parity-check conditions, which allow the decoder to infer the most likely transmitted codeword.
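The parity-check conditions mentioned above can be illustrated with a small sketch. The matrix `H` below is a hypothetical toy example (it is not taken from the slides and is far smaller and denser than a real LDPC matrix), and `is_codeword` is an illustrative name. The function only tests whether a hard-decision word satisfies every check, i.e. whether H * c^T = 0 over GF(2); it is not a decoder.

```c
#include <assert.h>
#include <stddef.h>

#define M 3  /* number of parity checks (rows of H) */
#define N 6  /* codeword length (columns of H) */

/* Hypothetical toy parity-check matrix, chosen only for illustration. */
static const int H[M][N] = {
    {1, 1, 0, 1, 0, 0},
    {0, 1, 1, 0, 1, 0},
    {1, 0, 1, 0, 0, 1},
};

/* Returns 1 if every parity check is satisfied (H * c^T = 0 over GF(2)),
 * and 0 as soon as one check fails. */
int is_codeword(const int c[N])
{
    assert(c != NULL);
    for (int m = 0; m < M; m++) {
        int parity = 0;
        for (int n = 0; n < N; n++)
            parity ^= H[m][n] & c[n];   /* XOR of the bits taking part in check m */
        if (parity != 0)
            return 0;
    }
    return 1;
}
```

For this toy matrix, the word 1 0 1 1 1 0 satisfies all three checks, while flipping its last bit violates the third check. An iterative decoder would typically run a test of this kind after each iteration to decide whether it can stop.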
LDPC Decoding (cont.) • White Gaussian noise
DATA STRUCTURES AND PARALLEL COMPUTING MODELS • Compact data structures to represent the H matrix
Data Structures • Separately code the information about H in two independent data streams, and
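One way to picture two independent index streams for H is sketched below. This is an assumption for illustration only: the transcript does not show the authors' actual layout, and the type names `cn_stream_t` and `bn_stream_t`, the function `build_streams`, and the toy dimensions are all hypothetical. The sketch stores only the positions of the ones in H, once grouped by check node (rows) and once grouped by bit node (columns).

```c
#include <assert.h>

#define M 3                 /* check nodes (rows of H), toy size */
#define N 6                 /* bit nodes (columns of H), toy size */
#define MAX_EDGES (M * N)   /* upper bound on the number of ones in H */

/* Edges grouped by check node: bn[start[m] .. start[m+1]-1] lists the
 * bit nodes connected to check node m. */
typedef struct {
    int start[M + 1];
    int bn[MAX_EDGES];
} cn_stream_t;

/* Edges grouped by bit node: cn[start[n] .. start[n+1]-1] lists the
 * check nodes connected to bit node n. */
typedef struct {
    int start[N + 1];
    int cn[MAX_EDGES];
} bn_stream_t;

/* Builds both streams from a dense H and returns the number of edges. */
int build_streams(const int H[M][N], cn_stream_t *by_cn, bn_stream_t *by_bn)
{
    int edges = 0;
    for (int m = 0; m < M; m++) {          /* row-wise scan */
        by_cn->start[m] = edges;
        for (int n = 0; n < N; n++)
            if (H[m][n])
                by_cn->bn[edges++] = n;
    }
    by_cn->start[M] = edges;

    int k = 0;
    for (int n = 0; n < N; n++) {          /* column-wise scan */
        by_bn->start[n] = k;
        for (int m = 0; m < M; m++)
            if (H[m][n])
                by_bn->cn[k++] = m;
    }
    by_bn->start[N] = k;

    assert(k == edges);                    /* both streams describe the same edges */
    return edges;
}
```

Because H is sparse, storing only the edge indices keeps memory use proportional to the number of ones rather than to M x N, and each stream can be read sequentially by the kernel that needs it.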
Reminder • rmn: the message sent from check node CNm to bit node BNn • qnm: the message sent from bit node BNn to check node CNm
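To make the two message types concrete, here is a minimal sketch of a single check-node update in the probability domain, using the standard sum-product rule rmn(0) = 1/2 + 1/2 * product over n' != n of (1 - 2 * qn'm(1)). The function name `check_node_update` and the array layout (one entry per neighbouring bit node) are assumptions made for this example, not the authors' kernel.

```c
#include <assert.h>

/* q1[i]: incoming message qnm(1), the probability that the i-th
 *        neighbouring bit node is 1.
 * r0[i]: outgoing message rmn(0) returned to the i-th neighbour.
 * degree: number of bit nodes connected to this check node. */
void check_node_update(const double *q1, double *r0, int degree)
{
    assert(degree > 0);
    for (int i = 0; i < degree; i++) {
        double prod = 1.0;
        for (int j = 0; j < degree; j++)
            if (j != i)                     /* exclude the destination node */
                prod *= 1.0 - 2.0 * q1[j];
        r0[i] = 0.5 + 0.5 * prod;           /* rmn(1) is 1 - r0[i] */
    }
}
```

With incoming values 0.1, 0.2 and 0.5, the first two outgoing messages are 0.5 (the third neighbour carries no information, so the product vanishes) and the third is 0.5 + 0.5 * 0.8 * 0.6 = 0.74.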
Parallel Computational Models • Parallel Features of the General-Purpose Multicores • Parallel Features of the GPU • Parallel Features of the CELL/B.E.
Parallel Features of the General-Purpose Multicores • #pragma omp parallel for
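As a rough illustration of how `#pragma omp parallel for` could be applied here, the sketch below distributes check nodes across threads. The function name `update_all_check_nodes` and the edge-indexed layout (`start`, `q1`, `r0`) are assumptions for this example; the slides do not show the authors' loop. Each check node writes only to its own slice of `r0`, so the iterations are independent and need no locking.

```c
#include <assert.h>

/* start[m] .. start[m+1]-1 are the edge indices of check node m.
 * q1[e]: incoming message qnm(1) on edge e.
 * r0[e]: outgoing message rmn(0) on edge e. */
void update_all_check_nodes(const int *start, int num_cn,
                            const double *q1, double *r0)
{
    assert(num_cn > 0);
    /* One loop iteration per check node; OpenMP splits the iterations
     * among the available cores. Without OpenMP enabled, the pragma is
     * ignored and the loop runs serially with the same result. */
    #pragma omp parallel for
    for (int m = 0; m < num_cn; m++) {
        for (int i = start[m]; i < start[m + 1]; i++) {
            double prod = 1.0;
            for (int j = start[m]; j < start[m + 1]; j++)
                if (j != i)
                    prod *= 1.0 - 2.0 * q1[j];
            r0[i] = 0.5 + 0.5 * prod;
        }
    }
}
```

With GCC or Clang, OpenMP is usually enabled with the `-fopenmp` compiler flag.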
PARALLELIZING THE KERNELS EXECUTION • The Multicores Using OpenMP • The GPU Using CUDA • The CELL/B.E.
The GPU Using CUDA • Programming the Grid Using a Thread per Node Approach
The GPU Using CUDA (cont.) • Coalesced Memory Accesses
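The idea behind coalescing can be sketched on the host side in plain C (this is not CUDA code, and it uses a simplified model: real coalescing rules depend on the device). All names below are hypothetical. The sketch compares two ways of mapping thread `t` and processing step `k` to an array index, and checks whether neighbouring threads touch neighbouring elements in the same step, which is the access pattern a GPU can typically merge into few memory transactions.

```c
#include <assert.h>

typedef int (*index_fn)(int t, int k, int threads, int steps);

/* Each thread owns a contiguous chunk of `steps` elements, so at a given
 * step neighbouring threads are `steps` elements apart (strided access). */
int chunked_index(int t, int k, int threads, int steps)
{
    (void)threads;
    return t * steps + k;
}

/* Data is interleaved so that step k of all threads is stored side by
 * side (neighbouring threads read neighbouring elements). */
int interleaved_index(int t, int k, int threads, int steps)
{
    (void)steps;
    return k * threads + t;
}

/* Returns 1 if, at step k, threads 0..threads-1 access consecutive indices. */
int accesses_are_consecutive(index_fn f, int k, int threads, int steps)
{
    assert(threads > 0 && steps > 0);
    for (int t = 1; t < threads; t++)
        if (f(t, k, threads, steps) != f(t - 1, k, threads, steps) + 1)
            return 0;
    return 1;
}
```

With 32 threads and 8 steps, the interleaved mapping passes the check at every step while the chunked mapping does not, which is why data is often reordered before being handed to a thread-per-node kernel.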
The CELL/B.E. • Small Single-SPE Model (AB C) • Large Single-SPE Model
Why the Single-SPE Model • In the single-SPE model, the number of communications between the PPE and the SPEs is minimal, and the PPE is relieved of the costly task of reorganizing data (the sorting procedure in Algorithm 4) between data transfers to the SPE.
EXPERIMENTAL RESULTS • LDPC Decoding on the General-Purpose x86 Multicores Using OpenMP • LDPC Decoding on the CELL/B.E. • Small Single-SPE Model • Large Single-SPE Model • LDPC Decoding on the GPU Using CUDA
LDPC Decoding on the General-Purpose x86 Multicores Using OpenMP
The end Thank you~