Efficient Massively Parallel LDPC Decoding on Multicore Architectures

Massively Parallel LDPC Decoding on Multicore Architectures Presented by: fakewen

Authors • Gabriel Falcao • Leonel Sousa • Vitor Silva

Outline • Introduction • BELIEF PROPAGATION • DATA STRUCTURES AND PARALLEL COMPUTING MODELS • PARALLELIZING THE KERNELS EXECUTION • EXPERIMENTAL RESULTS

Introduction • LDPC decoding on multicore architectures • LDPC decoders were developed on recent multicores, such as off-the-shelf general-purpose x86 processors, Graphics Processing Units (GPUs), and the CELL Broadband Engine (CELL/B.E.).

BELIEF PROPAGATION • Belief propagation, also known as the sum-product algorithm (SPA), is an iterative algorithm for the computation of joint probabilities

LDPC Decoding • LDPC decoding exploits the probabilistic relationships between nodes imposed by parity-check conditions, which allow inferring the most likely transmitted codeword.
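
The parity-check conditions mentioned above can be sketched as a syndrome test: a hard-decision word is a valid codeword when every row of H sums to zero over GF(2). The sparse layout (`row_start`, `col_idx`) and the function name are assumptions made for this illustration; they are not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if all parity checks hold (H * c^T = 0 over GF(2)), else 0.
 * Hypothetical sparse layout: the bit positions taking part in check m are
 * col_idx[row_start[m]] .. col_idx[row_start[m + 1] - 1]. */
int parity_checks_ok(const size_t *row_start, const size_t *col_idx,
                     size_t num_checks, const unsigned char *bits)
{
    assert(row_start && col_idx && bits);
    for (size_t m = 0; m < num_checks; ++m) {
        unsigned parity = 0;
        for (size_t k = row_start[m]; k < row_start[m + 1]; ++k)
            parity ^= bits[col_idx[k]] & 1u;   /* XOR = addition in GF(2) */
        if (parity)
            return 0;                          /* check m is violated */
    }
    return 1;
}
```

An iterative decoder would typically run such a test after each iteration and stop early once it succeeds.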

LDPC Decoding(cont.) White Gaussian noise
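
As a rough illustration of how white Gaussian noise enters the decoder, the sketch below turns received samples into channel log-likelihood ratios. It assumes BPSK with bit 0 mapped to +1 and bit 1 mapped to -1, and a known noise variance `sigma2`; the slides do not state the mapping, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Channel LLR for BPSK over AWGN, assuming bit 0 -> +1 and bit 1 -> -1:
 *   llr[i] = ln(P(bit = 0 | y[i]) / P(bit = 1 | y[i])) = 2 * y[i] / sigma2
 * A positive value favours bit 0, a negative value favours bit 1. */
void channel_llr(const double *y, size_t n, double sigma2, double *llr)
{
    assert(sigma2 > 0.0);
    for (size_t i = 0; i < n; ++i)
        llr[i] = 2.0 * y[i] / sigma2;
}
```

A probability-domain decoder would convert each LLR to a prior probability before the first iteration.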

LDPC Decoding(cont.)

DATA STRUCTURES AND PARALLEL COMPUTING MODELS • compact data structures to represent the H matrix
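
One possible compact representation of a sparse H, sketched under stated assumptions: the edges (nonzero entries) are numbered in check-node order, and a second index stream lists the same edges grouped by bit node. The names `cn_start`, `cn_bit`, `bn_start` and `bn_edge` are hypothetical, and this layout is not claimed to be the one used in the presentation.

```c
#include <assert.h>
#include <stddef.h>

/* Derives the bit-node stream from the check-node stream.
 * Input : cn_start[0..num_checks], cn_bit[e] = bit index of edge e.
 * Output: bn_start[0..num_bits], bn_edge[] = edge numbers grouped by bit. */
void build_bn_stream(const size_t *cn_start, const size_t *cn_bit,
                     size_t num_checks, size_t num_bits,
                     size_t *bn_start, size_t *bn_edge)
{
    size_t num_edges = cn_start[num_checks];

    for (size_t n = 0; n <= num_bits; ++n)
        bn_start[n] = 0;
    for (size_t e = 0; e < num_edges; ++e) {   /* count edges per bit node */
        assert(cn_bit[e] < num_bits);
        bn_start[cn_bit[e] + 1]++;
    }
    for (size_t n = 0; n < num_bits; ++n)      /* prefix sum -> start offsets */
        bn_start[n + 1] += bn_start[n];
    for (size_t e = 0; e < num_edges; ++e)     /* scatter; offsets act as cursors */
        bn_edge[bn_start[cn_bit[e]]++] = e;
    for (size_t n = num_bits; n > 0; --n)      /* shift cursors back to offsets */
        bn_start[n] = bn_start[n - 1];
    bn_start[0] = 0;
}
```

With both streams available, a check-node pass can read messages contiguously, while a bit-node pass reaches the same messages through `bn_edge`.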

Data Structures • separately code the information about H in two independent data streams, and

Reminder • r_mn: message from CN m to BN n • q_nm: message from BN n to CN m
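
To make the q_nm message concrete, here is a minimal sketch of the standard probability-domain bit-node update for a single BN: each outgoing q_nm combines the channel prior with all incoming r messages except the one from the destination CN, then normalizes. The function name, the argument layout and the prior `p1` (probability that the bit is 1) are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Updates the messages leaving one bit node of degree d.
 * r0[i], r1[i]: incoming messages from the i-th connected check node.
 * q0[j], q1[j]: outgoing messages to the j-th connected check node. */
void update_bit_node(double p1, const double *r0, const double *r1,
                     size_t d, double *q0, double *q1)
{
    for (size_t j = 0; j < d; ++j) {
        double a0 = 1.0 - p1;       /* prior that the bit is 0 */
        double a1 = p1;             /* prior that the bit is 1 */
        for (size_t i = 0; i < d; ++i) {
            if (i == j)
                continue;           /* exclude the destination check node */
            a0 *= r0[i];
            a1 *= r1[i];
        }
        assert(a0 + a1 > 0.0);
        q0[j] = a0 / (a0 + a1);     /* normalise so q0 + q1 = 1 */
        q1[j] = a1 / (a0 + a1);
    }
}
```

The inner loop costs O(d) per message, which is acceptable for the small node degrees typical of LDPC codes.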

Parallel Computational Models • Parallel Features of the General-Purpose Multicores • Parallel Features of the GPU • Parallel Features of the CELL/B.E.

Parallel Features of the General-Purpose Multicores • #pragma omp parallel for
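
A hedged sketch of how `#pragma omp parallel for` might be applied to the check-node pass: every CN writes only its own r_mn messages, so the outer loop iterations are independent. It uses the standard probability-domain rule r_mn(0) = 1/2 + 1/2 · Π (1 − 2·q_n'm(1)) over the other bit nodes n' of check m. The edge layout (`cn_start`) and the function name are assumptions, not the presenter's code.

```c
#include <assert.h>
#include <stddef.h>

/* One check-node pass over all check nodes, one loop iteration per node.
 * q1[e]: probability that the bit on edge e is 1 (message BN -> CN).
 * r0[e], r1[e]: resulting messages CN -> BN for edge e. */
void horizontal_pass(const size_t *cn_start, size_t num_checks,
                     const double *q1, double *r0, double *r1)
{
    assert(cn_start && q1 && r0 && r1);
    /* Signed loop index for compatibility with older OpenMP versions. */
    #pragma omp parallel for
    for (long m = 0; m < (long)num_checks; ++m) {
        for (size_t k = cn_start[m]; k < cn_start[m + 1]; ++k) {
            double prod = 1.0;
            for (size_t j = cn_start[m]; j < cn_start[m + 1]; ++j)
                if (j != k)
                    prod *= 1.0 - 2.0 * q1[j];  /* skip the destination edge */
            r0[k] = 0.5 + 0.5 * prod;
            r1[k] = 1.0 - r0[k];
        }
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs sequentially with the same result.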

Parallel Features of the GPU

Throughput

Parallel Features of the CELL/B.E.

Throughput

PARALLELIZING THE KERNELS EXECUTION • The Multicores Using OpenMP • The GPU Using CUDA • The CELL/B.E.

The Multicores Using OpenMP

The GPU Using CUDA • Programming the Grid Using a Thread per Node Approach

The GPU Using CUDA(cont.) • Coalesced Memory Accesses

The CELL/B.E. • Small Single-SPE Model(AB C) • Large Single-SPE Model

Why Single-SPE Model • In the single-SPE model, the number of communications between the PPE and the SPEs is minimal, and the PPE is relieved from the costly task of reorganizing data (the sorting procedure in Algorithm 4) between data transfers to the SPE.

EXPERIMENTAL RESULTS • LDPC Decoding on the General-Purpose x86 Multicores Using OpenMP • LDPC Decoding on the CELL/B.E. • Small Single-SPE Model • Large Single-SPE Model • LDPC Decoding on the GPU Using CUDA

LDPC Decoding on the General-Purpose x86 Multicores Using OpenMP

LDPC Decoding on the CELL/B.E.

LDPC Decoding on the CELL/B.E.(cont.)

LDPC Decoding on the GPU Using CUDA

The end Thank you~
