LDPC FEC for IEEE 802.11n Applications

Eric Jacobsen (eric.a.jacobsen@intel.com)
Intel Labs, Communications Technology Laboratory
November 10, 2003

Agenda
• Background: why LDPCs?
• Fitting LDPCs to WLAN
• Details of candidate code
• Performance and use of candidate code
• Complexity analysis
• Summary

Candidate Iterative FECs
• Turbo Codes (PCCC or SCCC)
  • High complexity
  • Poor performance with short blocks
  • IP issues
• Turbo Product Codes
  • Medium complexity
  • Best performance at R ≈ 0.8
  • Poor performance with short blocks
  • Possible IP issues
• Low Density Parity Check Codes (LDPCs)
  • Invented in 1962, so no basic IP!
  • Potential for low complexity: constituent codes are parity check relationships
  • Extremely good performance with long blocks (within 0.0045 dB of capacity!)
  • Very good performance with short blocks (Lin)
  • Eliminate channel interleaver

LDPC Codes solve several problems
• Close the large gap between current and theoretical performance
• Only known solution for good performance with small block sizes
• Enable Adaptive Bit Loading by eliminating the channel bit interleaver
  • LDPCs incorporate the required randomization into the code
    • These are the only known codes that do this!
  • This also provides a significant complexity reduction
    • Offsets complexity of code
• Decoupling the FEC and modulation increases flexibility

Low Density Parity Check FEC
• Iterative decoding of simple parity check codes
• Published examples of good performance with short blocks
  • Kou, Lin, Fossorier, Trans. IT, Nov. 2001
• Near-capacity performance with long blocks
  • Very near! Chung, et al., “On the design of low-density parity-check codes within 0.0045dB of the Shannon limit”, IEEE Comm. Lett., Feb. 2001
• Complexity fears, especially in encoder
• Implementation challenges
  • Many options with respect to decoding algorithms, architectures, techniques

LDPC Bipartite (Tanner) Graph
[Figure: bipartite graph with check nodes, variable nodes (codeword bits), and the edges connecting them]
This is an example bipartite graph for an irregular LDPC code.
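To make the graph picture concrete, here is a minimal sketch that derives the two adjacency lists of a Tanner graph from a parity check matrix. The 3x6 matrix H and the function name tanner_graph are hypothetical, chosen only for illustration; they are not the candidate code.

```python
# Hypothetical 3x6 parity-check matrix, chosen only for illustration.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def tanner_graph(H):
    """Return (check_to_vars, var_to_checks) adjacency lists for H.

    Row i of H is check node i, column j is variable node j, and every
    1 in H is one edge of the bipartite graph.
    """
    n_checks, n_vars = len(H), len(H[0])
    check_to_vars = [[j for j in range(n_vars) if H[i][j]] for i in range(n_checks)]
    var_to_checks = [[i for i in range(n_checks) if H[i][j]] for j in range(n_vars)]
    return check_to_vars, var_to_checks

check_to_vars, var_to_checks = tanner_graph(H)
edges = sum(len(v) for v in check_to_vars)       # number of 1s in H
var_degrees = [len(c) for c in var_to_checks]    # column weights
```

In this toy matrix the variable nodes do not all have the same degree, which is what makes a code irregular.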

BICM System with LDPC
The nature of the LDPC calls into question whether the deinterleaver produces any benefit or just defines a different LDPC code.
[Figure: receiver block diagram with blocks Receiver FFT, Slicer, De-Interleaver; signal labels Demodulated Constellation Symbols, Detected Coded Bits, De-Interleaved Coded Bits, Corrected Bits]

Direct Coding with LDPC
Since the interleaver merely permutes the order of the rows of the parity check matrix, it can be deleted and its effects taken into account in the code design.
[Figure: receiver block diagram with blocks Receiver FFT, Slicer; signal labels Demodulated Constellation Symbols, Detected Coded Bits, Corrected Bits]
A system with LDPC FEC should provide superior performance with reasonable simplicity. Since the interleaver can be excluded, the complexity drops further.

191-bit block results, Kou
Capacity ~1.2 dB for R = 0.69

Large Block LDPCs in Fading
For large block sizes, in this case 10^5 and 10^6, LDPCs perform extremely close to capacity. For a code with R = ½ in AWGN, C ≈ 1.2 dB Eb/No (BICO).

Candidate LDPC Code
• (2000, 1600) code, R = 0.8
  • Long enough for good performance, short enough to implement
  • BER in AWGN is <1.5 dB from capacity at Pe = 10^-5
• Column weights are controlled by the code design
  • Four edges per information bit, two per parity bit
  • Last parity bit has one edge
  • 18 edges per check node (regular in H1)
  • Total of 7199 edges
• Simplified encoder
• BCJR or Min-Sum decoding algorithm
  • Min-Sum costs 0.3 dB in performance, cuts gate count
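The edge count quoted above can be checked from the stated column weights. The sketch below is only arithmetic on the slide's numbers; the variable names are illustrative, and the split of each check node's 18 edges into 16 from H1 plus 2 from H2 is an inference from those numbers rather than something the slide states.

```python
N, K = 2000, 1600          # codeword and information lengths from the slide
M = N - K                  # 400 parity bits, hence 400 check nodes

info_edges = 4 * K                 # four edges per information bit
parity_edges = 2 * (M - 1) + 1     # two per parity bit, one for the last
total_edges = info_edges + parity_edges

# Inference, not stated on the slide: if H1 is row-regular, each check
# node sees info_edges / M edges from the information bits.
h1_edges_per_check = info_edges // M
avg_check_degree = total_edges / M
```

The total comes out to 7199, matching the slide. Under the row-regular assumption each check has 16 edges into H1, so the quoted 18 edges per check leaves 2 for the parity part, with exactly one check node one edge short.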

Performance in AWGN
Capacity for R = 0.8 is 2.044 dB, shown with a vertical dashed red line. At Pe = 10^-5 the LDPC code is <1.5 dB from capacity.
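A capacity figure of this kind can be reproduced numerically. The sketch below assumes the quoted capacity is the binary-input (BPSK) AWGN Shannon limit, which the slide does not state explicitly; the function names, integration span, and step counts are illustrative choices. It integrates the BPSK capacity expression C = 1 - E[log2(1 + exp(-2y/σ²))], with y ~ N(1, σ²), and bisects on Eb/No until C equals the code rate.

```python
import math

def bpsk_capacity(sigma, steps=4000, span=10.0):
    """Capacity (bits/use) of BPSK (+/-1) over real AWGN with noise std sigma."""
    lo, hi = 1.0 - span * sigma, 1.0 + span * sigma
    dy = (hi - lo) / steps
    loss = 0.0
    for k in range(steps + 1):
        y = lo + k * dy
        pdf = math.exp(-((y - 1.0) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        t = -2.0 * y / sigma**2                                  # minus the channel LLR
        softplus = max(t, 0.0) + math.log1p(math.exp(-abs(t)))   # ln(1 + e^t), overflow-safe
        w = 0.5 if k in (0, steps) else 1.0                      # trapezoid end weights
        loss += w * pdf * softplus / math.log(2) * dy
    return 1.0 - loss

def shannon_limit_db(rate, lo_db=-2.0, hi_db=10.0):
    """Smallest Eb/No (dB) at which BPSK capacity reaches the given rate."""
    for _ in range(40):                                  # bisection on Eb/No in dB
        mid = 0.5 * (lo_db + hi_db)
        ebno = 10 ** (mid / 10)
        sigma = math.sqrt(1.0 / (2 * rate * ebno))       # Es = 1, Eb = Es / rate
        if bpsk_capacity(sigma) < rate:
            lo_db = mid
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)

limit_r08 = shannon_limit_db(0.8)
```

Under these assumptions the R = 0.8 limit should land close to the 2.044 dB quoted on the slide.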

LDPC, ABL in fading
These results are in Channel Model D, 50 ns delay spread. The Viterbi-UBL results are essentially an 802.11a reference system. The LDPC-UBL results use a fixed code rate of R = 0.8.

LDPC, ABL in fading
These results are in Channel Model D, 50 ns delay spread. The Viterbi-ABL results use puncturing and modulations BPSK, QPSK, 16-QAM and 64-QAM, with variable code rate. The LDPC-ABL results use puncturing, QPSK, 16-QAM, and 64-QAM, with a fixed code rate of R = 0.8. The throughput curve drops off at low SNR because BPSK is not part of the adaptation menu.

Selected LDPC Code Use
• Long packets are encoded by concatenating codewords
  • 1500 byte packet + overhead is ~8 codewords
• Short packets are accommodated with code shortening
  • Parity stays constant, information field shortened
  • Short packets consume the minority of airtime, so code rate reduction carries little penalty
  • Increase in reliability for short packets comes at low cost
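The codeword count for a long packet follows directly from the 1600-bit information field. The sketch below is a minimal illustration; the function names are hypothetical, and the choice to fill whole codewords first and carry the remainder in one shortened codeword is an assumption, since the slides do not say how the tail of a packet is handled.

```python
K, M = 1600, 400   # information and parity bits per codeword (from the slides)

def segment(payload_bits):
    """Split a payload into per-codeword information lengths.

    Assumption: full 1600-bit codewords first, then one shortened
    codeword for the remainder. The slides do not specify this.
    """
    full, rem = divmod(payload_bits, K)
    return [K] * full + ([rem] if rem else [])

def airtime_bits(payload_bits):
    """Transmitted bits: information plus 400 parity bits per codeword."""
    blocks = segment(payload_bits)
    return sum(blocks) + M * len(blocks)

blocks = segment(1500 * 8)          # a 1500-byte packet, overhead ignored
```

Ignoring overhead, a 1500-byte packet is 12000 bits, or 7.5 information fields, which is where the figure of roughly eight codewords comes from.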

Dartmouth Usage Statistics
1500 byte packets are the driving long packet type.

Packet size accommodation
• Long packets use concatenated codewords: each 2000-bit codeword carries a 1600-bit data field and 400 bits of parity.
• Short blocks use shortened codewords: an N-bit data field, a (1600-N)-bit zero pad, and 400 bits of parity. The zero pad is not transmitted.

Comparative Performance (AWGN)

LDPC Shortened Packet Performance vs Eb/No
Shown are the effects of shortening the code from 1600 information bits to 800 and 400 bits (code rates of R = 2/3 and R = ½, respectively). Performance for both 50 and 8 iterations is shown to verify performance for the shortened codes. Allowing the code rate to drop with packet size maintains power efficiency for short packets.
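The quoted rates follow from keeping the 400 parity bits fixed while the information field shrinks. The sketch below works that out; the function names are illustrative, and the second function applies the standard relation that the SNR per coded bit equals Eb/No scaled by the code rate, which is an addition here and not something the slide states.

```python
import math
from fractions import Fraction

def shortened_rate(n_info, parity=400):
    """Code rate when n_info information bits share a fixed parity field."""
    return Fraction(n_info, n_info + parity)

def coded_bit_snr_offset_db(rate):
    """dB offset from Eb/No to SNR per coded bit (standard relation Es = R * Eb)."""
    return 10 * math.log10(float(rate))

rates = {n: shortened_rate(n) for n in (1600, 800, 400)}
```

With 1600, 800, and 400 information bits the rates are 4/5, 2/3, and 1/2. The offset function gives the shift to apply when reading an Eb/No curve against SNR per coded bit, roughly -3 dB at R = 1/2.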

LDPC Shortened Packet Performance vs SNR
Shortened code performance is shown vs SNR. The gain from shortening the codes can be used to increase range if also applied to longer packets by concatenation.

Iteration Management
• LDPCs are iteratively decoded
• The number of iterations affects the code performance
• The number of iterations also affects the complexity

Mother Code Iteration Study
[Figure: performance curves labeled 4, 5, 6, 7, 8, 9, 10, 11, 12, and 50, with reference curves for Viterbi, R = 3/4 and Viterbi, R = 0.8 (estimated)]
1600-bit packets for all cases.

Complexity Tradeoffs
• Gate and memory complexity decrease with increasing clock rate
  • Serialization of processing allows gate and memory reuse
• Gate complexity increases with number of iterations
  • Memory stays constant
• BCJR has more than 2x the gate complexity of the Min-Sum kernel
  • 0.3 dB performance improvement
  • If memory complexity drives, then BCJR is a good option
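The gate-count gap between the two kernels comes from what each check node has to compute. The sketch below contrasts an exact check-node update with the Min-Sum approximation on log-likelihood ratios (LLRs). The tanh rule used as the exact reference is an assumption standing in for the deck's BCJR kernel, whose internals the slides do not describe; the function names and input values are hypothetical.

```python
import math

def check_update_exact(llrs):
    """Exact extrinsic LLRs via the tanh rule (stand-in for the BCJR kernel)."""
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for j, l in enumerate(llrs):
            if j != i:
                prod *= math.tanh(l / 2.0)
        prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)   # keep atanh finite
        out.append(2.0 * math.atanh(prod))
    return out

def check_update_min_sum(llrs):
    """Min-Sum: product of the other signs times the smallest other magnitude."""
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1.0
        for l in others:
            if l < 0:
                sign = -sign
        out.append(sign * min(abs(l) for l in others))
    return out

llrs = [1.5, -0.8, 2.2, 0.4]        # hypothetical incoming messages
exact = check_update_exact(llrs)
approx = check_update_min_sum(llrs)
```

Min-Sum needs only comparisons and sign logic, while the exact update needs nonlinear functions. The approximation keeps the sign but never underestimates the magnitude, which is one way to see where a performance loss such as the quoted 0.3 dB can come from.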

Latency Drives
• For any block code for 802.11, the MAC latency requirements will drive the design
• 1600 bits at 240 Mbps takes about 6.7 µs to receive
• SIFS budget drives, so for the worst case we assume a 1 µs budget allocated to the FEC block

Analysis Assumptions
• 240 Mbps target
  • Should encompass most modes
• Eight iterations
• Two processing clocks per information bit
  • Keep duty cycle low, reduces power consumption?
• BCJR algorithm

Complexity Estimates
• Gates
  • 1 µs = 240 cycles at 240 MHz
  • Computation gates, BCJR ≈ 124k gates
  • Additional control, sums, etc., ~40k gates
  • Estimated BCJR total gate count ~164k gates
  • Estimated Min-Sum total gate count ~98k gates
• Memory
  • Scratchpad, computation, buffering ≈ 120k bits
  • Code address ROM ≈ 93.6k bits
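The numbers in these estimates can be cross-checked with a few lines of arithmetic. The sketch below uses only figures quoted in the deck; the constant names are illustrative, and the bits-per-edge reading of the ROM size is an observation about the numbers, not something the slides state.

```python
BLOCK_INFO_BITS = 1600      # information bits per codeword
THROUGHPUT_BPS = 240e6      # 240 Mbps target
CLOCK_HZ = 240e6            # 240 MHz processing clock
FEC_BUDGET_S = 1e-6         # 1 us allocated to the FEC block
EDGES = 7199                # edges in the candidate code's graph

receive_time_us = BLOCK_INFO_BITS / THROUGHPUT_BPS * 1e6   # time to receive one block
budget_cycles = round(CLOCK_HZ * FEC_BUDGET_S)             # clock cycles in the budget
bcjr_gates_k = 124 + 40                                    # computation + control, in k gates

# Observation, not stated in the slides: the ROM size divided by the
# edge count is very close to a whole number of bits per edge.
rom_bits_per_edge = 93_600 / EDGES
```

The ratio comes out at almost exactly 13 bits per edge, which would be enough to address any of the 7199 edges since 2^13 = 8192.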

LDPC Decoder Area vs Latency
[Figure: normalized die area vs decoding latency, with the BCJR reference case marked]
Shown is the estimated normalized die area, relative to a target reference, as a function of decoding latency. This takes into account only the reduction in gates by allowing the reuse of the maxx() hardware, and does not consider that the scratchpad memory size could also be reduced.

Encoder Complexity
The generic block encoder definition. A typical LDPC generator matrix, G, is high density for a low density parity check matrix H. By carefully partitioning G, the low density H matrix may be used and separated into two portions, H1 and H2, where H2 takes the low-density form shown. The inverse transpose of H2 can then be implemented as a differential encoder.

Encoder Implementation
[Figure: encoder block diagram]
The final encoder structure is as shown above. The data vector, u, is the systematic portion of the codeword, v. The parity bits, p, are generated from the low-density matrix H1 and the differential encoder 1/(1+D).
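A minimal sketch of this encoder structure follows. It assumes H = [H1 | H2] with H2 in dual-diagonal form (ones on the diagonal and directly below it), which is consistent with the last parity bit having a single edge but is not spelled out in the text. The small H1, the data vector, and the function names are hypothetical.

```python
def encode(u, H1):
    """Systematic encode: v = [u | p], with p from H1 and a 1/(1+D) accumulator."""
    p, prev = [], 0
    for row in H1:
        s = sum(h & b for h, b in zip(row, u)) % 2   # one bit of H1 * u^T over GF(2)
        prev ^= s                                    # accumulator: p_i = s_i + p_(i-1)
        p.append(prev)
    return u + p

def syndrome(v, H1):
    """H * v^T for H = [H1 | H2], H2 dual-diagonal (assumed form)."""
    k = len(H1[0])
    u, p = v[:k], v[k:]
    out = []
    for i, row in enumerate(H1):
        s = sum(h & b for h, b in zip(row, u)) % 2
        s ^= p[i] ^ (p[i - 1] if i > 0 else 0)       # row i of H2 touches p_(i-1) and p_i
        out.append(s)
    return out

# Hypothetical 3x4 H1, for illustration only.
H1 = [
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 0],
]
u = [1, 0, 0, 1]
v = encode(u, H1)
```

Encoding then costs one sparse matrix-vector product plus a running XOR, with no dense generator matrix, and an all-zero syndrome confirms that the result is a valid codeword under the assumed H.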

Complexity Summary
• ~164k gates computation and control with BCJR
• ~98k gates computation and control with Min-Sum
  • This is to achieve 1 µs decode time. Gate counts drop dramatically as latency is allowed to increase.
• Memory estimate is 120k bits of RAM and 93.6k bits of control ROM
• This is a conservative budgetary estimate. Other decoding algorithms or trick implementations may yield different results.

Summary
• This LDPC code by itself provides 2-3+ dB of gain
• Implementation is practical, with much flexibility in approach
• Less than 1.5 dB from AWGN capacity at Pe = 10^-5 with a 1600-bit data block and R = 0.8
• Flexible in code rate and data block size
  • Shortening schemes remove restrictions on data block size
  • Observing OFDM symbol boundaries is not required
• Eliminates channel interleaver
  • Decouples FEC from modulation, MIMO/SISO, higher-order modulation, etc.

Backup

Partial Reference List
• TCM
  • G. Ungerboeck, “Channel Coding with Multilevel/Phase Signals”, IEEE Trans. IT, Vol. IT-28, No. 1, January 1982
• BICM
  • G. Caire, G. Taricco, and E. Biglieri, “Bit-Interleaved Coded Modulation”, IEEE Trans. on IT, May 1998
• LDPC
  • W. Ryan, “An Introduction to Low Density Parity Check Codes”, UCLA Short Course Notes, April 2001
  • Kou, Lin, Fossorier, “Low Density Parity Check Codes Based on Finite Geometries: A Rediscovery and New Results”, IEEE Transactions on Information Theory, Vol. 47, No. 7, November 2001
  • R. Gallager, “Low-density parity-check codes”, IRE Trans. IT, Jan. 1962
  • Chung, et al., “On the design of low-density parity-check codes within 0.0045dB of the Shannon limit”, IEEE Comm. Lett., Feb. 2001
  • J. Hou, P. Siegel, and L. Milstein, “Performance Analysis and Code Optimisation for Low Density Parity-Check Codes on Rayleigh Fading Channels”, IEEE JSAC, Vol. 19, No. 5, May 2001
  • L. Van der Perre, S. Thoen, P. Vandenameele, B. Gyselinckx, and M. Engels, “Adaptive loading strategy for a high speed OFDM-based WLAN”, Globecom 98
  • Numerous articles on recent developments in LDPCs, IEEE Trans. on IT, Feb. 2001

Performance comparison around 1.5 bit/s/Hz
[Figure: Efficiency and C/N axes; entries labeled Hughes NS (LDPC), SpaceBridge (PCCC), DVBS, and DVBS+30%]
