THESIS
2015
xiv, 153 pages : illustrations (some color) ; 30 cm
Abstract
Throughput and complexity are key criteria in the Very-Large-Scale Integration (VLSI) implementation of modern high-performance wireless communication systems. In this work, high-throughput, low-complexity digital baseband architectures are studied for the high-performance wireless data path. In particular, the hardware implementations are optimized through algorithm-and-architecture co-design. The following modules of a bandwidth-efficient communication system are investigated: the demapper for multi-ary constellations, successive cancellation decoding of polar codes, and list decoding of polar codes.
For the constellation demapper, several low-complexity demappers are proposed for the rotated quadrature amplitude modulation (QAM) constellation used in the Digital Video Broadcasting – Second Generation Terrestrial (DVB-T2) standard. The demapping complexity is reduced by approximating the 2-dimensional detection with 1-dimensional detection and compensating for the loss caused by the correlation between the in-phase and quadrature components. Experimental results show that the proposed demappers reduce the demapping complexity by more than 50% compared with the optimal demapper, with negligible performance degradation.
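To make the 2-D versus 1-D distinction concrete, here is a minimal Python sketch under stated assumptions: rotated 16-QAM with an assumed Gray labeling, an assumed rotation angle of 16.8°, and independent real fading gains on the I and Q components. `demap_2d` is the exhaustive max-log demapper over all 16 points; `demap_1d` is a deliberately naive 1-D approximation that equalizes, de-rotates, and demaps I and Q separately. The thesis's compensation for the I/Q correlation is not reproduced here, and all names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Assumed Gray-labeled 4-PAM per axis (not necessarily the DVB-T2 labeling).
PAM = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): 1.0, (1, 0): 3.0}
SCALE = 1.0 / math.sqrt(10.0)   # unit average symbol energy for 16-QAM
THETA = math.radians(16.8)      # assumed rotation angle for 16-QAM


def rotate(i: float, q: float, theta: float) -> Tuple[float, float]:
    c, s = math.cos(theta), math.sin(theta)
    return c * i - s * q, s * i + c * q


def modulate(bits: Sequence[int]) -> Tuple[float, float]:
    """Bits (b0, b1) select the I level and (b2, b3) the Q level; then rotate."""
    return rotate(PAM[(bits[0], bits[1])] * SCALE, PAM[(bits[2], bits[3])] * SCALE, THETA)


def demap_2d(r, h, sigma2) -> List[float]:
    """Max-log LLRs (positive favors bit 0) by exhaustive 2-D search over 16 points.
    r = (r_I, r_Q) and h = (h_I, h_Q) are per-component real gains."""
    best = [[math.inf, math.inf] for _ in range(4)]  # best[k][b]: min distance with bit k == b
    for bits in itertools.product((0, 1), repeat=4):
        xi, xq = modulate(bits)
        d = (r[0] - h[0] * xi) ** 2 + (r[1] - h[1] * xq) ** 2
        for k, b in enumerate(bits):
            best[k][b] = min(best[k][b], d)
    return [(best[k][1] - best[k][0]) / (2.0 * sigma2) for k in range(4)]


def demap_1d(r, h, sigma2) -> List[float]:
    """Naive 1-D approximation: zero-force each component, de-rotate, then demap
    I and Q separately with 4-PAM. It ignores the I/Q noise correlation and the
    unequal noise scaling that de-rotation introduces when h_I != h_Q."""
    zi, zq = rotate(r[0] / h[0], r[1] / h[1], -THETA)
    llrs: List[float] = []
    for z in (zi, zq):
        best = [[math.inf, math.inf] for _ in range(2)]
        for bits, level in PAM.items():
            d = (z - level * SCALE) ** 2
            for k, b in enumerate(bits):
                best[k][b] = min(best[k][b], d)
        llrs += [(best[k][1] - best[k][0]) / (2.0 * sigma2) for k in range(2)]
    return llrs


if __name__ == "__main__":
    bits = (1, 0, 1, 1)
    h = (1.0, 0.5)               # I and Q see different fading gains
    xi, xq = modulate(bits)
    r = (h[0] * xi, h[1] * xq)   # noiseless, for illustration only
    print([int(l < 0) for l in demap_2d(r, h, 0.1)])  # -> [1, 0, 1, 1]
    print([int(l < 0) for l in demap_1d(r, h, 0.1)])  # -> [1, 0, 1, 1]
```

The 2-D search evaluates 16 squared distances per symbol, while the 1-D version evaluates 2 × 4; the gap grows for 64- and 256-QAM, which is why a 1-D approximation with correlation compensation is attractive.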
For the successive cancellation decoding (SCD) of polar codes, both high-throughput and low-cost architectures are proposed. First, a new partial-sum updating algorithm and the corresponding partial-sum network (PSN) architecture are introduced, which achieve a delay independent of the code length. Second, for a high-performance and area-efficient semi-parallel SCD implementation, a folded PSN architecture is presented that integrates seamlessly with the folded processing element architecture. As a result, both the critical path delay and the area (excluding the memory for folding) of the semi-parallel SCD remain approximately constant over a large range of code lengths. The proposed designs are implemented on both Field-Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) platforms and compared with existing designs. Experimental results show that, for polar codes with large code lengths, the decoding throughput is improved by more than 1.05 times and the area is reduced by as much as 50.4% compared with state-of-the-art designs. To further reduce the hardware cost of the SCD, a memory-efficient implementation based on a modified decoding algorithm is proposed. This design is implemented in ASIC and compared with existing designs. Experimental results show that, for polar codes with medium code lengths, the overall hardware efficiency of the proposed SCD architecture is improved by more than 20% compared with state-of-the-art designs; for large code lengths, a 30% improvement in efficiency is achieved.
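In SC decoding, the partial sums are the already-decoded bits re-encoded through the polar transform; they set the sign in the g-function, and keeping them up to date is the job of the PSN. The recursive Python sketch below shows only that data dependency in software. It assumes the kernel F = [[1, 0], [1, 1]] in natural bit order (no bit reversal), min-sum f, and LLRs where positive favors bit 0; it is not the thesis's PSN or folded architecture, and names such as `sc_decode` are hypothetical.

```python
import numpy as np


def f(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Min-sum check-node update: LLR of the XOR of two bits."""
    return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))


def g(a: np.ndarray, b: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Variable-node update; s holds the partial sums from the left sub-decoder."""
    return b + (1 - 2 * s) * a


def polar_encode(u) -> np.ndarray:
    """x = u * F^{(x)n} in natural order: x = [enc(u_L) ^ enc(u_R), enc(u_R)]."""
    u = np.asarray(u, dtype=np.int64)
    if len(u) == 1:
        return u.copy()
    half = len(u) // 2
    left, right = polar_encode(u[:half]), polar_encode(u[half:])
    return np.concatenate([left ^ right, right])


def sc_decode(llr: np.ndarray, frozen):
    """Return (u_hat, partial_sums); partial_sums is u_hat re-encoded."""
    if len(llr) == 1:
        u = 0 if frozen[0] else int(llr[0] < 0)
        return [u], np.array([u], dtype=np.int64)
    half = len(llr) // 2
    a, b = llr[:half], llr[half:]
    u_left, s_left = sc_decode(f(a, b), frozen[:half])
    # The left half's partial sums are needed before the right half can start.
    u_right, s_right = sc_decode(g(a, b, s_left), frozen[half:])
    return u_left + u_right, np.concatenate([s_left ^ s_right, s_right])


if __name__ == "__main__":
    frozen = [True, True, True, False, True, False, False, False]  # example N = 8 set
    u = [0, 0, 0, 1, 0, 0, 1, 1]
    x = polar_encode(u)
    llr = (1 - 2 * x) * 4.0  # noiseless BPSK LLRs
    print(sc_decode(llr, frozen)[0] == u)  # -> True
```

Each sub-decoder's returned partial-sum vector is exactly what its right sibling's g-function consumes; in hardware these XOR updates follow every decoded bit, which is the work the proposed PSN performs with a delay independent of N.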
For the list decoding of polar codes, a low-latency decoding implementation for large list sizes is studied. Latency optimizations are carried out at the system, algorithmic, and architectural levels. Specifically, at the system level, a selective expansion method is proposed in which some reliable bits are not expanded during list decoding, reducing computation and latency. At the algorithmic level, a double thresholding scheme is proposed as a fast and accurate approximate-sorting method for the list management operation, reducing the decoding latency for large list sizes. A VLSI architecture for list decoding that implements the selective expansion and double thresholding schemes is then developed. The proposed architecture is implemented in UMC 90 nm complementary metal–oxide–semiconductor (CMOS) technology. Experimental results show that, even for a large list size of 16, the proposed low-latency architecture achieves a decoding throughput of 465 Mbps at a frequency of 658 MHz, while the degradation in error-correcting performance compared with the conventional exact method is negligible.
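The list-management step with selective expansion and double thresholding can be sketched as follows. This is a minimal Python illustration under explicit assumptions, not the thesis's architecture. Path metrics use the common hardware approximation (add |LLR| when a bit disagrees with the LLR's hard decision; lower is better), reliable bits are assumed to be flagged in advance, and the two thresholds are derived from the pairwise structure of the 2L children. That derivation is one provable way to set them and may differ from the thesis's definition: a child strictly below the acceptance threshold is guaranteed to be among the best L, a child strictly above the rejection threshold is guaranteed not to be, and only the band in between is filled without sorting, which is where the approximation lies. All names are hypothetical.

```python
from typing import List, Tuple

Path = Tuple[int, int, float]  # (parent index, appended bit, new path metric)


def pm_update(pm: float, llr: float, bit: int) -> float:
    """Approximate path metric (lower is better): add |llr| when `bit`
    disagrees with the hard decision of `llr` (llr >= 0 means bit 0)."""
    hard = 1 if llr < 0 else 0
    return pm + (abs(llr) if bit != hard else 0.0)


def double_threshold_select(pm0: List[float], pm1: List[float], L: int) -> List[Tuple[int, int]]:
    """Pick L of the 2L children without a full sort.
    pm0[i] / pm1[i]: metric of parent i extended with bit 0 / bit 1."""
    small = [min(a, b) for a, b in zip(pm0, pm1)]
    large = [max(a, b) for a, b in zip(pm0, pm1)]
    accept_th = min(large)  # strictly below: surely among the best L
    reject_th = max(small)  # strictly above: surely outside the best L
    accepted, band = [], []
    for i in range(L):
        for bit, pm in ((0, pm0[i]), (1, pm1[i])):
            if pm < accept_th:
                accepted.append((i, bit))
            elif pm <= reject_th:
                band.append((i, bit))
            # otherwise rejected after two comparisons, no sorting
    # Fill the remaining slots from the uncertain band in index order (the approximation).
    return accepted + band[: L - len(accepted)]


def extend_paths(pms: List[float], llrs: List[float], reliable: bool) -> List[Path]:
    """Extend L paths at one information bit. reliable=True models selective
    expansion: each path takes its hard decision, so no split and no selection."""
    L = len(pms)
    if reliable:
        return [(i, 1 if llr < 0 else 0, pm) for i, (pm, llr) in enumerate(zip(pms, llrs))]
    pm0 = [pm_update(pm, llr, 0) for pm, llr in zip(pms, llrs)]
    pm1 = [pm_update(pm, llr, 1) for pm, llr in zip(pms, llrs)]
    chosen = double_threshold_select(pm0, pm1, L)
    return [(i, b, pm0[i] if b == 0 else pm1[i]) for i, b in chosen]


if __name__ == "__main__":
    print(extend_paths([0.0, 1.0], [2.0, -0.5], reliable=False))  # -> [(0, 0, 0.0), (1, 1, 1.0)]
    # Band case: an exact sort would keep the children with metrics 0.0, 0.5, 1.0.
    print(double_threshold_select([0.0, 4.0, 1.0], [0.5, 3.0, 2.0], L=3))  # -> [(0, 0), (0, 1), (1, 1)]
```

The second usage line shows the approximation at work: band filling keeps the child with metric 3.0 where an exact sort would keep the one with metric 1.0. Only candidates inside the band can be mis-selected; every candidate outside it is decided with two comparisons.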