H.264 video decoder implementation on the C6416 DSP
Multimedia communication terminals have broad application prospects in video conferencing, video telephony, PDAs, digital television and other fields, so efficient and practical terminal equipment has long been a main research direction in this area.
Realizing a multimedia communication terminal involves two main things: a fast and stable signal processing platform to serve as the media processor, and multimedia communication protocol standards and software algorithms, especially compression algorithms for audio and video signals. Combining the two yields highly efficient multimedia communication equipment. On the one hand, the rapid development of digital signal processors (DSPs) makes efficient audio and video signal processing possible; on the other hand, the introduction of H.264, the latest low-bit-rate video compression standard, provides standards and algorithmic guidance for video communication. Implementing the H.264 algorithm on a DSP for multimedia communication is therefore of real research interest and value.
This article describes a DSP implementation of the H.264 decoding algorithm. The design uses ATEME's network video development platform (NVDK C6416) as the DSP processing platform, on which the H.264 decoding algorithm is implemented and optimized. For QCIF video sequences, the decoding speed reaches 50 to 60 frames per second.
1 Overview of the NVDK network video development platform
NVDK is an evaluation and development kit based on TI's C6400 series DSPs, introduced by ATEME, a TI third party. It is a high-speed DSP development platform suited to image and video signal processing. The kit gives manufacturers of equipment such as video network infrastructure and advanced video applications a convenient starting point and speeds up the development of digital video application projects.
1.1 NVDK C6416 architecture
The NVDK C6416 consists of a TMS320C6416 DSP core, a 10/100 Mbps Ethernet daughter card, an audio/video interface box, a PCI bus, memory modules, an expansion interface, an independent power supply and other accessories. Its functional block diagram is shown in Figure 1.
1.2 Main features of the NVDK C6416
As a network and video development kit, the NVDK places many audio, video and network interfaces directly on the board, which makes front-end platform development easier for users who adopt TI C6000 series DSP chips as the processing unit. It is a complete DSP development platform for project demonstrations, algorithm and theory verification, data simulation, FPGA development and software optimization. Its main features are as follows:
· C6416 DSP core: a 600 MHz clock and a structure that issues 8 instructions in parallel give a peak processing capability of 4800 MIPS.
· Video: On the input side, the NVDK can capture PAL or NTSC analog video through a composite video (CVBS) or S-Video input, and the analog signal is digitized into the YUV 4:2:2 digital video format. On the output side, the NVDK supports composite video (CVBS) and S-Video output and also provides an SVGA output mode. Video capture offers three image formats (FULL, CIF and QCIF), and video output offers two (FULL and CIF).
· Audio: provides two two-channel audio outputs with CD-quality stereo input and output interfaces, along with a mono microphone input.
· Host interface: a PCI interface allows connection to a PC. The board can work either in PCI mode or stand-alone, disconnected from the host.
· Network interface: an Ethernet interface makes it more convenient to transmit video streams over a network.
· External expansion memory: 256M of 64-bit-wide expandable SDRAM, 8M of 32-bit-wide expansion memory (SDRAMB) and 4 MB of flash ROM provide ample space and a flexible memory allocation scheme.
2 H.264 video compression standard
H.264 is the latest international video coding standard, developed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It further improves and extends earlier video compression standards such as H.263. Its purpose is to reduce the coded bit rate further and improve compression efficiency, while providing a network-friendly interface that makes video streams better suited to transmission over networks. Because the standard achieves a lower bit rate, it is well suited to multimedia communications.
H.264 has the following new features:
· Network Abstraction Layer (NAL).
In traditional video coding, the encoded stream has one uniform format whatever the application (storage, transmission and so on), and it contains only the video coding layer (VCL). H.264 adds NAL headers that depend on the application, so that the stream can adapt to different network environments and suffers fewer transmission errors.
· Intra prediction coding (Intra Prediction Coding).
Intra prediction coding makes good use of the spatial redundancy within an I-frame, which greatly reduces the size of the coded I-frame stream.
· Adaptive block size coding (Adaptive Block Size Coding).
H.264 allows prediction and coding with sub-blocks of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4 pixels. Using smaller blocks and adaptive coding reduces the amount of prediction residual data and therefore lowers the bit rate further.
· High-precision sub-pixel motion estimation (High Precision Sub-pel Motion Estimation).
H.264 explicitly specifies how sub-pixel motion estimation is carried out, with 1/4-pixel and 1/8-pixel motion estimation as options. Sub-pixel motion estimation improves prediction accuracy and at the same time reduces the bit rate needed to code the residual.
· Multi-frame motion compensation (Multi-frame Motion Compensation).
Traditional video compression uses one previously decoded frame (for a P-frame) or two (for a B-frame) as the reference for predicting the current frame. H.264 allows up to five reference frames. Performing motion estimation and compensation across more reference frames finds blocks with smaller prediction residuals and so lowers the coded bit rate.
· Integer transform coding (Integer Transform Coding).
H.264 uses an integer transform in place of the DCT, so the transform is computed with fixed-point rather than floating-point operations. This not only shortens encoding and decoding time but also makes the algorithm easier to implement on multimedia platforms. In this respect, H.264 is better suited as the codec standard for multimedia terminals.
· Two alternative entropy coding methods, CAVLC and CABAC.
CAVLC (Context-based Adaptive Variable Length Coding): context-based adaptive variable-length coding.
CABAC (Context-based Adaptive Binary Arithmetic Coding): context-based adaptive binary arithmetic coding.
Earlier video compression standards used entropy coding that combines Huffman coding with variable-length coding. Huffman coding is a good entropy coding method, but its coding efficiency is not the highest and its error resilience is very poor. H.264 offers two entropy coding methods to choose from: CAVLC has relatively good error resilience but modest coding efficiency, while CABAC is highly efficient but computationally very complex. Each has advantages and disadvantages, so the method is chosen according to the application.
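To make the integer transform feature listed above concrete, here is a minimal C sketch of a 4×4 inverse core transform built only from additions, subtractions and shifts. It follows the butterfly structure commonly given for H.264 but is not taken from the decoder described in this article; dequantization is omitted, and the names inv_pass and inverse_transform_4x4 and the flat row-major layout are assumptions made for illustration.

```c
#include <stdint.h>

/* One 1-D pass of the 4x4 inverse core transform.
   Assumption: >> on a negative value is an arithmetic shift, as on
   common compilers (strictly, implementation-defined in C). */
static void inv_pass(const int32_t d[4], int32_t f[4])
{
    int32_t e0 = d[0] + d[2];
    int32_t e1 = d[0] - d[2];
    int32_t e2 = (d[1] >> 1) - d[3];
    int32_t e3 = d[1] + (d[3] >> 1);

    f[0] = e0 + e3;
    f[1] = e1 + e2;
    f[2] = e1 - e2;
    f[3] = e0 - e3;
}

/* coef: 16 dequantized coefficients, row-major (hypothetical layout).
   res:  16 residual samples, row-major. */
void inverse_transform_4x4(const int16_t coef[16], int16_t res[16])
{
    int32_t tmp[16], in[4], out[4];
    int r, c;

    /* Horizontal pass over each row. */
    for (r = 0; r < 4; r++) {
        for (c = 0; c < 4; c++)
            in[c] = coef[r * 4 + c];
        inv_pass(in, out);
        for (c = 0; c < 4; c++)
            tmp[r * 4 + c] = out[c];
    }

    /* Vertical pass over each column, then round and scale by 1/64. */
    for (c = 0; c < 4; c++) {
        for (r = 0; r < 4; r++)
            in[r] = tmp[r * 4 + c];
        inv_pass(in, out);
        for (r = 0; r < 4; r++)
            res[r * 4 + c] = (int16_t)((out[r] + 32) >> 6);
    }
}
```

No multiplication or floating-point operation appears anywhere, which is the property that makes the transform attractive on a fixed-point DSP.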
3 DSP implementation and optimization of the H.264 decoder algorithm
3.1 Implementing and optimizing the H.264 algorithm on the PC
The H.264 core code officially provided by the ITU-T needs improvement in its code structure, and some of its core algorithms also need substantial changes before real-time requirements can be met. The specific work in this step includes removing redundant code, standardizing the program structure, adjusting and redefining global and local variables, and adjusting data structures.
3.2 Porting the PC H.264 code to the DSP
Code Composer Studio, the C6000 development tool, has its own ANSI C compiler and optimizer with its own syntax rules and definitions. To implement H.264 on the DSP, the H.264 code written in C for the PC must therefore be modified so that it fully complies with the C rules of the DSP compiler.
These changes include: removing all file operations; removing visual interface operations; arranging the allocation and layout of memory space sensibly; and standardizing data types. Because the C6416 is a fixed-point DSP chip that supports only four data types, short (16 bits), int (32 bits), long (40 bits) and double (64 bits), the data used in floating-point computation must be re-specified: the floating-point parts are either approximated in fixed point or implemented with fixed-point operations. Constants and variables are placed in far or near memory according to how they are defined. Frequently used data is extracted from the data structures and defined as near data in the DSP's internal memory, which reduces reads through the EMIF port and so increases speed.
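As an illustration of replacing floating-point arithmetic with fixed-point arithmetic, the following C sketch uses the Q15 format, in which a 16-bit integer v represents the value v / 32768. The Q15 choice and the names q15_mul and scale_u8 are assumptions for illustration; the article does not say which fixed-point format the port uses.

```c
#include <stdint.h>

#define Q15_SHIFT 15

/* Multiply two Q15 values, rounding to nearest and saturating.
   Assumption: >> on a negative value is an arithmetic shift. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;     /* Q30 product */

    p += (int32_t)1 << (Q15_SHIFT - 1);      /* add 0.5 LSB for rounding */
    p >>= Q15_SHIFT;                         /* back to Q15 */
    if (p > INT16_MAX)                       /* only (-1) * (-1) overflows */
        p = INT16_MAX;
    return (int16_t)p;
}

/* Scale an 8-bit sample by a non-negative Q15 gain (0 .. 32767),
   i.e. the fixed-point counterpart of sample * gain with gain < 1.0. */
uint8_t scale_u8(uint8_t sample, int16_t gain_q15)
{
    int32_t p = (int32_t)sample * gain_q15 + ((int32_t)1 << (Q15_SHIFT - 1));

    return (uint8_t)(p >> Q15_SHIFT);
}
```

For example, a floating-point expression such as sample * 0.5 becomes scale_u8(sample, 16384), which needs only an integer multiply, an add and a shift.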
3.3 Optimizing the H.264 algorithm for the DSP
After the PC code has been ported, the H.264 codec algorithm runs on the DSP, but very inefficiently, because all of the code is written in C and does not exploit the DSP's capabilities. The code must therefore be optimized further in light of the DSP's own characteristics so that the H.264 video decoder can process video in real time.
Optimization of the DSP code is divided into three levels: project-level optimization, C program-level optimization and assembly-level optimization.
(1) Project-level optimization: This mainly uses the optimization options provided by the CCS compiler. The options (-mw, -pm, -o3, -mt and so on) are repeatedly selected, combined and adjusted according to the requirements of the H.264 system in order to improve the software pipelining of loops, including nested loops, and thus the parallelism of the software.
(2) C program-level optimization: This mainly streamlines the code of specific functions in light of DSP characteristics and optimizes data structures, loops and code parallelism. The main work includes the following. Program modules that compute supporting information such as SNR and frame rate are removed. The mapping regions of functions and data are adjusted so that frequently used data is stored in on-chip memory and frequently called procedures are mapped into adjacent or nearby storage areas as far as possible. C functions are made more parallel; for functions that parallelize poorly, especially those with multi-level loops, the loops are split up so that a multi-level loop becomes a single-level loop. Reads and writes of stored data are reduced, especially accesses to off-chip storage, to save time. Data structures are redefined and adjusted.
The following takes data restructuring as an example of how DSP characteristics can be exploited in software optimization.
A data structure here means the data types and their layout in memory. Different data structures affect program performance differently, so adjusting the data structures is an indispensable step toward parallel execution on the DSP.
In the H.264 decoder core code, the array mpr[i][j] stores the prediction coefficients of a macroblock; its data type is int, and i and j are the coordinates of the coefficient. The prediction coefficients are in fact only 8 bits wide, so a byte type is sufficient. On the one hand this saves memory; on the other hand, with a byte type an LDW instruction can be used in place of LDB to read four values at once, which saves read time. Moreover, coefficients in H.264 are read block by block, and the layout of mpr in the core code clearly cannot take full advantage of the DSP, so the storage structure also needs adjusting: each block of mpr is assigned a contiguous memory region, which favors data transfer, as shown in Figure 2. Once a block has been selected, a single index is enough to locate a coefficient, whereas the original structure needs two indices for every coefficient. Such adjustments make the program run significantly faster.
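The restructuring described above can be sketched in portable C as follows. Figure 2 is not available here, so the exact layout is an assumption: the sixteen 4×4 blocks of a macroblock are stored in raster order, each occupying 16 contiguous bytes. The names blk_offset, repack_mb and load_row4 are hypothetical, and memcpy of four bytes stands in for the single LDW word load.

```c
#include <stdint.h>
#include <string.h>

#define MB_SIZE      16                      /* macroblock is 16x16 */
#define BLK_SIZE     4                       /* sub-block is 4x4 */
#define BLKS_PER_ROW (MB_SIZE / BLK_SIZE)

/* Byte offset of macroblock pixel (row, col) in the block-contiguous layout. */
unsigned blk_offset(unsigned row, unsigned col)
{
    unsigned blk    = (row / BLK_SIZE) * BLKS_PER_ROW + (col / BLK_SIZE);
    unsigned in_blk = (row % BLK_SIZE) * BLK_SIZE + (col % BLK_SIZE);

    return blk * (BLK_SIZE * BLK_SIZE) + in_blk;
}

/* Convert the original int mpr[16][16] into 256 bytes, one block after another. */
void repack_mb(int src[MB_SIZE][MB_SIZE], uint8_t dst[MB_SIZE * MB_SIZE])
{
    unsigned row, col;

    for (row = 0; row < MB_SIZE; row++)
        for (col = 0; col < MB_SIZE; col++)
            dst[blk_offset(row, col)] = (uint8_t)src[row][col];
}

/* Fetch one 4-pixel row of a block with a single 32-bit access. */
uint32_t load_row4(const uint8_t *blk, unsigned row)
{
    uint32_t word;

    memcpy(&word, blk + row * BLK_SIZE, sizeof word);
    return word;
}
```

After repacking, the data of block b starts at dst + 16 * b, so a routine that works on one block needs only a base pointer and a single index from 0 to 15.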
(3) Assembly-level optimization: This consists of two parts, optimization with linear assembly and optimization with hand-written assembly. Because of its own limitations, the compiler cannot optimize every function well, so the C functions that profiling shows to be the most time-consuming are rewritten in assembly language. These functions include the interpolation function, the intra-prediction function and the inverse integer transform function.
The interpolation function is used below to illustrate the performance gained by writing assembly code.
Source code for horizontal 1/2-pixel interpolation:
This code uses a six-tap interpolation filter to interpolate 16 values (one block) at 1/2-pixel positions. The source code uses a triple loop whose innermost loop is the interpolation filter. If the source is simply compiled with the compiler, the inner loop reads the same data from memory many times. Rewriting the function in assembly makes it possible to improve the algorithm and significantly reduce its running time.
As shown in Figure 3, interpolating the first half-pixel position requires reading pixels 1 to 6 from memory, and interpolating the second requires pixels 2 to 7, so pixels 2 to 6 are read repeatedly. In addition, each interpolated point needs six multiplications and five additions. With assembly language and hand-scheduled pipelining, both the number of data reads and the number of multiply and add instructions can be reduced. First, the LDNW instruction is used to load 8 pixels into registers, and each interpolation then takes its data directly from the registers rather than fetching it from memory again. In addition, the DOTPSU4 multiply-accumulate instruction replaces separate multiply instructions: one instruction performs four multiplications and three additions, which reduces the instruction count.
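The idea of reusing already loaded pixels can be shown in portable C. The sketch below is not the article's source code or its assembly routine: it assumes the standard H.264 six-tap half-pixel filter (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, and the function names are hypothetical. halfpel_h_naive reads six pixels from memory for every output, while halfpel_h_window keeps a sliding window in local variables and loads only one new pixel per output.

```c
#include <stdint.h>

/* Round, scale by 1/32 and clip to 0..255. The negative case is handled
   before the shift so that no negative value is ever shifted. */
static uint8_t round_clip(int acc)
{
    acc += 16;
    if (acc < 0)
        return 0;
    acc >>= 5;
    return (uint8_t)(acc > 255 ? 255 : acc);
}

/* Straightforward version: six memory reads per output sample.
   src must be readable from src[-2] to src[n + 2]. */
void halfpel_h_naive(const uint8_t *src, uint8_t *dst, int n)
{
    static const int tap[6] = { 1, -5, 20, 20, -5, 1 };
    int x, k;

    for (x = 0; x < n; x++) {
        int acc = 0;
        for (k = 0; k < 6; k++)
            acc += tap[k] * src[x - 2 + k];
        dst[x] = round_clip(acc);
    }
}

/* Sliding-window version: the window lives in local variables (registers),
   and the filter symmetry cuts six multiplications down to two. */
void halfpel_h_window(const uint8_t *src, uint8_t *dst, int n)
{
    int a = src[-2], b = src[-1], c = src[0], d = src[1], e = src[2];
    int x;

    for (x = 0; x < n; x++) {
        int f = src[x + 3];                  /* the only new load */
        int acc = (a + f) - 5 * (b + e) + 20 * (c + d);

        dst[x] = round_clip(acc);
        a = b; b = c; c = d; d = e; e = f;   /* slide the window */
    }
}
```

Both functions produce identical results. The hand-written assembly described above goes further by loading several pixels per instruction and using packed multiply-accumulate instructions, which plain C cannot express directly.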
With the optimization methods described above, an H.264 baseline decoder algorithm based on the C6416 core was ultimately realized.
4 Algorithm performance evaluation and prospects
When the decoder algorithm is tested in the NVDK C6416 environment on QCIF test sequences, it reaches a decoding speed of 50 to 60 frames per second, comfortably meeting the goal of real-time decoding.
The H.264 video decoder implemented on the NVDK C6416 board is powerful and flexible and has broad application prospects. The optimized algorithm is not limited to the NVDK board but applies generally to C64 development boards: once the memory parameter file has been reconfigured to match the board's memory layout, the algorithm can be ported to a new development board. Connected to a network platform, the H.264 video decoder can be applied to video conferencing, video telephony, wireless streaming media and other communications applications.