Conference: Image Processing: Algorithms and Systems X; and Parallel Processing for Imaging Applications II

Parallel Processing Architecture for H.264 Deblocking Filter on Multi-core Platforms

Abstract

Massively parallel (multi-core) computing chips offer new solutions that satisfy the increasing demand for high-resolution, high-quality video compression technologies such as H.264. Such solutions provide not only exceptional quality but also the efficiency, low power, and low latency previously unattainable in software-based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low latency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in an H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of its computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. Scaling up to higher quality requirements, such as 10-bit pixel depth or a 4:2:2 chroma format, often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable deblocking filter architecture built on a massively parallel processor means that the same encoder or decoder can be deployed in a variety of applications, at different video resolutions, under different power requirements, and at higher bit depths and higher-fidelity color subsampling formats such as YUV 4:2:2 or 4:4:4. Low-power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions.
This work describes a scalable parallel architecture for an H.264-compliant deblocking filter on multi-core platforms such as HyperX technology. It examines parallel techniques at several levels: independent macroblocks, sub-blocks, and pixel rows. The deblocking architecture consists of a basic cell, the deblocking filter unit (DFU), and a dependent data buffer manager (DFM). The DFU can be instantiated several times to meet different performance needs. The DFM serves the data required by however many DFUs are in use and also manages the neighboring data that the DFUs need for future processing. This approach achieves the scalability, flexibility, and performance required of deblocking filters.
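
To make the "conditional nature" and the pixel-row level of parallelism concrete, the following is a minimal sketch of the H.264 normal-mode luma edge filter (boundary strength 1 to 3) applied to one line of samples, followed by a helper that filters every line of an edge concurrently. It is not the authors' DFU implementation. The names `filter_line` and `filter_edge` are hypothetical, the thresholds `alpha`, `beta`, and `tc0` are passed in directly rather than derived from the quantization parameter as the standard does, and the p1/q1 update is omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def filter_line(p, q, alpha, beta, tc0, bit_depth=8):
    """Filter one line of luma samples across an edge (bS in 1..3).

    p = [p0, p1, p2] and q = [q0, q1, q2]; p0 and q0 touch the edge.
    Only p0 and q0 are updated in this simplified sketch.
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    # Conditional part: skip the line when the step looks like a real
    # image edge rather than a blocking artifact.
    if not (abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta):
        return list(p), list(q)
    # The clipping bound grows when either side is locally smooth.
    tc = tc0 + int(abs(p2 - p0) < beta) + int(abs(q2 - q0) < beta)
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    max_val = (1 << bit_depth) - 1  # 255 for 8-bit, 1023 for 10-bit
    return ([clip3(0, max_val, p0 + delta), p1, p2],
            [clip3(0, max_val, q0 - delta), q1, q2])


def filter_edge(lines, alpha, beta, tc0, bit_depth=8, workers=4):
    """Filter all lines of one edge; each line depends only on its own samples."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda pq: filter_line(pq[0], pq[1], alpha, beta, tc0, bit_depth),
            lines))


if __name__ == "__main__":
    # Illustrative threshold values only; they are not taken from the paper.
    edge = [([100, 100, 100], [108, 108, 108])] * 4
    print(filter_edge(edge, alpha=20, beta=6, tc0=2))
```

Lines along a single edge can be filtered in any order because each one reads only its own samples. Ordering between edges and between macroblocks is a separate matter; that is the dependency handling the abstract assigns to the DFM.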
