Published in: Transactions on High-Performance Embedded Architectures and Compilers III

A Multithreaded Multicore System for Embedded Media Processing


Abstract

We describe a multicore system targeting media processing applications where the cores are multithreaded. The multithreaded cores use a new type of multithreading that we call Subset Static Interleaved (SSI) multithreading. SSI multithreading combines the advantages of blocked multithreading and a simple form of interleaved multithreading called static interleaved multithreading. SSI multithreading divides threads into foreground and background threads and performs static interleaving among the foreground threads. A foreground thread is swapped with a runnable background thread whenever the foreground thread is stalled. SSI multithreading achieves reduced operation latencies, memory latency tolerance, fast context switching, and, compared to traditional dynamic interleaving, a relatively low design complexity of the register file.

We use a task scheduling unit (TSU) to dispatch tasks to the cores. The TSU is aware that the cores are multithreaded. This allows a more efficient mapping of tasks to cores, because tasks can be scheduled on the least loaded cores.

We evaluate the system on an optimized Super HD H.264 decoder in which macroblock decoding and deblocking have been parallelized. The complexity of the H.264 standard and the high resolution make this a challenging and performance-demanding application. We achieve speedups of up to 17.7 times for 16 cores with four threads per core relative to a single-threaded single core. Furthermore, the proposed SSI multithreading achieves a speedup of 1.52 times relative to no multithreading, while blocked multithreading achieves only 1.38 times and a restricted form of interleaved multithreading achieves only 1.37 times.
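
The thread-selection policy and the least-loaded dispatch described in the abstract can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the authors' hardware design: the names Thread, SSICore, stall_until, and pick_core are hypothetical, a swap is assumed to cost zero cycles, and core load is assumed to be a plain count of tasks already assigned to each core.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    # Hypothetical stall model: the thread is runnable once cycle >= stall_until.
    stall_until: int = 0

    def runnable(self, cycle: int) -> bool:
        return cycle >= self.stall_until


class SSICore:
    """Toy SSI core: foreground threads own fixed issue slots; a stalled
    foreground thread is swapped with a runnable background thread."""

    def __init__(self, foreground, background):
        self.foreground = list(foreground)
        self.background = list(background)

    def issue(self, cycle: int):
        # Static interleaving: the slot is determined by the cycle number alone.
        slot = cycle % len(self.foreground)
        thread = self.foreground[slot]
        if thread.runnable(cycle):
            return thread.name
        # The slot owner is stalled: look for a runnable background thread.
        for i, candidate in enumerate(self.background):
            if candidate.runnable(cycle):
                # Assumption: the swap takes effect in the same cycle.
                self.foreground[slot], self.background[i] = candidate, thread
                return candidate.name
        return None  # no runnable replacement, so the slot is wasted


def pick_core(tasks_per_core):
    """Return the index of the least loaded core (lowest index wins ties).
    Assumption: load is the number of tasks currently assigned to the core."""
    return min(range(len(tasks_per_core)), key=tasks_per_core.__getitem__)
```

In this model a core with two foreground threads A and B alternates between them; if A stalls, the first runnable background thread takes over A's slot and A moves to the background until it is swapped in again.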
