Context threading: a flexible and efficient dispatch technique for virtual machine interpreters

Abstract

Direct-threaded interpreters use indirect branches to dispatch bytecodes, but deeply-pipelined architectures rely on branch prediction for performance. Due to the poor correlation between the virtual program's control flow and the hardware program counter, which we call the context problem, direct threading's indirect branches are poorly predicted by the hardware, limiting performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, aligning the hardware and virtual PC. Thus, sequential control flow is predicted by the hardware return stack. We convert virtual branching instructions to native branches, mobilizing the hardware's branch prediction resources. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium and PowerPC architectures. On the Pentium IV our technique reduces mean mispredicted branches by 95%. On the PowerPC, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Due to reduced branch hazards, context threading reduces mean execution time by 25% for Java and by 19% and 37% for OCaml on the P4 and PPC970, respectively. We also combine context threading with a conservative inlining technique and find its performance comparable to that of selective inlining.
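
To make the baseline concrete, the following is a minimal sketch of direct-threaded dispatch, the technique whose indirect branches the abstract describes as poorly predicted. It is not taken from the paper. The toy accumulator VM, its opcodes (`OP_ADDI`, `OP_LOOP`, `OP_HALT`), the `cell` type, and `run_direct_threaded` are all hypothetical names invented for illustration. The sketch relies on the labels-as-values extension of GCC and Clang, and it assumes the program is well formed and ends with `OP_HALT`.

```c
#include <stddef.h>

/* Hypothetical toy VM for illustration; not the interpreters from the paper. */
enum { OP_ADDI, OP_LOOP, OP_HALT };

/* A cell of threaded code holds either a body address or an inline operand. */
typedef union { void *handler; long imm; } cell;

#define MAX_CELLS 64

long run_direct_threaded(const long *bytecode, size_t len, long counter)
{
    /* Labels-as-values is a GCC/Clang extension, not ISO C. */
    static void *const labels[] = { &&do_addi, &&do_loop, &&do_halt };
    cell code[MAX_CELLS];
    const cell *vpc;   /* virtual program counter */
    long acc = 0;

    if (len > MAX_CELLS)
        return -1;

    /* Load time: replace each opcode with the address of its body.
       OP_ADDI and OP_LOOP carry one operand, copied through unchanged. */
    for (size_t i = 0; i < len; i++) {
        long op = bytecode[i];
        if (op < OP_ADDI || op > OP_HALT)
            return -1;
        code[i].handler = labels[op];
        if (op != OP_HALT) {
            if (i + 1 >= len)
                return -1;
            i++;
            code[i].imm = bytecode[i];
        }
    }

/* Every body ends with this indirect jump. The hardware sees the same few
   jump sites reach many different targets, which is hard to predict. */
#define NEXT() goto *(vpc++)->handler

    vpc = code;
    NEXT();

do_addi:
    acc += (vpc++)->imm;
    NEXT();

do_loop: {
    long target = (vpc++)->imm;   /* cell index to branch back to */
    if (--counter > 0)
        vpc = code + target;
    NEXT();
}

do_halt:
    return acc;

#undef NEXT
}
```

With the program `{ OP_ADDI, 3, OP_LOOP, 0, OP_HALT }` and `counter` set to 4, the loop body runs four times. Context threading, as the abstract describes it, would replace the table of body addresses with a generated sequence of native call instructions and turn the virtual branch into a native branch. That step requires runtime code generation and is not attempted in this sketch.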
