【24h】

A New Algorithm for Identifying Loops in Decompilation

机译:识别反编译循环的新算法

获取原文
获取原文并翻译 | 示例

摘要

Loop identification is an essential step of control flow analysis in decompilation. The Classical algorithm for identifying loops is Tarjan's intervalfinding algorithm, which is restricted to reducible graphs. Havlak presents one extension of Tarjan's algorithm to deal with irreducible graphs, which constructs a loop-nesting forest for an arbitrary flow graph. There's evidence showing that the running time of this algorithm is quadratic in the worst-case, and not almost linear as claimed. Ramalingam presents an improved algorithm with low time complexity on arbitrary graphs, but it performs not quite well on "real" control flow graphs (CFG). We present a novel algorithm for identifying loops in arbitrary CFGs. Based on a more detailed exploration on properties of loops and depth-first search (DFS), this algorithm traverses a CFG only once based on DFS and collects all information needed on the fly. It runs in approximately linear time and does not use any complicated data structures such as Interval/Derived Sequence of Graphs (DSG) or UNION-FIND sets. To perform complexity analysis of the algorithm, we introduce a new concept called unstructuredness coefficient to describe the unstructuredness of CFGs, and we find that the unstructuredness coefficients of these executables are usually small (<1.5). Such "low-unstructuredness" property distinguishes these CFGs from general single-root connected directed graphs, and it offers an explanation why those algorithms existed perform not quite well on real-world cases. The new algorithm has been applied to 11526 CFGs in 6 typical binary executables on both Linux and Window platforms. Experimental result has validated our theoretical analysis and it shows that our algorithm runs 2-5 times faster than the Havlak-Tarjan algorithm, and 2-8 times faster than the Ramalingam-Havlak-Tarjan algorithm.
机译:回路识别是反编译中控制流分析的重要步骤。识别循环的经典算法是Tarjan的间隔查找算法,该算法仅限于可归约图。 Havlak提出了Tarjan算法的一种扩展,用于处理不可约图,该算法为任意流程图构建了一个循环嵌套的森林。有证据表明,在最坏的情况下,该算法的运行时间是二次的,而不是所要求的几乎线性的。 Ramalingam提出了一种改进的算法,该算法在任意图上具有较低的时间复杂度,但是在“实际”控制流程图(CFG)上却表现不佳。我们提出了一种新颖的算法,用于识别任意CFG中的循环。基于对循环属性和深度优先搜索(DFS)的更详细研究,该算法基于DFS仅遍历一次CFG,并即时收集所有所需信息。它以大约线性时间运行,并且不使用任何复杂的数据结构,例如区间/衍生图序列(DSG)或UNION-FIND集。为了进行算法的复杂性分析,我们引入了一个非结构化系数的新概念来描述CFG的非结构化,发现这些可执行文件的非结构化系数通常很小(<1.5)。这种“低非结构性”属性将这些CFG与一般的单根连接有向图区分开来,并提供了一个解释,说明存在这些算法的原因在现实情况下效果不佳。新算法已在Linux和Window平台上应用于6个典型二进制可执行文件中的11526 CFG。实验结果验证了我们的理论分析,结果表明我们的算法运行速度比Havlak-Tarjan算法快2-5倍,比Ramalingam-Havlak-Tarjan算法快2-8倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号