Multi-pattern string matching algorithms.

机译：多模式字符串匹配算法。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

Multi-pattern string matching is widely used in applications such as network intrusion detection, digital forensics and full text search. In this dissertation, we focus on space efficient multi-pattern string matching as well as on time efficient multicore algorithms.;We develop a highly compressed Aho-Corasick automata for efficient intrusion detection. Our method uses bitmaps with multiple levels of summaries as well as aggressive path compaction. Our compressed automata takes 24% to 31% less memory than taken by the compressed automata of Tuck et al. [57] and the number of additions required to compute popcounts is reduced by about 90%.;We propose a technique to perform high performance exact multi-pattern string matching on the IBM Cell/Broadband Engine(Cell) architecture, which has 9 cores per chip, one control unit and eight computation units. Our technique guarantees the remarkable compression factors of 1 : 34 and 1 : 58, respectively on the memory representation of English language dictionaries and random binary string dictionaries. Our memory-based implementation delivers a sustained throughput between 0.90 and 2.35 Gbps per cell blade, while supporting dictionary sizes up to 9, 260, 000 average patterns per Gbyte of main memory.;We focus on Scalpel, a popular open source file recovery tool which performs file carving using the Boyer-Moore string search algorithm to locate headers and footers in a disk image. We show that the time required for file carving may be reduced significantly by employing multi-pattern search algorithms such as the multipattern Boyer-Moore and Aho-Corasick algorithms as well as asynchronous disk reads and multithreading as typically supported on multicore commodity PCs. Using these methods, we are able to do in-place file carving in essentially the time it takes to read the disk whose files are being carved. Since, using our methods, the limiting factor for performance is the disk read time, there is no advantage to using accelerators such as GPUs as has been proposed by others. To further speed in-place file carving, we would need a mechanism to read disk faster.;Furthermore, we develop GPU adaptations of the Aho-Corasick and multipattern Boyer-Moore string matching algorithms for the two cases GPU-to-GPU and host-to-host. For the GPU-to-GPU case, we consider several refinements to a base GPU implementation and measure the performance gain from each refinement. For the host-to-host case, we analyze two strategies to communicate between the host and the GPU and show that one is optimal with respect to run time while the other requires less device memory. Experiments conducted on an NVIDIA Tesla GT200 GPU that has 240 cores running off of a Xeon 2.8GHz quad-core host CPU show that, for the GPU-to-GPU case, our Aho-Corasick GPU adaptation achieves a speedup between 8.5 and 9.5 relative to a single-thread CPU implementation and between 2.4 and 3.2 relative to the best multithreaded implementation. For the host-to-host case, the GPU AC code achieves a speedup of 3.1 relative to a single-threaded CPU implementation. However, the GPU is unable to deliver any speedup relative to the best multithreaded code running on the quad-core host. In fact, the measured speedups for the latter case ranged between 0.74 and 0.83. Early versions of our multipattern Boyer-Moore adaptations ran 7% to 10% slower than corresponding versions of the AC adaptations and we did not refine the multipattern Boyer-Moore codes further.

机译：多模式字符串匹配广泛用于网络入侵检测，数字取证和全文搜索等应用中。本文主要研究空间高效的多模式字符串匹配以及时间高效的多核算法。我们开发了一种高度压缩的Aho-Corasick自动机以进行有效的入侵检测。我们的方法使用具有多个摘要级别的位图以及积极的路径压缩。我们的压缩自动机比Tuck等人的压缩自动机占用的内存少24％至31％。 [57]，计算出popcounts所需的加法数量减少了约90％。；我们提出了一种在具有9个内核的IBM Cell / Broadband Engine（Cell）体系结构上执行高性能精确多模式字符串匹配的技术每个芯片，一个控制单元和八个计算单元。我们的技术在英语词典和随机二进制字符串词典的内存表示上分别保证了1:34和1:58的显着压缩因子。我们基于内存的实现方式可为每个单元刀片提供0.90到2.35 Gbps的持续吞吐量，同时支持每GB内存最大9、260、000个平均模式的字典大小。我们专注于流行的开源文件恢复工具Scalpel使用Boyer-Moore字符串搜索算法执行文件雕刻，以在磁盘映像中定位页眉和页脚。我们表明，通过采用多模式搜索算法（例如多模式Boyer-Moore和Aho-Corasick算法）以及异步磁盘读取和多线程（通常在多核商用PC上通常支持），可以大大减少文件雕刻所需的时间。使用这些方法，我们能够在实质上读取读取正在雕刻文件的磁盘的时间中进行就地文件雕刻。由于使用我们的方法，性能的限制因素是磁盘读取时间，因此使用其他人提出的加速器（例如GPU）没有优势。为了进一步加快就地文件雕刻速度，我们需要一种机制来更快地读取磁盘。此外，我们针对两种情况下的GPU到GPU和主机开发了Aho-Corasick和多模式Boyer-Moore字符串匹配算法的GPU改编。 -敬主人。对于GPU到GPU的情况，我们考虑对基本GPU实施进行一些改进，并评估每次改进的性能增益。对于主机到主机的情况，我们分析了两种在主机和GPU之间进行通信的策略，并表明一种在运行时间方面是最佳的，而另一种则需要较少的设备内存。在具有至强2.8GHz四核主机CPU的240个内核运行的NVIDIA Tesla GT200 GPU上进行的实验表明，对于GPU到GPU的情况，我们的Aho-Corasick GPU适配可实现8.5到9.5的相对加速单线程CPU实施，相对于最佳多线程实施，介于2.4和3.2之间。对于主机到主机的情况，相对于单线程CPU实现，GPU AC代码实现了3.1的加速。但是，相对于四核主机上运行的最佳多线程代码，GPU无法提供任何加速。实际上，后一种情况下测得的加速比介于0.74和0.83之间。我们的多模式Boyer-Moore改编的早期版本比AC改编的对应版本慢7％至10％，并且我们没有进一步完善多模式的Boyer-Moore代码。

著录项

作者
Zha, Xinyan.;
展开▼
作者单位

University of Florida.;

展开▼
授予单位 University of Florida.;
学科 Engineering Computer.;Computer Science.
学位 Ph.D.
年度 2010
页码 128 p.
总页数 128
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Multi-Pattern Matching for Dictionary Compressed Strings [J] . Chen Hou, Meng Zhang, Hengshan Yue, Sensor Letters: A Journal Dedicated to all Aspects of Sensors in Science, Engineering, and Medicine . 2014,第2期

机译：字典压缩字符串的多模式匹配
2. Efficient bit-parallel multi-patterns approximate string matching algorithms [J] . Rajesh Prasad, Anuj Kumar Sharma, Alok Singh, Scientific Research and Essays . 2011,第4期

机译：高效的位并行多模式近似字符串匹配算法
3. Parallel Length-based Matching Architecture for High Throughput Multi-Pattern Matching [J] . WANG Xiaofei, HU Chengchen, TANG Yi, 电子学报：英文版 . 2012,第003期

机译：高吞吐量多模式匹配的基于长度的并行匹配架构
4. A Multi-pattern Matching Algorithm for Chinese-Hmong Mixed Strings [C] . Shun-Ping He, Li-Ping Mo, Di-Wen Kang CCF International Conference on Natural Language Processing and Chinese Computing . 2019

机译：中文-苗族混合字符串的多模式匹配算法
5. Analysis of data partitioning on correlated data to genetic sequence searches using string matching algorithms. [D] . Nance, David R. 2006

机译：使用字符串匹配算法对相关数据的数据分区进行遗传序列搜索的分析。
6. A multi-pattern hash-binary hybrid algorithm for URL matching in the HTTP protocol [O] . Ping Zeng, Qingping Tan, Xiankai Meng, -1

机译：HTTP协议中用于URL匹配的多模式哈希二进制混合算法
7. Highly Compressed Multi-pattern String Matching on the Cell Broadband Engine [O] . Xinyan Zha, Daniele Paolo Scarpazza, Sartaj Sahni 2011

机译：单元宽带引擎上的高度压缩的多模式字符串匹配
8. Parallel String Matching Algorithms. [R] . Breslauer, D., Galil, Z. 1992

机译：并行字符串匹配算法。

Multi-pattern string matching algorithms.

摘要

著录项

相似文献

相关主题

期刊订阅