SimSem: Fast Approximate String Matching in Relation to Semantic Category Disambiguation

Abstract

In this study we investigate the merits of fast approximate string matching to address challenges relating to spelling variants and to utilise large-scale lexical resources for semantic class disambiguation. We integrate string matching results into machine learning-based disambiguation through the use of a novel set of features that represent the distance of a given textual span to the closest match in each of a collection of lexical resources. We collect lexical resources for a multitude of semantic categories from a variety of biomedical domain sources. The combined resources, containing more than twenty million lexical items, are queried using a recently proposed fast and efficient approximate string matching algorithm that allows us to query large resources without severely impacting system performance. We evaluate our results on six corpora representing a variety of disambiguation tasks. While the integration of approximate string matching features is shown to substantially improve performance on one corpus, results are modest or negative for others. We suggest possible explanations and future research directions. Our lexical resources and implementation are made freely available for research purposes.
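
The feature set described in the abstract, one distance per lexical resource measured from a textual span to its closest match, can be illustrated with a small sketch. This is not the authors' implementation. It assumes cosine similarity over character trigrams as the string measure and uses a brute-force scan instead of the fast approximate matching algorithm the abstract refers to. The function names and the toy resources are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count character n-grams, padding with spaces so short strings still yield n-grams."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors, clamped to at most 1.0."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return min(1.0, dot / norm) if norm else 0.0


def closest_match_features(span, resources):
    """Map each resource name to the distance (1 - similarity) of its closest entry.

    Assumption: a brute-force scan over every entry. A resource with millions
    of entries would need an indexed approximate matcher instead.
    """
    span_grams = char_ngrams(span)
    features = {}
    for name, entries in resources.items():
        best = max((cosine(span_grams, char_ngrams(e)) for e in entries), default=0.0)
        features[name] = 1.0 - best
    return features


# Hypothetical toy resources; the real collection is described as holding
# more than twenty million lexical items.
resources = {
    "protein": ["interleukin-2", "tumor necrosis factor"],
    "disease": ["leukemia", "tumour"],
}

# The spelling variant "interleukin 2" lies closer to the protein resource.
features = closest_match_features("interleukin 2", resources)
```

Each value in `features` could then be passed to a classifier as one numeric feature per lexical resource, alongside whatever other features the disambiguation system uses.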

Bibliographic details

Source
Workshop on Biomedical Natural Language Processing | 2011 | 10 pages
Authors
Pontus Stenetorp; Sampo Pyysalo; Junichi Tsujii
Original format: PDF
Chinese Library Classification: Programming, software engineering
