Conference: International Conference on Pattern Recognition Applications and Methods

Explaining Unintelligible Words by Means of their Context

Abstract

Explaining unintelligible words is a practical problem for text obtained by optical character recognition, text from the Web (e.g., text containing misspellings), and similar sources. Approaches to wikification, i.e., enriching text by linking words to Wikipedia articles, could help solve this problem. However, existing wikification methods assume that the text is correct, so they cannot wikify erroneous text. Because of errors, the problem of disambiguation (identifying the appropriate article to link to) becomes large-scale: since the word to be disambiguated is unknown, the article to link to has to be selected from among hundreds, maybe thousands, of candidate articles. Existing approaches for the case where the word is known build upon the distributional hypothesis: words that occur in the same contexts tend to have similar meanings. The increased number of candidate articles aggravates the difficulty of spuriously similar contexts (two contexts that are similar but belong to different articles). We propose a method to overcome this difficulty by combining the distributional hypothesis with structured sparsity, a rapidly expanding area of research. Empirically, our approach based on structured sparsity compares favorably to various traditional classification methods.
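
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to combine context similarity with structured sparsity: a group-lasso sparse representation, solved by proximal gradient descent. It is an assumed illustration, not the authors' method. Known contexts are the columns of a dictionary `D`, grouped by the article they belong to, and the context `y` of the unknown word is reconstructed from as few article groups as possible. All names (`group_lasso`, `predict_article`, the toy vocabulary) and the value of `lam` are hypothetical.

```python
import numpy as np

def group_soft_threshold(w, groups, thresh):
    """Shrink each group's coefficient block toward zero (group-lasso prox)."""
    out = np.zeros_like(w)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(w[idx])
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * w[idx]
    return out

def group_lasso(D, y, groups, lam=0.1, n_iter=500):
    """Minimize 0.5*||y - D w||^2 + lam * sum_g ||w_g||_2 by proximal gradient."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - y)
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w

def predict_article(D, y, groups, lam=0.1):
    """Return the article whose coefficient group has the largest norm."""
    w = group_lasso(D, y, groups, lam)
    labels = np.unique(groups)
    norms = [np.linalg.norm(w[groups == g]) for g in labels]
    return labels[int(np.argmax(norms))]

# Toy data (assumed): rows are 6 vocabulary words, columns are known contexts.
D = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],  # third context of article 0 overlaps with article 1
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
], dtype=float)
D /= np.linalg.norm(D, axis=0)          # unit-length context vectors
groups = np.array([0, 0, 0, 1, 1])      # article id of each context

# Context of the unintelligible word, built from words typical of article 1.
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
y /= np.linalg.norm(y)

predicted = predict_article(D, y, groups)
print(predicted)
```

In this toy setup, one context of article 0 shares a word with the query, which mimics a spuriously similar context. The group penalty scores whole articles rather than single contexts, so one overlapping context is less likely to decide the link on its own.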
