Conference: International Conference on Asian-Pacific Digital Libraries

Finding 'Similar but Different' Documents Based on Coordinate Relationship

Abstract

Traditional search technologies are based on the similarity relationship: given a document, they return documents with similar content. However, similarity-based search does not always produce good results. For example, similar documents bring little additional information, which makes it difficult to increase information gain. In this paper, we propose a method to find similar but different documents for a user-given document by distinguishing the coordinate relationship from the similarity relationship between documents. Simply put, a similar but different document has the same topic as the given document but describes different events or concepts. For example, given a news article reporting the Oregon school shooting as input, articles reporting other school shootings, such as the Virginia Tech shooting, are detected and returned to the user. Experiments conducted on the New York Times Annotated Corpus verify the effectiveness of our method and illustrate the importance of incorporating the coordinate relationship when finding similar but different documents.
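The abstract does not describe how the coordinate relationship is detected, so the following is only a toy illustration of the "same topic, different event" idea, not the authors' method. It assumes each document has already been split into hypothetical `topic_terms` and `event_terms` fields (an assumption made here for illustration), and it rewards topical similarity while penalizing overlap in event-specific terms.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term counts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap between two sets of terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_but_different_score(query: dict, candidate: dict) -> float:
    """Toy score: high when topics match but event-specific terms differ.

    The 'topic_terms' and 'event_terms' fields are hypothetical; how they
    would be extracted is not specified by the abstract.
    """
    topic_sim = cosine(Counter(query["topic_terms"]),
                       Counter(candidate["topic_terms"]))
    event_overlap = jaccard(set(query["event_terms"]),
                            set(candidate["event_terms"]))
    return topic_sim * (1.0 - event_overlap)


# Hand-made example documents (illustrative data, not from the corpus).
query = {"topic_terms": ["school", "shooting", "students", "police"],
         "event_terms": ["Oregon"]}
candidates = {
    "same_event": {"topic_terms": ["school", "shooting", "students", "police"],
                   "event_terms": ["Oregon"]},
    "other_event": {"topic_terms": ["school", "shooting", "students", "campus"],
                    "event_terms": ["Virginia Tech"]},
    "off_topic": {"topic_terms": ["election", "senate", "vote"],
                  "event_terms": ["Oregon"]},
}

scores = {name: similar_but_different_score(query, doc)
          for name, doc in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores[ranked[0]])
```

In this toy setup, a plain similarity ranking would place the article about the same event first, whereas the combined score places the article about a different school shooting first. Multiplying the two factors is just one simple way to combine them; a weighted difference would serve the same illustrative purpose.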