
Automatic term extraction and document similarity in special text corpora.


Abstract

The first objective of this thesis is to evaluate the performance of the C-value/NC-value method, a state-of-the-art method for automatic term extraction in special text corpora, on a corpus of computer science articles, and to compare it with the method's published performance on a medical corpus. The C-value/NC-value method automatically extracts multi-word terms from special text corpora and can handle nested terms. It has been experimentally confirmed to outperform previously published automatic term extraction methods on a medical corpus. The second objective is to use the extracted terms as features for estimating the similarity of papers in the computer science corpus, using the standard Vector Space Model with TF-IDF weighting. The precision of the term-based method is evaluated and compared with the standard bag-of-words approach and with a link-based method, which estimates the similarity of papers from the overlap of their local neighborhoods in the citation graph.
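The abstract does not reproduce the scoring formula, so the following is only a minimal sketch of the C-value part of the method as defined in the published C-value/NC-value work: a candidate's frequency is weighted by log2 of its length in words, and for a nested candidate the average frequency of the longer candidates containing it is subtracted first. The function names, the tuple-of-words representation, and the toy frequencies are assumptions made for illustration. The linguistic filter, the frequency bookkeeping for multiply nested terms, and the NC-value context re-ranking are omitted, and this is not the thesis's implementation.

```python
import math
from typing import Dict, Tuple

# A candidate term is represented as a tuple of words (an assumption of this sketch).
Term = Tuple[str, ...]


def contains(longer: Term, shorter: Term) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside a strictly longer term."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Simplified C-value: log2(length) * (frequency minus mean frequency of containing terms)."""
    scores: Dict[Term, float] = {}
    for a, f_a in freq.items():
        # Frequencies of the longer candidate terms in which `a` appears nested.
        container_freqs = [f_b for b, f_b in freq.items() if contains(b, a)]
        if container_freqs:
            adjusted = f_a - sum(container_freqs) / len(container_freqs)
        else:
            adjusted = float(f_a)
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Toy counts, invented for illustration only.
toy_freq = {
    ("soft", "contact", "lens"): 3,
    ("contact", "lens"): 5,
}
toy_scores = c_value(toy_freq)
# ("contact", "lens") is nested in one longer term: log2(2) * (5 - 3) = 2.0
# ("soft", "contact", "lens") is not nested: log2(3) * 3, roughly 4.75
```

In this simplified form a one-word candidate always scores 0 because log2(1) = 0, which matches the method's focus on multi-word terms.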
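The second objective in the abstract, comparing papers by TF-IDF vectors over extracted terms and by overlap of citation neighborhoods, can be sketched as follows. The weighting variant (raw term frequency times log(N/df)), the use of Jaccard overlap for the link-based measure, and all names and sample data are assumptions of this sketch; the abstract does not specify which variants the thesis used.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors; each document is a list of features (terms or words)."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each feature appears.
    df = Counter(feature for doc in docs for feature in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighborhood_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two papers' citation neighborhoods (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical papers described by extracted multi-word terms.
papers = [
    ["term extraction", "vector space model"],
    ["term extraction", "citation graph"],
    ["neural network"],
]
vecs = tfidf_vectors(papers)
term_similarity = cosine(vecs[0], vecs[1])   # papers 0 and 1 share one term
link_similarity = neighborhood_overlap({"p1", "p2"}, {"p2", "p3"})
```

Passing word lists instead of term lists to `tfidf_vectors` gives the bag-of-words baseline, so the same two functions cover both text-based methods being compared.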

Bibliographic record

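  • Author

    Dong, Li.;

  • Author affiliation

    Dalhousie University (Canada).;

  • Degree-granting institution: Dalhousie University (Canada).;
  • Subject: Computer Science.; Information Science.
  • Degree: M.C.Sc.
  • Year: 2002
  • Pages: 62 p.
  • Total pages: 62
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology; information and knowledge dissemination;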

