Information Processing & Management

Discovering 'title-like' terms

Abstract

This paper examines the feasibility of using a decision tree classifier to discover "title-like" terms in a document. The premise is that title terms and title-like terms should behave similarly in the document, where this behavior is characterized by a set of distributional and linguistic features. By training the classifier on the behavior of title terms in a balanced manner, using 25,000 titles of Reuters articles, other terms with similar behavior should also be discovered. On 5,000 unseen titles, the recall of title terms was 83%, similar to that of manual identification of title terms. The precision of finding title terms was low (32%) because the classifier also identifies non-title but title-like terms, which it is meant to find. Seven subjects were asked to rate, on a scale from 1 to 5, whether each identified term is a topical/thematic/title term. If a rating of 2.5 is used as the threshold for judging a term to be "title-like", the mean precision increases to 58%, or the headline/title is expanded with twice the average number of terms. Since this precision (58%) is similar to the mean precision of manually identified title terms averaged across subjects, we conclude that discovering title-like terms with classifiers is a promising approach. © 2004 Elsevier Ltd. All rights reserved.
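The abstract reports its results as recall and precision of the classifier's predicted terms against the original title terms, plus a precision based on subject ratings thresholded at 2.5. The Python sketch below, using only the standard library, shows how such figures could be computed for a single document; it is not the authors' evaluation code. The function names, the term-to-ratings dictionary layout, averaging each term's ratings across subjects before applying the threshold, and counting a mean of exactly 2.5 as title-like are all assumptions, since the abstract does not specify them.

```python
from statistics import mean


def precision_recall(predicted, gold):
    """Precision and recall of predicted terms against the actual title terms."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)  # predicted terms that really occur in the title
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def judged_precision(predicted, ratings, threshold=2.5):
    """Fraction of predicted terms judged title-like by the subjects.

    Hypothetical layout: `ratings` maps each term to a list of 1..5 scores,
    one per subject. A term counts as title-like if its mean score is at
    least `threshold` (an assumption); unrated terms count as not title-like.
    """
    predicted = set(predicted)
    if not predicted:
        return 0.0
    judged = [t for t in predicted if ratings.get(t) and mean(ratings[t]) >= threshold]
    return len(judged) / len(predicted)


if __name__ == "__main__":
    # Toy, made-up example: one headline and five terms flagged by a classifier.
    gold = {"oil", "prices", "rise"}
    predicted = {"oil", "prices", "opec", "crude", "week"}
    ratings = {
        "oil": [5, 4, 5],
        "prices": [4, 4, 3],
        "opec": [4, 3, 3],
        "crude": [3, 3, 2],
        "week": [1, 2, 1],
    }
    print(precision_recall(predicted, gold))  # precision 0.4, recall 2/3
    print(judged_precision(predicted, ratings))  # 4 of 5 terms judged title-like
```

In this toy case, precision against the literal title is low because "opec" and "crude" are not title words, yet the raters still accept them as title-like. This mirrors, on a tiny scale, the gap the abstract describes between the 32% and 58% precision figures.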