Journal: Procedia Computer Science

Adapted TextRank for Term Extraction: A Generic Method of Improving Automatic Term Extraction Algorithms

Abstract

Automatic Term Extraction (ATE) is a fundamental Natural Language Processing (NLP) task used in many knowledge acquisition processes. It is challenging because it is highly domain dependent: no existing method consistently outperforms the others in all domains, and good ATE remains very much an unsolved problem. We propose a generic method for improving the ranking of terms extracted by a potentially wide range of existing ATE methods. We re-design the well-known TextRank algorithm to work at corpus level, using easily obtainable domain resources in the form of seed words or phrases, to compute a score for each word in the target dataset. This word score is then used to refine a candidate term’s score computed by an existing ATE method, potentially improving the ranking of real terms to be selected for tasks such as ontology engineering. Evaluation shows consistent improvement on 10 state-of-the-art ATE methods, by up to 25 percentage points in average precision measured at the top-ranked K candidates.
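
The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain co-occurrence graph built over the whole corpus, a seed-biased (personalized) PageRank as the "corpus-level TextRank", and a simple multiplicative boost for combining word scores with a base ATE score. The function names (`build_cooccurrence_graph`, `seeded_textrank`, `rerank`), the window size, the damping factor, and the combination formula are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(documents, window=3):
    """Build an undirected, weighted word graph over the whole corpus.

    Each word is linked to the next `window` tokens in the same document;
    the edge weight counts how often the pair co-occurs (assumed design).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in documents:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word][other] += 1.0
                    graph[other][word] += 1.0
    return graph


def seeded_textrank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration PageRank whose random jumps land only on seed words."""
    nodes = list(graph)
    # Fall back to uniform jumps if no seed word occurs in the corpus.
    targets = {w for w in nodes if w in seeds} or set(nodes)
    teleport = {w: (1.0 / len(targets) if w in targets else 0.0) for w in nodes}
    scores = {w: 1.0 / len(nodes) for w in nodes}
    out_weight = {w: sum(graph[w].values()) for w in nodes}
    for _ in range(iterations):
        new_scores = {}
        for w in nodes:
            incoming = sum(
                scores[v] * graph[v][w] / out_weight[v] for v in graph[w]
            )
            new_scores[w] = (1.0 - damping) * teleport[w] + damping * incoming
        scores = new_scores
    return scores


def rerank(base_scores, word_scores):
    """Refine each candidate's base ATE score with its words' graph scores.

    The boost formula below is an assumption, not taken from the paper.
    """
    top = max(word_scores.values(), default=0.0) or 1.0
    refined = {}
    for term, base in base_scores.items():
        words = term.split()
        mean_score = sum(word_scores.get(w, 0.0) for w in words) / max(len(words), 1)
        refined[term] = base * (1.0 + mean_score / top)
    return sorted(refined.items(), key=lambda item: item[1], reverse=True)


# Toy corpus: two in-domain sentences and one off-topic sentence.
documents = [
    "the patient received insulin therapy for diabetes".split(),
    "insulin therapy lowers blood glucose in diabetes".split(),
    "the meeting was moved to next week".split(),
]
graph = build_cooccurrence_graph(documents)
word_scores = seeded_textrank(graph, seeds={"insulin", "diabetes"})

# Pretend an existing ATE method scored all three candidates equally.
base_scores = {"insulin therapy": 1.0, "blood glucose": 1.0, "next week": 1.0}
ranking = rerank(base_scores, word_scores)
```

In this toy setup the base method cannot separate the three candidates, so any difference in the final ranking comes from how close each candidate's words sit to the seed words in the corpus graph. A real system would add tokenization, stop-word filtering, and a base ATE scorer in place of the hand-written `base_scores`.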
