Adapted TextRank for Term Extraction: A Generic Method of Improving Automatic Term Extraction Algorithms

Ziqi Zhang; Johann Petrak; Diana Maynard

Abstract

Automatic Term Extraction (ATE) is a fundamental Natural Language Processing (NLP) task used in many knowledge acquisition processes. It is challenging because of its high domain dependence: no existing method consistently outperforms the others in all domains, and good ATE remains very much an unsolved problem. We propose a generic method for improving the ranking of terms extracted by a potentially wide range of existing ATE methods. We re-design the well-known TextRank algorithm to work at corpus level, using easily obtainable domain resources in the form of seed words or phrases, to compute a score for each word in the target dataset. This score is used to refine a candidate term’s score computed by an existing ATE method, potentially improving the ranking of real terms to be selected for tasks such as ontology engineering. Evaluation shows consistent improvement on 10 state-of-the-art ATE methods, by up to 25 percentage points in average precision measured at the top-ranked K candidates.
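
The abstract describes the approach only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a corpus-level word co-occurrence graph, a PageRank-style iteration whose restart distribution is concentrated on the seed words, and a simple multiplicative rule for refining a base ATE score. The function names, the window size, the damping factor, and the combination formula are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Build an undirected word graph over the whole corpus.

    Two words are linked if they occur within `window` tokens of each
    other in any sentence (window size is an illustrative assumption).
    """
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + window + 1]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def seeded_textrank(graph, seeds, damping=0.85, iterations=50):
    """PageRank-style scoring whose restart mass goes only to seed words.

    If no seed occurs in the graph, the restart distribution falls back
    to uniform, which is the unpersonalised form of the iteration.
    """
    nodes = list(graph)
    seed_nodes = [n for n in nodes if n in seeds]
    if seed_nodes:
        restart = {n: (1.0 / len(seed_nodes) if n in seeds else 0.0) for n in nodes}
    else:
        restart = {n: 1.0 / len(nodes) for n in nodes}

    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(score[m] / len(graph[m]) for m in graph[n])
            updated[n] = (1.0 - damping) * restart[n] + damping * incoming
        score = updated
    return score


def refine_scores(base_scores, word_scores):
    """Boost each candidate term's base ATE score by its words' graph scores.

    The rule base * (1 + mean normalised word score) is a hypothetical
    combination, not the formula from the paper.
    """
    top = max(word_scores.values(), default=0.0) or 1.0
    refined = {}
    for term, base in base_scores.items():
        words = term.split()
        boost = sum(word_scores.get(w, 0.0) / top for w in words) / len(words)
        refined[term] = base * (1.0 + boost)
    return refined


# Toy usage: two in-domain sentences, one off-topic sentence, one seed word.
corpus = [
    ["protein", "binding", "cell"],
    ["cell", "membrane", "protein"],
    ["weather", "report", "today"],
]
word_scores = seeded_textrank(build_cooccurrence_graph(corpus), seeds={"cell"})
refined = refine_scores({"cell membrane": 1.0, "weather report": 1.0}, word_scores)
```

In this toy corpus the off-topic words are not connected to the seed, so their scores decay toward zero and the candidate "weather report" receives almost no boost, while "cell membrane" is promoted. Any real use would need the base scores from an actual ATE method and a seed list drawn from a domain resource.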

Bibliographic details

Source
Procedia Computer Science | 2018, issue 22 | 7 pages
Authors
Ziqi Zhang; Johann Petrak; Diana Maynard
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords
automatic term extraction; NLP; terminology; ontology engineering

Similar literature

1. Automatic extraction of relationships between terms by means of Kohonen's algorithm [J]. Vicente P. Guerrero, Felix Moya-Anegon, Victor Herrero-Solana. Library & Information Science Research. 2002, issue 3.
2. SemRe-Rank: Improving Automatic Term Extraction by Incorporating Semantic Relatedness with Personalised PageRank [J]. Zhang Ziqi, Gao Jie, Ciravegna Fabio. ACM Transactions on Knowledge Discovery from Data. 2018, issue 5.
3. Automatic extraction of candidate nomenclature terms using the doublet method [J]. Jules J Berman. BMC Medical Informatics and Decision Making. 2005, issue 1.
4. TermEval 2020: Shared Task on Automatic Term Extraction Using the Annotated Corpora for Term Extraction Research (ACTER) Dataset [C]. Ayla Rigouts Terryn, Veronique Hoste, Patrick Drouin. International Workshop on Computational Terminology. 2020.
5. Parallel automatic term extraction from large Web corpora [D]. Zhang, Lingyan. 2004.
6. Automatic extraction of candidate nomenclature terms using the doublet method [O]. Jules J Berman. 2005.
7. The main challenge of semi-automatic term extraction methods [O]. Conrado Merley da Silva, Pardo Thiago Alexandre Salgueiro, Rezende Solange Oliveira. 2015.
8. New Generic Method for the Semi-Automatic Extraction of River and Road Networks in Low and Mid-Resolution Satellite Images [R]. Grazzini, J., Dillard, S., Soille, P. 2010.
