Geoscience keyphrase extraction algorithm using enhanced word embedding



Abstract

A large amount of unstructured textual data about geoscience structures and minerals is buried in geoscience documents and goes unused. Automatic keyphrase extraction provides opportunities to leverage this wealth of data for analysis and knowledge discovery. However, keyphrase extraction remains a complicated task, and the performance of state-of-the-art approaches is still low. Automatic discovery of high-quality, meaningful keyphrases requires the application of useful knowledge and suitable techniques. Seeing both challenges and opportunities in this situation, this paper proposes an ontology and enhanced word embedding-based (OEWE) methodology for automatic keyphrase extraction from geoscience documents. We first develop a quantitative analysis for keyphrase extraction evaluation based on conditional probability and the naive Bayesian model, which is valuable when human-annotated keyphrases are not available. The domain ontology is then built on a multiway tree to enrich the domain-specific knowledge of the concepts and relationships in the domain. Simultaneously, word2vec, a model of word distributions based on deep learning, is updated by applying the geological ontology, so that it links domain background information and identifies infrequent but representative keyphrases. We use two homemade geoscience datasets to evaluate the performance of OEWE. We compare our method with frequency, term frequency-inverse document frequency (TF-IDF), TextRank and rapid automatic keyword extraction (RAKE), finding that our method achieves average F1 scores of 30.1% and 40.7% on two manually annotated datasets. © 2019 Elsevier Ltd. All rights reserved.
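To make the comparison in the abstract concrete, the following is a minimal sketch of one of the named baselines (TF-IDF ranking) together with an F1 score computed against annotated keyphrases. It is not the paper's OEWE method or its evaluation protocol: the function names, the toy corpus, the single-word "keyphrases", and the choice of exact-match scoring are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of one tokenized document by TF-IDF and return the top_k.

    Illustrative baseline only; `docs` is a list of token lists (assumed input format).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    tf = Counter(docs[doc_index])
    doc_len = len(docs[doc_index])
    scores = {
        term: (count / doc_len) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Sort by descending score, then alphabetically for a deterministic order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def precision_recall_f1(extracted, gold):
    """Exact-match precision, recall and F1 between two keyphrase collections."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus and annotation, not data from the paper.
docs = [
    ["granite", "intrusion", "granite", "quartz", "vein"],
    ["limestone", "quartz", "fossil", "limestone"],
    ["basalt", "vein", "basalt", "flow", "quartz"],
]
gold = ["granite", "quartz"]

extracted = tfidf_keywords(docs, doc_index=0, top_k=2)
precision, recall, f1 = precision_recall_f1(extracted, gold)
print(extracted, precision, recall, f1)
```

In this toy case "quartz" occurs in every document, so its inverse document frequency is zero and TF-IDF never selects it even though it is annotated as a keyphrase. This is the kind of gap in purely frequency-based scoring that the abstract says domain knowledge is meant to address.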

Bibliographic record

  • Source
    Expert Systems with Applications | 2019, Issue 7 | pp. 157-169 | 13 pages
  • Author affiliations

    China Univ Geosci, Fac Informat Engn, Wuhan 430074, Hubei, Peoples R China|Natl Engn Res Ctr GIS, Wuhan 430074, Hubei, Peoples R China;

    China Univ Geosci, Fac Informat Engn, Wuhan 430074, Hubei, Peoples R China|Natl Engn Res Ctr GIS, Wuhan 430074, Hubei, Peoples R China;

    China Univ Geosci, Fac Informat Engn, Wuhan 430074, Hubei, Peoples R China|Natl Engn Res Ctr GIS, Wuhan 430074, Hubei, Peoples R China;

    China Univ Geosci, Fac Informat Engn, Wuhan 430074, Hubei, Peoples R China|Natl Engn Res Ctr GIS, Wuhan 430074, Hubei, Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Keyphrase extraction; Ontology; Word2vec; Geoscience domain;

