
Geoscience keyphrase extraction algorithm using enhanced word embedding


Abstract

A large amount of unstructured textual data about geoscience structures and minerals is buried in geoscience documents and goes unused. Automatic keyphrase extraction provides opportunities to leverage this wealth of data for analysis and knowledge discovery. However, keyphrase extraction remains a complicated task, and the performance of state-of-the-art approaches is still low. Automatic discovery of high-quality, meaningful keyphrases requires the application of useful knowledge and suitable techniques.

Given these challenges and opportunities, this paper proposes an ontology and enhanced word embedding-based (OEWE) methodology for automatic keyphrase extraction from geoscience documents. We first develop a quantitative analysis for keyphrase extraction evaluation based on conditional probability and the naive Bayesian model, which is valuable when human-annotated keyphrases are not available. The domain ontology is then implemented on a multiway tree to enrich the domain-specific knowledge of certain concepts and relationships in the domain. Simultaneously, word2vec, a model of word distribution using deep learning, is updated by applying the geological ontology, so that it links domain background information and identifies infrequent but representative keyphrases. We use two homemade geoscience datasets to evaluate the performance of OEWE. We compare our method with frequency, term frequency-inverse document frequency (TF-IDF), TextRank and rapid automatic keyword extraction (RAKE), finding that our method achieves average F1 scores of 30.1% and 40.7% on two manually annotated datasets. (C) 2019 Elsevier Ltd. All rights reserved.
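The abstract reports F1 scores against manually annotated keyphrases and names TF-IDF as one of the baselines. The following is a minimal sketch of how such a comparison could be set up; it is not the authors' OEWE method or their evaluation code. The function names `tfidf_keywords` and `f1_score`, the toy documents, and the gold annotations are all hypothetical, and the sketch ranks single words rather than multiword keyphrases.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of one document by TF-IDF and return the top_k words."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    tf = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Sort by descending score; break ties alphabetically so the result is stable.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over exact-match keyphrases."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy corpus and annotations (illustrative only, not from the paper's datasets).
docs = [
    "granite intrusion hosts copper mineralization near the fault",
    "the fault zone controls gold mineralization in the basin",
    "the basin contains thick limestone and shale",
]
gold = ["granite", "copper", "fault"]

predicted = tfidf_keywords(docs, doc_index=0, top_k=3)
score = f1_score(predicted, gold)
print(predicted, round(score, 3))
```

In this toy setup, "fault" is penalized because it appears in two of the three documents, which illustrates why a purely frequency-driven baseline can miss terms that an annotator considers representative.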

Bibliographic record

  • Source
    Expert Systems with Applications | 2019, Issue 7 | pp. 157-169 | 13 pages
  • Author affiliations

    China Univ Geosci, Fac Informat Engn, Wuhan 430074, Hubei, Peoples R China; Natl Engn Res Ctr GIS, Wuhan 430074, Hubei, Peoples R China


  • Original format: PDF
  • Language: English
  • Keywords

    Keyphrase extraction; Ontology; Word2vec; Geoscience domain;
