Conference: The role of digital libraries in a time of global change

Keyphrases Extraction from Scientific Documents: Improving Machine Learning Approaches with Natural Language Processing

Abstract

In this paper we use Natural Language Processing (NLP) techniques to improve several machine learning approaches (Support Vector Machines (SVM), Local SVM, and Random Forests) to the problem of automatic keyphrase extraction from scientific papers. For the evaluation we propose a large, high-quality dataset of 2,000 ACM papers from the Computer Science domain, and we evaluate by comparison with expert-assigned keyphrases. The results are promising: they outperform the state-of-the-art Bayesian learning system KEA, improving the average F-Measure from 22% (KEA) to 30% (Random Forest) on the same dataset without the use of controlled vocabularies. Finally, we report a detailed analysis of the effect of the individual NLP features and of data set size on the overall quality of the extracted keyphrases.
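The evaluation compares each system's extracted keyphrases with the expert-assigned ones and reports an average F-Measure. The abstract does not say how phrases are matched or how the average is taken, so the following is only a minimal sketch under stated assumptions: two phrases match if they are identical after case-folding and whitespace normalization (no stemming), and the average is the mean of per-document F-Measures (macro-averaging). The function names `normalize`, `keyphrase_prf` and `average_f_measure` are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from typing import Iterable, Sequence, Tuple


def normalize(phrase: str) -> str:
    # Assumption: case-fold and collapse internal whitespace; no stemming.
    return " ".join(phrase.lower().split())


def keyphrase_prf(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-Measure of one document's extracted keyphrases
    against its expert-assigned keyphrases, using exact match after normalization."""
    ext = {normalize(p) for p in extracted}
    ref = {normalize(p) for p in gold}
    matched = len(ext & ref)  # keyphrases found in both sets
    precision = matched / len(ext) if ext else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


def average_f_measure(pairs: Sequence[Tuple[Iterable[str], Iterable[str]]]) -> float:
    # Assumption: macro-average, i.e. the mean of per-document F-Measures.
    scores = [keyphrase_prf(ext, gold)[2] for ext, gold in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    extracted = ["random forest", "keyphrase extraction",
                 "support vector machines", "digital libraries"]
    gold = ["Keyphrase extraction", "Random Forest", "natural language processing"]
    p, r, f = keyphrase_prf(extracted, gold)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

In the toy example, 2 of 4 extracted phrases match 2 of 3 expert phrases, giving precision 1/2, recall 2/3 and F-Measure 4/7. A looser matching rule (for example stemming both sides first) would raise these numbers, which is one reason reported F-Measures are only comparable when computed on the same dataset with the same matching procedure.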