Keyphrases Extraction from Scientific Documents: Improving Machine Learning Approaches with Natural Language Processing


Abstract

In this paper we use Natural Language Processing (NLP) techniques to improve different machine learning approaches (Support Vector Machines (SVM), Local SVM, Random Forests) to the problem of automatic keyphrase extraction from scientific papers. For the evaluation we propose a large, high-quality dataset: 2000 ACM papers from the Computer Science domain. We evaluate by comparison with expert-assigned keyphrases. The evaluation shows promising results that outperform the state-of-the-art Bayesian learning system KEA, improving the average F-Measure from 22% (KEA) to 30% (Random Forest) on the same dataset without the use of controlled vocabularies. Finally, we report a detailed analysis of the effect of the individual NLP features and dataset size on the overall quality of the extracted keyphrases.
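
The evaluation described in the abstract compares extracted keyphrases with expert-assigned ones and reports an average F-Measure. A minimal sketch of that kind of scoring might look like the following. The function names and the exact-match rule (case-insensitive string equality) are assumptions made for illustration; the abstract does not state how the authors match phrases.

```python
def f_measure(extracted, expert):
    """Return (precision, recall, F1) for one document.

    Assumption: a keyphrase counts as correct when it equals an
    expert-assigned keyphrase after trimming and lowercasing.
    """
    extracted_set = {phrase.strip().lower() for phrase in extracted}
    expert_set = {phrase.strip().lower() for phrase in expert}
    matches = len(extracted_set & expert_set)

    precision = matches / len(extracted_set) if extracted_set else 0.0
    recall = matches / len(expert_set) if expert_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_f_measure(documents):
    """Average the per-document F1 over (extracted, expert) pairs."""
    scores = [f_measure(extracted, expert)[2] for extracted, expert in documents]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging per-document scores (rather than pooling all matches) is also an assumption here; it is one common reading of "average F-Measure".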


Bibliographic details

Source: The role of digital libraries in a time of global change, 2010, pp. 102-111 (10 pages)
Conference location: Gold Coast (AU)
Authors: Mikalai Krapivin; Aliaksandr Autayeu; Maurizio Marchese; Enrico Blanzieri; Nicola Segata
Author affiliations: DISI, University of Trento, Italy (listed for all five authors)

Original format: PDF
Language: English
Chinese Library Classification: Electronic libraries, digital libraries

