
Understanding disciplinary vocabularies using a full-text enabled domain-independent term extraction approach

Abstract

Publication metadata help deliver rich analyses of scholarly communication. However, research concepts and ideas are more effectively expressed through unstructured fields such as full texts. The goals of this paper are therefore to employ a full-text enabled method to extract terms relevant to disciplinary vocabularies and, through them, to understand the relationships between disciplines. The paper uses an efficient, domain-independent term extraction method to extract disciplinary vocabularies from a large multidisciplinary corpus of PLoS ONE publications. It finds a power-law pattern in the frequency distributions of the terms present in each discipline, indicating a semantic richness potentially sufficient for further study and advanced analysis. The salient relationships amongst these vocabularies become apparent when a principal component analysis is applied. For example, Mathematics and Computer and Information Sciences were found to have similar vocabulary use patterns along with Engineering and Physics; while Chemistry and the Social Sciences were found to exhibit contrasting vocabulary use patterns along with the Earth Sciences and Chemistry. These results have implications for studies of scholarly communication, as scholars attempt to identify the epistemological cultures of disciplines, and as a full text-based methodology could lead to machine learning applications in the automated classification of scholarly work according to disciplinary vocabularies.
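The power-law pattern mentioned in the abstract can be illustrated with a rough rank-frequency check. The sketch below is not the authors' procedure: it assumes a discipline's extracted terms are available as a plain list of strings, and it uses a simple least-squares fit on log-log axes, which is only a crude diagnostic. The names `rank_frequency_slope` and `zipf_sample` are hypothetical helpers introduced for this example, and the sample data is synthetic.

```python
import math
from collections import Counter


def rank_frequency_slope(terms):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is consistent with a Zipf-like power law.
    This is a rough diagnostic, not a rigorous power-law test.
    """
    # Term frequencies sorted from most to least frequent.
    counts = sorted(Counter(terms).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(freq) for freq in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def zipf_sample(n_terms, top_count):
    """Synthetic term list whose r-th term occurs about top_count / r times."""
    terms = []
    for rank in range(1, n_terms + 1):
        terms.extend([f"term{rank}"] * max(1, round(top_count / rank)))
    return terms


if __name__ == "__main__":
    # Synthetic stand-in for one discipline's extracted terms (assumption).
    slope = rank_frequency_slope(zipf_sample(50, 1000))
    print(f"estimated log-log slope: {slope:.2f}")
```

Running the same function on each discipline's term list would give one slope per discipline, which could then be compared across disciplines.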
