Venue: International conference on Asian-Pacific digital libraries

Large-Scale Experiments for Mathematical Document Classification

Abstract

The ever-increasing amount of digitally available information is both a curse and a blessing. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, assessing and refining web search results becomes more and more tiresome and difficult for non-experts in a domain. Therefore, established digital libraries offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested in the semantic enrichment of the provided documents, e.g. by annotating them with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g. chemistry (CAS), medicine (MeSH), or mathematics (MSC). But as the amount of data grows, this manual task becomes more and more time-consuming and expensive. The only solution to this problem seems to be to employ automated classification algorithms, but the evaluations done in previous research make it difficult to draw conclusions about a real-world scenario. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, Zentralblatt MATH, with a special focus on practical applicability.
