Large-Scale Experiments for Mathematical Document Classification

Abstract

The ever-increasing amount of digitally available information is a curse and a blessing at the same time. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, assessing and refining web search results becomes more and more tiresome and difficult for non-experts in a domain. Therefore, established digital libraries offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested in the semantic enrichment of the provided documents, e.g., by annotating them with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g., chemistry (CAS), medicine (MeSH), or mathematics (MSC). But due to the growing amount of data, this manual task is becoming more and more time-consuming and expensive. The only solution to this problem seems to be to employ automated classification algorithms, but the evaluations done in previous research make it difficult to draw conclusions about a real-world scenario. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, Zentralblatt MATH, with a special focus on practical applicability.

Bibliographic details

Source
International conference on Asian-Pacific digital libraries | 2013 | pp. 83-92 | 10 pages
Authors
Simon Barthel; Sascha Toennies; Wolf-Tilo Balke
Original format: PDF
Keywords
Text Classification; Mathematical Documents; Experiments
