Journal: Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology

An unsupervised lower-baseline localization method based on writing style features for historical documents

Abstract

Historical documents contain a great deal of cultural heritage information that has not yet been explored or exploited. Lower-Baseline Localization (LBL) is the first step in retrieving information from manuscript images: it identifies the groups of handwritten text lines that make up a message. LBL methods can be characterized by how they treat the features of an author's writing style: character shape and size; the gaps between characters, words, lines, and columns; the shape of ascender and descender strokes; the character body; and touching and overlapping lines. For example, most supervised LBL methods analyze only the gap between characters during the preprocessing phase of the document and leave the remaining writing-style features to the learning phase of the classifier. For this reason, supervised LBL methods tend to learn particular styles and collections. This paper presents an unsupervised LBL method that explicitly analyzes all of the features of the author's writing style and processes the document in windows. As a result, the proposed method depends less on the writing style of any one author and is more reliable on new collections in real scenarios. In the reported experiments, the proposed method surpasses the state-of-the-art methods on the standard READ-BAD historical collection, which comprises 2,036 manuscripts and 132,124 manually annotated baselines from 9 libraries spanning 500 years.
