An unsupervised lower-baseline localization method based on writing style features for historical documents

Garcia-Calderon Miguel Angel; Garcia-Hernandez Rene Arnulfo; Ledeneva Yulia

Abstract

Historical documents contain a great deal of cultural heritage information that has not yet been explored or exploited. Lower-baseline localization (LBL), which identifies the groups of handwritten text lines that make up a message, is the first step in retrieving information from manuscript images. LBL methods can be characterized by how they treat the features of an author's writing style: character shape and size; the gaps between characters and between lines; the shape of ascender and descender strokes; the character body; the spacing between characters, words, and columns; and touching and overlapping lines. For example, most supervised LBL methods analyze only the gap between characters during document preprocessing and leave the remaining writing style features to the learning phase of the classifier. For this reason, supervised LBL methods tend to learn particular styles and collections. This paper presents an unsupervised LBL method that explicitly analyzes all of the author's writing style features and processes the document by windows. As a result, the proposed method depends less on the writing style of any one author and is more reliable on new collections in real scenarios. In the reported experiments, the proposed method surpasses state-of-the-art methods on the standard READ-BAD historical collection, which comprises 2,036 manuscripts and 132,124 manually annotated baselines from 9 libraries and spans 500 years.
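
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of processing a page "by windows", not the authors' method. It assumes a binarized page (ink = 1), splits it into vertical windows, and takes the bottom row of each dense band in the window's horizontal projection profile as a lower-baseline estimate. The function name, `window_width`, and `body_fraction` are hypothetical choices made for illustration.

```python
import numpy as np

def lower_baselines_by_window(ink, window_width=64, body_fraction=0.5):
    """Estimate lower-baseline rows per vertical window of a binary ink image.

    Illustrative assumption, not the published method: a row belongs to a
    text body if its ink count is at least body_fraction of the window's
    densest row. Sparse descender strokes fall below that threshold, so the
    last dense row of each band approximates the lower baseline.
    """
    height, width = ink.shape
    result = {}
    for x0 in range(0, width, window_width):
        strip = ink[:, x0:x0 + window_width]
        profile = strip.sum(axis=1)          # ink pixels per row in this window
        peak = profile.max()
        if peak == 0:                        # empty window: no text lines
            result[x0] = []
            continue
        dense = profile >= body_fraction * peak
        # Pad with False so bands touching the image border are still closed.
        padded = np.concatenate(([False], dense, [False]))
        changes = np.diff(padded.astype(np.int8))
        # A -1 step at index i means row i-1 was the last dense row of a band.
        run_ends = np.flatnonzero(changes == -1) - 1
        result[x0] = run_ends.tolist()
    return result

# Synthetic 40x128 page: one straight line and one line that drops by two
# rows in the right-hand window, plus two thin descender strokes.
page = np.zeros((40, 128), dtype=np.uint8)
page[5:10, :] = 1        # first text body, rows 5..9
page[10:13, 3] = 1       # descender in the left window
page[10:13, 70] = 1      # descender in the right window
page[20:25, :64] = 1     # second text body, left window, rows 20..24
page[22:27, 64:] = 1     # second text body, right window, rows 22..26

baselines = lower_baselines_by_window(page)
print(baselines)         # {0: [9, 24], 64: [9, 26]}
```

Working per window lets the estimate follow a line whose height changes across the page, which a single page-wide projection profile would blur. Real manuscripts would also need binarization, noise handling, and treatment of touching or overlapping lines, none of which this sketch attempts.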

Bibliographic record

Source
Journal of intelligent & fuzzy systems: Applications in Engineering and Technology | 2020, Issue 2 | 12 pages
Authors
Garcia-Calderon Miguel Angel; Garcia-Hernandez Rene Arnulfo; Ledeneva Yulia
Affiliations
Autonomous Univ State Mexico, Literary Inst 100, Toluca 50000, State of Mexico, Mexico (listed for each of the three authors)
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Automation systems
Keywords
Lower-baseline localization; historical document analysis; text line segmentation; writing style features
