Recognizing Japanese Historical Cursive with Pseudo-Labeling-aided CRNN as an Application of Semi-Supervised Learning to Sequence Labeling

Abstract

Pseudo-labeling is a semi-supervised learning technique that trains on unlabeled data as well as labeled data by predicting labels for the unlabeled data. As far as we know, pseudo-labeling has so far been applied to tasks that predict a category label. In this paper, we apply pseudo-labeling to sequence labeling, the task of predicting a sequence of labels for sequential data such as text. To predict the pseudo-label of an unlabeled sample, we first initialize a representative sequence with one of the sequences inferred by multiple instances of a deep neural network, selected by measuring the edit distances among them. Then, to make the sequence more natural, the representative is refined by a local search that aims to minimize a metric defined as a linear combination of perplexity and average edit distance. To show the effectiveness of our method, we focus on recognizing Japanese historical cursive. Experimental results on ALCON2017 and a Kaggle competition show that our method outperforms most prior work. Our method reduces the character error rate (CER) by up to around 10%.
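The two-step pseudo-label construction described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; several details are assumptions made only for illustration. The initial representative is taken to be the hypothesis with the smallest average edit distance to all hypotheses. The language model is a toy add-one-smoothed character bigram model (`BigramLM`, a hypothetical stand-in). The metric is weighted as `alpha * perplexity + (1 - alpha) * average edit distance`. The local search moves by single-character insertions, deletions, and substitutions. None of these choices is specified in the abstract.

```python
import math
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def avg_edit_distance(seq, hyps):
    return sum(edit_distance(seq, h) for h in hyps) / len(hyps)

def pick_representative(hyps):
    """Assumed initialization: hypothesis closest on average to all hypotheses."""
    return min(hyps, key=lambda h: avg_edit_distance(h, hyps))

class BigramLM:
    """Toy add-one-smoothed character bigram model (hypothetical stand-in)."""
    def __init__(self, corpus):
        self.vocab_size = len({c for s in corpus for c in s})
        self.uni, self.bi = Counter(), Counter()
        for s in corpus:
            padded = "^" + s  # "^" marks the start of a sequence
            for p, c in zip(padded, padded[1:]):
                self.uni[p] += 1
                self.bi[p, c] += 1

    def perplexity(self, seq):
        if not seq:
            return float("inf")
        padded = "^" + seq
        logp = sum(math.log((self.bi[p, c] + 1) / (self.uni[p] + self.vocab_size))
                   for p, c in zip(padded, padded[1:]))
        return math.exp(-logp / len(seq))

def score(seq, hyps, lm, alpha):
    """Assumed metric: linear combination of perplexity and average edit distance."""
    return alpha * lm.perplexity(seq) + (1 - alpha) * avg_edit_distance(seq, hyps)

def neighbors(seq, alphabet):
    """Sequences one insertion, deletion, or substitution away from seq."""
    for i in range(len(seq) + 1):
        for c in alphabet:
            yield seq[:i] + c + seq[i:]              # insertion
        if i < len(seq):
            yield seq[:i] + seq[i + 1:]              # deletion
            for c in alphabet:
                if c != seq[i]:
                    yield seq[:i] + c + seq[i + 1:]  # substitution

def refine(hyps, lm, alpha=0.5, max_iters=20):
    """Greedy local search starting from the initial representative."""
    best = pick_representative(hyps)
    best_score = score(best, hyps, lm, alpha)
    alphabet = sorted({c for h in hyps for c in h})
    for _ in range(max_iters):
        cand = min(neighbors(best, alphabet),
                   key=lambda s: score(s, hyps, lm, alpha), default=best)
        cand_score = score(cand, hyps, lm, alpha)
        if cand_score >= best_score:  # no improving neighbor: local minimum
            break
        best, best_score = cand, cand_score
    return best

# Example: four hypothetical model outputs for one unlabeled line image.
hypotheses = ["abcd", "abxd", "abcd", "abce"]
language_model = BigramLM(["abcd", "abcd", "bcda"])
print(pick_representative(hypotheses), refine(hypotheses, language_model))
```

In a pseudo-labeling loop, the refined sequence would then serve as the training target for the unlabeled sample. Greedy descent stops at the first local minimum, so the result depends on the starting representative and on the weight `alpha`.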

Bibliographic details

Source
International Conference on Frontiers in Handwriting Recognition | 2020 | pp. 97-102 | 6 pages
Authors
Ayumu Nagai
Original format: PDF
Keywords
Labeling; Measurement; Text recognition; Task analysis; Prediction algorithms; Neural networks; Semisupervised learning;

Similar literature

1. A two-phase hybrid of semi-supervised and active learning approach for sequence labeling [J]. Hamed Hassanzadeh, Mohammadreza Keyvanpour. Intelligent Data Analysis. 2013, No. 2
2. Semi-supervised multi-label feature learning via label enlarged discriminant analysis [J]. Knowledge and Information Systems. 2020, No. 6
3. Robust Label Prediction via Label Propagation and Geodesic k-Nearest Neighbor in Online Semi-Supervised Learning [J]. Yuichiro WADA, Siqiang SU, Wataru KUMAGAI. IEICE Transactions on Information and Systems. 2019, No. 8
4. On the Improvement of Recognizing Single-Line Strings of Japanese Historical Cursive [C]. Ayumu Nagai. International Conference on Document Analysis and Recognition. 2019
5. Learning from partially labeled data: Unsupervised and semi-supervised learning on graphs and learning with distribution shifting [D]. Huang, Jiayuan. 2007
6. Integrating Semi-supervised and Supervised Learning Methods for Label Fusion in Multi-Atlas Based Image Segmentation [O]. Qiang Zheng, Yihong Wu, Yong Fan. 2018
7. Evaluating retraining rules for semi-supervised learning in neural network based cursive word recognition [O]. Frinken Volkmar, Bunke Horst. 2009
