International Conference on Frontiers in Handwriting Recognition

Recognizing Japanese Historical Cursive with Pseudo-Labeling-aided CRNN as an Application of Semi-Supervised Learning to Sequence Labeling



Abstract

Pseudo-labeling is a semi-supervised learning technique that trains on unlabeled data alongside labeled data by predicting labels for the unlabeled examples. As far as we know, pseudo-labeling has so far been applied to tasks that predict a category label. In this paper, we apply pseudo-labeling to sequence labeling, the task of predicting a sequence of labels for sequential data such as text. To predict the pseudo-label of an unlabeled example, we first initialize a representative sequence by choosing one of the sequences inferred by multiple instances of a deep neural network, based on the edit distances among them. Then, to make the sequence more natural, we refine the representative with a local search that aims to minimize a metric defined as a linear combination of perplexity and average edit distance. To show the effectiveness of our method, we focus on recognizing Japanese historical cursive. Experimental results on the ALCON2017 and Kaggle competitions show that our method outperforms most prior work, reducing the character error rate (CER) by up to around 10%.
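The two-stage pseudo-label construction described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not say how the initial representative is chosen from the edit distances, so picking the prediction with the smallest average edit distance to the others is an assumption. The substitution-only neighborhood, the `weight` parameter, and the caller-supplied `perplexity` function (standing in for whatever language model the paper uses) are likewise hypothetical.

```python
from typing import Callable, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def average_edit_distance(candidate: str, predictions: Sequence[str]) -> float:
    """Mean edit distance from a candidate to every model's prediction."""
    return sum(edit_distance(candidate, p) for p in predictions) / len(predictions)


def initial_representative(predictions: Sequence[str]) -> str:
    """Assumed rule: pick the prediction closest on average to all predictions."""
    return min(predictions, key=lambda p: average_edit_distance(p, predictions))


def refine(
    representative: str,
    predictions: Sequence[str],
    perplexity: Callable[[str], float],  # hypothetical language-model scorer
    alphabet: str,
    weight: float = 0.5,                 # hypothetical mixing coefficient
    max_rounds: int = 10,
) -> str:
    """Greedy local search over single-character substitutions (an assumption)."""

    def cost(s: str) -> float:
        # Linear combination of perplexity and average edit distance.
        return weight * perplexity(s) + (1 - weight) * average_edit_distance(s, predictions)

    best, best_cost = representative, cost(representative)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            for ch in alphabet:
                candidate = best[:i] + ch + best[i + 1:]
                candidate_cost = cost(candidate)
                if candidate_cost < best_cost:
                    best, best_cost, improved = candidate, candidate_cost, True
        if not improved:
            break  # local minimum reached
    return best
```

For example, given the predictions `["abcd", "abed", "abcd", "xbcd"]` from four model instances, `initial_representative` selects `"abcd"`, and `refine` would then adjust it only if some single-character substitution lowered the combined cost.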
