Journal: ACM Transactions on Asian Language Information Processing

Bangla Handwritten Character Segmentation Using Structural Features: A Supervised and Bootstrapping Approach

Abstract

In this article, we propose a new framework for segmentation of Bangla handwritten word images into meaningful individual symbols or pseudo-characters. Existing segmentation algorithms are not usually treated as a classification problem. However, in the present study, the segmentation algorithm is looked upon as a two-class supervised classification problem. The method employs an SVM classifier to select the segmentation points on the word image on the basis of various structural features. For training of the SVM classifier, an unannotated training set is prepared first using candidate segmenting points. The training set is then clustered, and each cluster is labeled manually with minimal manual intervention. A semi-automatic bootstrapping technique is also employed to enlarge the training set from new samples. The overall architecture describes a basic step toward building an annotation system for the segmentation problem, which has not so far been investigated. The experimental results show that our segmentation method is quite efficient in segmenting not only word images but also handwritten texts. As a part of this work, a database of Bangla handwritten word images has also been developed. Considering our data collection method and a statistical analysis of our lexicon set, we claim that the relevant characteristics of an ideal lexicon set are present in our handwritten word image database.
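
The bootstrapping step described in the abstract can be illustrated with a small self-training loop. This is only a sketch under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for the SVM so the example needs no third-party library, the two-dimensional feature vectors are made up, and the names `centroids`, `predict`, `bootstrap`, and `min_margin` are hypothetical. Label 1 marks a candidate point as a segmentation point and label 0 as a non-segmentation point. Candidates classified with a clear margin are added to the training set automatically, and the remainder is left for manual labeling, which is one possible reading of "semi-automatic".

```python
from math import dist


def centroids(samples):
    """Compute the mean feature vector per class from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Return (label, margin) for x.

    The margin is the gap between the distances to the two nearest class
    centroids, so the model must contain both classes.
    """
    ranked = sorted((dist(x, c), y) for y, c in model.items())
    (d0, y0), (d1, _) = ranked[0], ranked[1]
    return y0, d1 - d0


def bootstrap(labeled, unlabeled, min_margin=1.0, max_rounds=5):
    """Enlarge the labeled set with confidently classified new candidates.

    min_margin is a hypothetical confidence threshold; candidates below it
    stay in the pool and would be handed to a human annotator.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled)
        accepted, rest = [], []
        for x in pool:
            y, margin = predict(model, x)
            (accepted if margin >= min_margin else rest).append((x, y))
        if not accepted:
            break  # nothing confident left; stop retraining
        labeled.extend(accepted)
        pool = [x for x, _ in rest]
    return centroids(labeled), labeled, pool


# Made-up seed set: label 0 = non-segmentation point, 1 = segmentation point.
seed = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((5.0, 5.0), 1), ((6.0, 5.0), 1)]
# Made-up feature vectors of new, unlabeled candidate points.
new_candidates = [(0.5, 0.5), (5.5, 4.5), (3.0, 2.5)]

model, labeled, pool = bootstrap(seed, new_candidates)
print(len(labeled), "labeled samples;", len(pool), "left for manual review")
```

In this toy run the two candidates that lie close to a class centroid are absorbed into the training set, while the ambiguous one midway between the classes stays in `pool`. With a real SVM, the distance to the decision boundary could play the role that `margin` plays here.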
