Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

A novel framework for automatic sorting of postal documents with multi-script address blocks

Abstract

Recognizing numeric postal codes in a multi-script environment is a classical problem in postal automation. In such documents, determining the script of the handwritten postal code is crucial, because it decides which script-specific digit recognizer is invoked next. The present framework infers the script of the numeric postal code without any bias from the script of the textual part of the address block, since the two may differ in a multi-script environment. The scope of the work is to recognize postal codes written in any of four popular scripts, viz. Latin, Devanagari, Bangla and Urdu. For this purpose, we first implement a Hough transform based technique to localize the postal-code blocks in structured postal documents with a defined address block region. Isolated handwritten digit patterns are then extracted from the localized postal-code region. In the next stage of the framework, similarly shaped digit patterns of the four scripts are grouped into 25 clusters. A script-independent unified pattern classifier is then designed to classify the digit patterns of a numeric postal code into these 25 clusters. Based on these classification decisions, a rule-based script inference engine infers the script of the numeric postal code. One of the four script-specific classifiers is subsequently invoked to recognize the digit patterns of the corresponding script. A novel quad-tree based image partitioning technique is also developed in this work for effective feature extraction from the digit patterns. The support vector machine (SVM) based 25-class unified pattern classifier achieves an average recognition accuracy of 92.03% under ten-fold cross-validation. On randomly selected six-digit numeric strings in the four scripts, an average script inference accuracy of 96.72% is achieved. The average ten-fold cross-validation recognition accuracies of the individual SVM classifiers for Latin, Devanagari, Bangla and Urdu numerals are 95.55%, 95.63%, 97.15% and 96.20%, respectively.
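
The script inference step can be illustrated with a minimal sketch. The abstract does not state the actual rules or the contents of the 25 clusters, so everything below is an assumption made for illustration: the `CLUSTER_SCRIPTS` table, its cluster ids, the weighted-vote rule and the name `infer_script` are all hypothetical. The idea shown is only that each per-digit cluster decision narrows down the candidate scripts, and the decisions for the digits of one postal code are combined into a single script label.

```python
from collections import Counter

SCRIPTS = ("Latin", "Devanagari", "Bangla", "Urdu")

# Hypothetical table: cluster id -> scripts whose digit shapes fall in that cluster.
# The framework uses 25 clusters; these six entries are invented for illustration.
CLUSTER_SCRIPTS = {
    0: {"Latin", "Devanagari", "Bangla", "Urdu"},  # a shape shared by all four scripts
    1: {"Latin"},
    2: {"Devanagari", "Bangla"},
    3: {"Urdu"},
    4: {"Latin", "Urdu"},
    5: {"Bangla"},
}


def infer_script(cluster_ids):
    """Return the script most consistent with the per-digit cluster decisions.

    Assumed rule (not taken from the paper): every digit casts one vote,
    split evenly among the scripts that share its cluster, so a cluster
    belonging to a single script is stronger evidence than a shared one.
    """
    votes = Counter()
    for cid in cluster_ids:
        candidates = CLUSTER_SCRIPTS.get(cid, set())
        for script in candidates:
            votes[script] += 1.0 / len(candidates)
    if not votes:
        return None  # no usable cluster decision
    # Highest vote wins; ties are broken by the order in SCRIPTS.
    return max(SCRIPTS, key=lambda s: (votes[s], -SCRIPTS.index(s)))


# Cluster decisions for the six digits of one postal code.
print(infer_script([0, 2, 5, 2, 0, 5]))
```

In a full pipeline, the returned label would select which of the four script-specific digit classifiers is run on the same digit images.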
