Conference: 2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems

Primitive printed Arabic Optical Character Recognition using statistical features


Abstract

Because Arabic fonts vary widely in form, Arabic character recognition remains a challenge. Most published work considers only one font per text, which results in low recognition accuracy. This paper aims to improve the accuracy of AOCR (Arabic Optical Character Recognition) by adding an automatic Optical Font Recognition (OFR) stage before the traditional OCR stages. The OFR stage is implemented using SIFT (Scale Invariant Feature Transform) descriptors. First, a comparative study of four of the most recent primitive OCR algorithms was performed to evaluate the features and classifiers used in those systems. Based on that study, a combination of statistical features is proposed, and a Random Forest Tree classifier is selected for the classification stage. The combined features are used to train the classifiers. As a result, text in each recognized font is directed to a classifier tree specific to that font. The proposed system was tested on a generated Primitive Arabic Characters Noise Free dataset (PAC-NF) containing 30,000 samples. Experiments show a promising character recognition accuracy of 99.8-100%.
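The two-stage design described in the abstract (recognize the font first, then hand the glyph to a classifier trained for that font) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the zoning-density feature is an assumed example of a statistical feature, the font recognizer and per-font classifiers are passed in as plain callables rather than the SIFT-based OFR stage and Random Forest classifiers the paper uses, and all names (`zone_densities`, `recognize`, `font_a`, `font_b`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # binary glyph image: 1 = ink, 0 = background


def zone_densities(img: Image, rows: int = 2, cols: int = 2) -> List[float]:
    """Fraction of ink pixels in each cell of a rows x cols grid.

    Assumed example of a statistical feature; the abstract does not list
    the exact features. Requires an image of at least rows x cols pixels.
    """
    h, w = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell))
    return feats


def recognize(
    img: Image,
    font_of: Callable[[Image], str],
    classifiers: Dict[str, Callable[[List[float]], str]],
) -> Tuple[str, str]:
    """Stage 1: identify the font. Stage 2: route to that font's classifier."""
    font = font_of(img)                      # stand-in for the OFR stage
    label = classifiers[font](zone_densities(img))
    return font, label


# Toy usage with hypothetical fonts and hand-written stand-in classifiers.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
toy_classifiers = {
    "font_a": lambda f: "char_1" if f[0] > 0.5 else "char_2",
    "font_b": lambda f: "char_3",
}
result = recognize(glyph, lambda img: "font_a", toy_classifiers)
```

In a fuller version, each entry in the classifier dictionary would be a model trained only on samples of its own font, which is the point of running font recognition first.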
