Primitive printed Arabic Optical Character Recognition using statistical features

Abstract

Because Arabic fonts take many different forms, Arabic character recognition remains a challenge. Most published work considers only one font per text, which results in low recognition accuracy. This paper aims to improve the accuracy of Arabic Optical Character Recognition (AOCR) by adding an automatic Optical Font Recognition (OFR) stage before the traditional OCR stages. The OFR stage uses SIFT (Scale-Invariant Feature Transform) descriptors. First, a comparative study of four of the most recent primitive OCR algorithms was performed to evaluate the features and classifiers used in their systems. Based on that study, a combination of statistical features is proposed, and a Random Forest classifier is selected for the classification stage. The combined features are used to train the classifiers, so that each text whose font has been recognized is directed to the classifier tree specific to that font. The proposed system was tested on a generated Primitive Arabic Characters Noise Free dataset (PAC-NF) containing 30,000 samples. Experiments achieved a character recognition accuracy of 99.8-100%.
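
The two-stage design described in the abstract (recognize the font first, then hand the glyph to a classifier trained for that font) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not list the statistical features used, so `zoning_features` (ink density per grid cell) is an assumed stand-in. `FontRoutedRecognizer` and its parameters are hypothetical names, and the font recognizer and per-font classifiers are plain callables rather than a SIFT-based OFR stage and trained Random Forests.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Glyph = Sequence[Sequence[int]]  # binary image: 1 = ink, 0 = background


def zoning_features(glyph: Glyph, zones: int = 2) -> List[float]:
    """Ink density in each cell of a zones x zones grid.

    Assumed example of a statistical feature; the abstract does not
    say which features the paper combines.
    """
    rows, cols = len(glyph), len(glyph[0])
    feats: List[float] = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * rows // zones, (zr + 1) * rows // zones
            c0, c1 = zc * cols // zones, (zc + 1) * cols // zones
            ink = sum(glyph[r][c] for r in range(r0, r1) for c in range(c0, c1))
            area = (r1 - r0) * (c1 - c0)
            feats.append(ink / area if area else 0.0)
    return feats


class FontRoutedRecognizer:
    """Hypothetical two-stage recognizer: font first, then a per-font classifier."""

    def __init__(
        self,
        font_recognizer: Callable[[Glyph], str],
        classifiers: Dict[str, Callable[[List[float]], str]],
    ) -> None:
        self.font_recognizer = font_recognizer  # stand-in for the OFR stage
        self.classifiers = classifiers          # one classifier per font label

    def recognize(self, glyph: Glyph) -> Tuple[str, str]:
        font = self.font_recognizer(glyph)      # stage 1: which font is this?
        classify = self.classifiers[font]       # route to that font's classifier
        return font, classify(zoning_features(glyph))  # stage 2: which character?


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real system would
    # plug in a trained font recognizer and trained per-font classifiers.
    demo = FontRoutedRecognizer(
        font_recognizer=lambda g: "font_a" if sum(map(sum, g)) > 3 else "font_b",
        classifiers={"font_a": lambda f: "alef", "font_b": lambda f: "ba"},
    )
    sample = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
    print(zoning_features(sample))
    print(demo.recognize(sample))
```

Keeping the classifiers behind a plain callable interface means any trained model with a prediction function could be wrapped and dropped in per font without changing the routing logic.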

Bibliographic details

Source
2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems | 2015 | pp. 567-571 | 5 pages
Conference location: Cairo (EG)
Authors
Mohamed Dahi; Noura A. Semary; Mohiy M. Hadhoud
Affiliation

Information Technology Dept., Faculty of Computers and Information, Menofia University, Shebin El-Kom, Egypt

Original format: PDF
Language: English
Keywords
AOCR

