A preprocessing method for printed Tamil documents: Skew correction and textual classification

Abstract

An optical character recognition (OCR) system consists of the following phases: preprocessing and segmentation, feature extraction, classification, and post-processing. This paper focuses on the preprocessing and segmentation tasks, which play a major role in the subsequent processes of an OCR system. The objective of preprocessing and segmentation is to improve the quality of the input image. In addition, this phase removes unnecessary portions of the input image that would otherwise complicate the subsequent steps of OCR and reduce the overall recognition rate. The preprocessing and segmentation step consists of many sub-processes, namely image binarisation, noise removal, skew detection and correction, page segmentation, text or non-text classification, line segmentation, word segmentation, and character segmentation. This paper proposes a new method to calculate the skew angle for skew correction. In addition, this paper proposes a more accurate method to segment the input image into blocks and classify the blocks as text or non-text. The skew angle is calculated on the scanned document using a Wiener filter, a smearing technique, and the Radon transform. The document image is segmented into blocks using the run length smearing algorithm and connected component analysis. Features such as basic, density, and HOG features are extracted from each block for text and non-text classification. The proposed methods are tested on 54 documents. The test results show a recognition rate of 96.30% for skew detection and correction, whereas the recognition rate is 99.18% for text or non-text classification with binary SVMs using an RBF kernel.

Bibliographic details

Source
2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems | 2015 | pp. 495-500 | 6 pages
Conference location: Cairo (EG)
Authors
M. Ramanan; A. Ramanan; E. Y. A. Charles
Author affiliation

Department of Computer Science, Trincomalee Campus, Eastern University, Sri Lanka
Original format: PDF
Language: English
Keywords
Printed Tamil documents; Skew correction; Textual classification

Similar literature

1. Methodology for fast skew angle detection and correction by using the Wavelet and the Hough transform: Application for Arabic printed documents [J]. N. KHORISSI, F. ABDAT, A. MELLIT. WSEAS Transactions on Signal Processing. 2007, No. 7
2. Language Independent Skew Detection and Correction of Printed Text Document Images: A Non-rotational Approach [J]. S. Murali, G. Hemanthkumar, P. Nagabhushan. Vivek. 2006, No. 2
3. Skew detection and block classification of printed documents [J]. P.-Y. Yin. Image and Vision Computing. 2001, No. 8
4. A preprocessing method for printed Tamil documents: Skew correction and textual classification [C]. M. Ramanan, A. Ramanan, E. Y. A. Charles. IEEE International Conference on Intelligent Computing and Information Systems. 2015
5. Probabilistic random field based method for annotated machine printed documents preprocessing [D]. Peng, Xujun. 2011
6. Correction: A Method of Neighbor Classes Based SVM Classification for Optical Printed Chinese Character Recognition [O]. Jie Zhang, Xiaohong Wu, Yanmei Yu
7. Image Segmentation and Multiple skew estimation, correction in printed and handwritten documents [O]. Gumpalli Sai Prasanth, Kandipalli Prasanth. 2014
