Text Detection in Document Images by Machine Learning Algorithms


Abstract

In this paper, we consider the problem of text detection in document images. This problem plays an important role in OCR systems and is a challenging task. In the first step of the proposed approach, we use a self-adjusting bottom-up segmentation algorithm to segment a document image into a set of connected components (CCs). The segmentation algorithm is based on the Sobel edge detection method. In the second step, each CC is described by 27 features, and a machine learning algorithm is then used to classify the CCs as text or nontext. To test the approach, we collected a dataset (ASTRoID) containing 500 images of text blocks and 500 images of nontext blocks. We empirically compare the performance of the proposed text detection method when using seven different machine learning algorithms.
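
The two steps in the abstract (edge-based segmentation into CCs, then per-CC feature description) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how the segmentation self-adjusts or which 27 features are used, so the sketch assumes a fixed gradient threshold, 8-connectivity, and four illustrative features. All function names (`make_image`, `sobel_edges`, `connected_components`, `describe`) are hypothetical.

```python
from collections import deque

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def make_image(height, width, blocks):
    """Build a synthetic grayscale image: black background, white rectangles."""
    img = [[0] * width for _ in range(height)]
    for top, left, bh, bw in blocks:
        for y in range(top, top + bh):
            for x in range(left, left + bw):
                img[y][x] = 255
    return img


def sobel_edges(img, threshold):
    """Binary edge map: 1 where the Sobel gradient magnitude >= threshold.

    The fixed threshold is an assumption; border pixels are left as 0.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


def connected_components(mask):
    """Group 8-connected foreground pixels; returns one pixel list per CC."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def describe(pixels):
    """Illustrative features only; the paper's 27 features are not listed here."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "density": len(pixels) / (width * height),
    }


# Two white blocks on a black background, far enough apart to stay separate.
image = make_image(6, 14, [(2, 2, 2, 2), (2, 8, 2, 4)])
edges = sobel_edges(image, threshold=100)
components = connected_components(edges)
features = [describe(c) for c in components]
```

In a full pipeline, each feature dictionary would become one row of the input to a text/nontext classifier; the choice of classifier is what the paper compares across seven algorithms.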


Bibliographic record

Source
International Conference on Computer Recognition Systems, 2016, 11 pages
Authors
Darko Zelenika; Janez Povh; Bernard Ženko
Original format
PDF
Classification (CLC)
TP391.44-532
Keywords
Text detection; Document segmentation; Text/nontext classification; Machine learning

