Robust Text Extraction for Automated Processing of Multi-Lingual Personal Identity Documents

Pushpa B R; Ashwin M; Vivek K R

Abstract

Text extraction is the task of separating textual content from a non-textual background, such as an image. It plays an important role in recovering valuable information from images. Variation in text size, font, orientation, alignment, and contrast makes the task challenging. Existing text extraction methods focus on certain regions of interest, and characteristics such as noise, blur, distortion, and font variation make text extraction difficult. This paper proposes a technique to extract textual characters from scanned images of personal identity documents. Current procedures track user records manually, which is inefficient and demands considerable time and human resources. The proposed methodology digitizes personal identity documents and eliminates a large portion of the manual work involved in existing data entry and verification procedures. The proposed method has been tested extensively on large datasets of varying sizes and image qualities. The results indicate high accuracy in extracting important textual features from the document images.

Bibliographic details

Source: International Journal of Engineering and Technology, 2016, Issue 2, 9 pages
Authors: Pushpa B R; Ashwin M; Vivek K R
Original format: PDF
CLC classification: Industrial technology
