A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques.



Abstract

Multimodal system design has gained popularity in recent years because combining different input modalities can improve system performance. This thesis presents a multimodal approach to automatic post sorting that combines Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR). For the purpose of fusion, we propose a new confidence measure for OCR output. Based on this OCR confidence measure, we develop a dynamic fusion strategy that bases its final decision on one of three sources: (i) the OCR output alone, (ii) the ASR output alone, or (iii) a combination of the ASR and OCR outputs. The combination of ASR and OCR outputs is performed in two steps. First, the ASR output is used to automatically generate a list of alternate address candidates. Next, this list is processed by the OCR to determine the most likely address. The proposed system is evaluated on speech data derived from the UT-Accent corpus and on images obtained from the Siemens Mobility-Postal Automation database. Both the speech and the image data were collected in realistic settings and contain large real-world variability. Our experiments show that the proposed multimodal solution achieves an overall zip code recognition rate of 88.9%, a substantial improvement over ASR alone (79%) and OCR alone (80%). This advancement is an important contribution that leverages both technologies to improve overall package address recognition.
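The three-way decision described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the OCR confidence measure is computed, how the three branches are selected, or how the OCR scores the ASR-derived candidates, so everything here is an assumption made for illustration: the `OcrResult` type, the `high` and `low` thresholds, and the use of string similarity (`difflib.SequenceMatcher`) as a stand-in for OCR rescoring. It is not the thesis's implementation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class OcrResult:
    text: str          # zip code string read from the image
    confidence: float  # hypothetical confidence score in [0, 1]


def pick_candidate(ocr_text, candidates):
    """Return the ASR-derived candidate most similar to the OCR reading.

    String similarity is an assumed stand-in for the OCR rescoring step.
    """
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, ocr_text, c).ratio(),
    )


def fuse(ocr, asr_candidates, high=0.9, low=0.3):
    """Choose among OCR alone, ASR alone, or a combination.

    The thresholds `high` and `low` are illustrative assumptions.
    `asr_candidates` is assumed to be ordered best-first.
    """
    # (i) OCR alone: the OCR is confident, or there is no ASR output.
    if ocr.confidence >= high or not asr_candidates:
        return ocr.text, "ocr"
    # (ii) ASR alone: the OCR reading is too unreliable to use.
    if ocr.confidence < low:
        return asr_candidates[0], "asr"
    # (iii) Combination: ASR proposes candidates, the OCR reading selects.
    return pick_candidate(ocr.text, asr_candidates), "asr+ocr"


if __name__ == "__main__":
    reading = OcrResult(text="75O80", confidence=0.5)  # letter O misread
    print(fuse(reading, ["76080", "75080"]))
```

In this sketch the middle confidence band is where fusion helps: an OCR reading with a single misrecognized character can still select the correct entry from the ASR candidate list.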

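Bibliographic details

  • Author

    Singh, Amriteshwar

  • Author affiliation

    The University of Texas at Dallas

  • Degree-granting institution: The University of Texas at Dallas
  • Subject: Computer Science
  • Degree: M.S.C.S.
  • Year: 2011
  • Pages: 69 p.
  • Total pages: 69
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification: Rehabilitation medicine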