Speech coding and transmission for improved automatic recognition in communication networks.

Abstract

This thesis consists of three parts. We first evaluate automatic speech recognition (ASR) results at various locations in a voice communication system. We then identify the causes of performance deterioration relative to traditional ASR, where the original speech signal is available. Next, we address the two feature distortions responsible for this degradation. The first arises because all model-based speech coding standards use linear prediction (LP) parameters to represent spectral information, whereas Mel-frequency cepstral coefficients (MFCCs) are generally more robust for ASR. The second is caused by the complete loss of information packets. Such losses tend to occur in bursts, because of channel fading in wireless communication or network congestion in voice over IP. As a result, most frame-concealment techniques employed by the decoder, including insertion, interpolation and substitution, cannot effectively bridge the gap. The ultimate goal is to make appropriate modifications at the transmitter/encoder side and to provide the most suitable and reliable feature sets. A given ASR algorithm at the receiver/client side should then perform close to the level it achieves when the original data is accessible. It is difficult to imagine a voice service that excludes subjective listening, so these modifications should cause minimal deterioration in perceptual quality.
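
The abstract argues that bursty packet loss defeats decoder-side frame concealment. The following Python sketch illustrates that reasoning in a toy setting and is not taken from the thesis. It rests on several assumptions:

- Losses follow a simple two-state Gilbert model, with hypothetical parameters `p_gb` (Good to Bad) and `p_bg` (Bad to Good).
- Features are per-frame vectors, such as MFCCs.
- Concealment is linear interpolation between the nearest received frames, falling back to repeating a neighbour (substitution) at the edges.

All helper names are illustrative only.

```python
import math
import random
from typing import List, Sequence

Frame = List[float]  # one feature vector (e.g. MFCCs) per frame


def simple_gilbert_losses(n_frames: int, p_gb: float, p_bg: float,
                          seed: int = 0) -> List[bool]:
    """Per-frame loss mask (True = lost) from a two-state Markov chain.

    Assumption: a *simple* Gilbert model in which every frame sent in the Bad
    state is lost and none in the Good state, so losses arrive in bursts whose
    mean length is 1 / p_bg frames.
    """
    rng = random.Random(seed)
    bad = False
    mask: List[bool] = []
    for _ in range(n_frames):
        if bad:
            bad = rng.random() >= p_bg  # stay Bad with probability 1 - p_bg
        else:
            bad = rng.random() < p_gb   # enter Bad with probability p_gb
        mask.append(bad)
    return mask


def conceal_by_interpolation(frames: Sequence[Frame],
                             lost: Sequence[bool]) -> List[Frame]:
    """Fill each run of lost frames by linear interpolation between the nearest
    received frames; at the edges, repeat the one available neighbour
    (substitution). Lost frames are never read, only overwritten."""
    n, dim = len(frames), len(frames[0])
    out = [[0.0] * dim if lost[i] else list(f) for i, f in enumerate(frames)]
    i = 0
    while i < n:
        if not lost[i]:
            i += 1
            continue
        start = i
        while i < n and lost[i]:
            i += 1
        left, right = start - 1, i  # received neighbours, possibly out of range
        for k in range(start, right):
            if left >= 0 and right < n:
                t = (k - left) / (right - left)
                out[k] = [(1 - t) * a + t * b
                          for a, b in zip(frames[left], frames[right])]
            elif left >= 0:
                out[k] = list(frames[left])   # trailing gap: repeat last good frame
            elif right < n:
                out[k] = list(frames[right])  # leading gap: repeat next good frame
            # if every frame is lost, the zero vectors are left in place
    return out


def concealment_error(frames: Sequence[Frame], estimate: Sequence[Frame],
                      lost: Sequence[bool]) -> float:
    """Mean absolute error, measured over the lost frames only."""
    errs = [abs(a - b)
            for f, e, m in zip(frames, estimate, lost) if m
            for a, b in zip(f, e)]
    return sum(errs) / len(errs) if errs else 0.0


# Demo: a slowly varying 1-D "feature" track (sinusoid, 40-frame period)
# with a single burst of increasing length starting at frame 100.
track = [[math.sin(2 * math.pi * k / 40)] for k in range(200)]
for gap in (1, 4, 16):
    lost = [100 <= k < 100 + gap for k in range(200)]
    est = conceal_by_interpolation(track, lost)
    print(gap, round(concealment_error(track, est, lost), 3))
```

On this sinusoidal track, a single lost frame is recovered almost exactly. The error over the lost frames grows as the burst lengthens. This mirrors, on a small scale, the abstract's point that interpolation or substitution at the decoder cannot recover long bursts.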

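Bibliographic details

  • Author: Zhong, Xin
  • Author affiliation: Georgia Institute of Technology
  • Degree-granting institution: Georgia Institute of Technology
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2003
  • Pages: 101
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Radio electronics and telecommunications technology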