Journal: International Journal of Intelligent Information and Database Systems

Convolutional recurrent neural network with attention for Vietnamese speech to text problem in the operating room



Abstract

We introduce an automatic speech recognition (ASR) system that converts Vietnamese speech to text in the presence of real operating room ambient noise recorded during liver surgery. First, we combine a convolutional neural network (CNN) with a bidirectional long short-term memory (BLSTM) network for local speech feature learning, sequence modelling, and transcription. We also extend the CNN-LSTM framework with an attention mechanism that decodes the frames into a sequence of words. The CNN, LSTM, and attention models are combined into a unified architecture. In addition, we combine the connectionist temporal classification (CTC) and attention loss functions in the training phase. The length of the output label sequence from CTC is applied to the attention-based decoder predictions to produce the final label sequence. This helps to reduce irregular alignments and speeds up label sequence estimation during training and inference, compared with relying only on the data-driven attention-based encoder-decoder to estimate the label sequence in long sentences. The proposed system is evaluated on a real operating room database. The results show that our method significantly improves the performance of the ASR system. We find that our approach yields 13.05% in word error rate (WER) and outperforms standard methods.
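The abstract says the CTC and attention losses are combined during training but does not say how. One common way to combine them, shown here only as an assumed form, is a weighted interpolation. The weight \lambda and the name \mathcal{L}_{\text{joint}} are illustrative and do not come from the paper.

```latex
\mathcal{L}_{\text{joint}}
  = \lambda \, \mathcal{L}_{\text{CTC}}
  + (1 - \lambda) \, \mathcal{L}_{\text{att}},
  \qquad 0 \le \lambda \le 1
```

Under this assumed form, \lambda = 0 reduces to a purely attention-based objective and \lambda = 1 to a purely CTC objective. Intermediate values let the monotonic CTC alignment regularize the attention decoder.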
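The reported metric is word error rate (WER). As a minimal sketch, WER is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; the paper's actual scoring procedure is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes whitespace tokenization (an illustrative choice, not taken
    from the paper).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.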
