Convolutional recurrent neural network with attention for Vietnamese speech to text problem in the operating room

Trinh Tan Dat; Le Tran Anh Dang; Vu Ngoc Thanh Sang; Le Nhi Lam Thuy; Pham The Bao

Abstract

We introduce an automatic Vietnamese speech recognition (ASR) system that converts Vietnamese speech to text in real operating-room ambient noise recorded during liver surgery. First, we propose combining a convolutional neural network (CNN) with a bidirectional long short-term memory (BLSTM) network to perform local speech feature learning, sequence modelling, and transcription. We also extend the CNN-LSTM framework with an attention mechanism that decodes the frames into a sequence of words. The CNN, LSTM, and attention models are combined into a unified architecture. In addition, we combine the connectionist temporal classification (CTC) and attention loss functions in the training phase. The length of the label sequence output by CTC is applied to the attention-based decoder's predictions to produce the final label sequence. This reduces irregular alignments and speeds up label-sequence estimation during training and inference, compared with relying only on the data-driven attention-based encoder-decoder to estimate the label sequence for long sentences. The proposed system is evaluated on a real operating-room database. The results show that our method significantly improves the performance of the ASR system: our approach achieves a word error rate (WER) of 13.05% and outperforms standard methods.
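
The abstract states only that the CTC and attention losses are combined during training. A minimal sketch, assuming the linear interpolation L = lam * L_CTC + (1 - lam) * L_att that is common in hybrid CTC/attention ASR, follows; the weight lam = 0.3, the function names, and the toy inputs are illustrative assumptions, not the authors' settings. Precomputed log-probabilities stand in for the CNN-BLSTM encoder and the attention decoder, and the CTC term uses the standard forward algorithm.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_loss(log_probs: Sequence[Sequence[float]], target: List[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs[t][k] is the log-probability of symbol k at frame t.
    """
    # Extended sequence with a blank around every label: _ y1 _ y2 ... yN _
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    # alpha[s] = log-prob of all path prefixes ending in ext[s] at the current frame
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new = [NEG_INF] * S
        for s in range(S):
            acc = alpha[s]                          # stay on ext[s]
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])    # advance from ext[s-1]
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])    # skip a blank between distinct labels
            new[s] = acc + frame[ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank
    total = alpha[-1] if S == 1 else log_add(alpha[-1], alpha[-2])
    return -total


def attention_loss(dec_log_probs: Sequence[Sequence[float]], target: List[int]) -> float:
    """Teacher-forced cross-entropy, assuming one decoder output row per target token."""
    return -sum(row[y] for row, y in zip(dec_log_probs, target))


def joint_ctc_attention_loss(ctc_log_probs, dec_log_probs, target, lam=0.3, blank=0):
    """Hypothetical interpolation L = lam * L_CTC + (1 - lam) * L_att."""
    return (lam * ctc_loss(ctc_log_probs, target, blank)
            + (1.0 - lam) * attention_loss(dec_log_probs, target))


if __name__ == "__main__":
    half = math.log(0.5)
    ctc_lp = [[half, half], [half, half]]  # 2 frames over {0: blank, 1}
    dec_lp = [[half, half]]                # 1 decoder step
    print(round(ctc_loss(ctc_lp, [1]), 4))                          # 0.2877
    print(round(joint_ctc_attention_loss(ctc_lp, dec_lp, [1]), 4))  # 0.5715
```

With two uniform frames over {blank, 1}, the label "1" has three valid CTC paths (1 1, blank 1, 1 blank), each of probability 0.25, so the CTC term is -log 0.75.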
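
The abstract does not detail how the CTC output length is applied to the attention decoder. One possible reading, sketched here as an assumption rather than the authors' procedure, is that a greedy CTC pass (best symbol per frame, repeats collapsed, blanks dropped) fixes the hypothesis length N, and the attention decoder is then run for exactly N steps without being allowed to stop early. The step function `step_fn`, the shared vocabulary with blank at index 0, and the optional `eos` index are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def ctc_greedy_decode(log_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(row)), key=lambda k: row[k]) for row in log_probs]
    labels: List[int] = []
    prev: Optional[int] = None
    for k in best:
        if k != blank and k != prev:
            labels.append(k)
        prev = k
    return labels


def length_constrained_decode(
    step_fn: Callable[[List[int]], Sequence[float]],
    ctc_log_probs: Sequence[Sequence[float]],
    blank: int = 0,
    eos: Optional[int] = None,
) -> List[int]:
    """Run a (hypothetical) attention-decoder step function for exactly as
    many steps as the greedy CTC hypothesis has labels."""
    target_len = len(ctc_greedy_decode(ctc_log_probs, blank))
    banned = {blank} if eos is None else {blank, eos}
    hyp: List[int] = []
    for _ in range(target_len):
        scores = step_fn(hyp)  # log-probs over the vocabulary given the prefix
        # The length is fixed by CTC, so blank and end-of-sentence are never chosen here
        allowed = [k for k in range(len(scores)) if k not in banned]
        hyp.append(max(allowed, key=lambda k: scores[k]))
    return hyp


if __name__ == "__main__":
    # Toy CTC posteriors over {0: blank, 1, 2}; per-frame argmax is 1 1 0 2 2
    def frame(best: int, vocab: int = 3) -> List[float]:
        return [0.0 if k == best else -5.0 for k in range(vocab)]

    ctc = [frame(k) for k in (1, 1, 0, 2, 2)]
    print(ctc_greedy_decode(ctc))  # [1, 2]

    # Toy decoder that always prefers eos (index 4), then token 3
    toy_step = lambda prefix: [-9.0, -3.0, -2.0, -1.0, -0.1]
    print(length_constrained_decode(toy_step, ctc, eos=4))  # [3, 3]
```

Fixing the length this way stops the decoder from ending a long sentence too early or running past its end, which is one way to interpret the abstract's point about reducing irregular alignments.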
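
The 13.05% figure is reported in word error rate. As a reference point, the usual definition of WER (not code from the paper) is the word-level edit distance, counting substitutions, deletions, and insertions, divided by the number of reference words. Splitting on whitespace is an assumption: for Vietnamese it yields syllables, and the abstract does not say which unit was scored. The example sentence is invented.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()  # whitespace tokenisation (assumption)
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Invented example: one substituted syllable out of five
    print(word_error_rate("bác sĩ cần dao mổ", "bác sĩ cần giao mổ"))  # 0.2
```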

Bibliographic details

Source: International Journal of Intelligent Information and Database Systems, 2021, no. 3, pp. 294-314 (21 pages)
Authors: Trinh Tan Dat; Le Tran Anh Dang; Vu Ngoc Thanh Sang; Le Nhi Lam Thuy; Pham The Bao
Affiliations:
Information Science Faculty, Sai Gon University;
Faculty of Electrical and Electronics Engineering, University of Technology;
Information Science Faculty, Sai Gon University;
Information Science Faculty, Sai Gon University;
Information Science Faculty, Sai Gon University

Indexing information
Original format: PDF
Text language: English
Keywords: Vietnamese speech recognition; convolutional neural network; CNN; bidirectional long short-term memory; BLSTM; attention; operating room

Similar documents

1. 3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition [J]. Mingyi Chen, Xuanji He, Jing Yang. IEEE Signal Processing Letters, 2018, no. 10.

2. Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification [J]. Banerjee Imon, Ling Yuan, Chen Matthew C. Artificial Intelligence in Medicine, 2019, issue JUNa.

3. Combining attention-based bidirectional gated recurrent neural network and two-dimensional convolutional neural network for document-level sentiment classification [J]. Liu Fagui, Zheng Jingzhong, Zheng Lailei. Neurocomputing, 2020, issue Jana2.

4. Relation Extraction in Vietnamese Text via Piecewise Convolution Neural Network with Word-Level Attention [C]. Van-Nhat Nguyen, Ha-Thanh Nguyen, Dinh-Hieu Vo. NAFOSTED Conference on Information and Computer Science, 2018.

5. Deep Neural Language Model for Text Classification Based on Convolutional and Recurrent Neural Networks [D]. Hassan, Abdalraouf. 2018.

6. Single-modal and multi-modal false arrhythmia alarm reduction using attention-based convolutional and recurrent neural networks [O]. Sajad Mousavi, Atiyeh Fotoohinasab, Fatemeh Afghah. 2020.

7. A Hybrid Bidirectional Recurrent Convolutional Neural Network Attention-Based Model for Text Classification [O]. Jin Zheng, Limin Zheng. 2019.
