2018 IEEE Spoken Language Technology Workshop

On-Device End-to-end Speech Recognition with Multi-Step Parallel RNNs

Abstract

Most current automatic speech recognition is performed on a remote server. However, the demand for speech recognition on personal devices is increasing, owing to the requirements of shorter recognition latency and improved privacy. End-to-end speech recognition that employs recurrent neural networks (RNNs) shows good accuracy, but the execution of conventional RNNs, such as the long short-term memory (LSTM) or gated recurrent unit (GRU), demands many memory accesses, thus hindering their real-time execution on smartphones or embedded systems. To solve this problem, we built an end-to-end acoustic model (AM) using linear recurrent units instead of LSTM or GRU and employed a multi-step parallel approach for reducing the number of DRAM accesses. The AM is trained with the connectionist temporal classification (CTC) loss, and decoding is conducted using weighted finite-state transducers (WFSTs). The proposed system achieves 4.8x real-time speed when executed on a single core of an ARM CPU-based system.
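The abstract's central idea, replacing LSTM/GRU cells with linear recurrent units whose input projections can be batched across time steps to cut DRAM traffic, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration rather than the paper's implementation: it uses PyTorch, an SRU/QRNN-style elementwise recurrence under the hypothetical name LinearRecurrentUnit, and a toy CTC training step on random data; the WFST decoding stage is not shown.

```python
import torch
import torch.nn as nn


class LinearRecurrentUnit(nn.Module):
    """Hypothetical elementwise linear recurrent layer (illustration only).

    The input projection for every frame is computed in one batched matrix
    multiply, so the weight matrix is fetched from DRAM once per chunk of
    frames; only the cheap elementwise recurrence remains sequential.
    """

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.proj = nn.Linear(input_size, 2 * hidden_size)  # candidate + forget gate
        self.hidden_size = hidden_size

    def forward(self, x):                      # x: (T, B, input_size)
        z, f = self.proj(x).chunk(2, dim=-1)   # computed in parallel over all T frames
        f = torch.sigmoid(f)
        h = x.new_zeros(x.size(1), self.hidden_size)
        outputs = []
        for t in range(x.size(0)):             # lightweight sequential part
            h = f[t] * h + (1.0 - f[t]) * z[t]
            outputs.append(h)
        return torch.stack(outputs)            # (T, B, hidden_size)


if __name__ == "__main__":
    # Toy CTC training step on random data, only to show how such an AM
    # could be wired to the CTC loss mentioned in the abstract.
    T, B, F, H, V = 50, 4, 40, 128, 30         # frames, batch, features, hidden, labels
    am = nn.Sequential(LinearRecurrentUnit(F, H), nn.Linear(H, V))
    ctc = nn.CTCLoss(blank=0)
    feats = torch.randn(T, B, F)
    targets = torch.randint(1, V, (B, 12))
    log_probs = am(feats).log_softmax(dim=-1)
    loss = ctc(log_probs, targets,
               torch.full((B,), T, dtype=torch.long),
               torch.full((B,), 12, dtype=torch.long))
    loss.backward()
    print(f"CTC loss: {loss.item():.3f}")
```

Because the recurrence is elementwise, the expensive input projections for many frames can be grouped into a single matrix multiplication, which is the kind of multi-step batching the abstract credits with reducing DRAM accesses on an ARM CPU.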