2018 IEEE Spoken Language Technology Workshop

On-Device End-to-end Speech Recognition with Multi-Step Parallel RNNs

Abstract

Most current automatic speech recognition is performed on a remote server. However, the demand for speech recognition on personal devices is increasing, owing to the requirements of shorter recognition latency and improved privacy. End-to-end speech recognition that employs recurrent neural networks (RNNs) shows good accuracy, but the execution of conventional RNNs, such as the long short-term memory (LSTM) or gated recurrent unit (GRU), demands many memory accesses, thus hindering their real-time execution on smartphones or embedded systems. To solve this problem, we built an end-to-end acoustic model (AM) using linear recurrent units instead of LSTM or GRU and employed a multi-step parallel approach for reducing the number of DRAM accesses. The AM is trained with the connectionist temporal classification (CTC) loss, and decoding is conducted using weighted finite-state transducers (WFSTs). The proposed system achieves 4.8x real-time speed when executed on a single core of an ARM CPU-based system.
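The abstract's central idea, replacing LSTM/GRU cells with linear recurrent units whose input projections can be batched across time steps to cut DRAM traffic, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration rather than the paper's implementation: it uses PyTorch, an SRU/QRNN-style elementwise recurrence under the hypothetical name LinearRecurrentUnit, and a toy CTC training step on random data; the WFST decoding stage is not shown.

```python
import torch
import torch.nn as nn


class LinearRecurrentUnit(nn.Module):
    """Hypothetical elementwise linear recurrent layer (illustration only).

    The input projection for every frame is computed in one batched matrix
    multiply, so the weight matrix is fetched from DRAM once per chunk of
    frames; only the cheap elementwise recurrence remains sequential.
    """

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.proj = nn.Linear(input_size, 2 * hidden_size)  # candidate + forget gate
        self.hidden_size = hidden_size

    def forward(self, x):                      # x: (T, B, input_size)
        z, f = self.proj(x).chunk(2, dim=-1)   # computed in parallel over all T frames
        f = torch.sigmoid(f)
        h = x.new_zeros(x.size(1), self.hidden_size)
        outputs = []
        for t in range(x.size(0)):             # lightweight sequential part
            h = f[t] * h + (1.0 - f[t]) * z[t]
            outputs.append(h)
        return torch.stack(outputs)            # (T, B, hidden_size)


if __name__ == "__main__":
    # Toy CTC training step on random data, only to show how such an AM
    # could be wired to the CTC loss mentioned in the abstract.
    T, B, F, H, V = 50, 4, 40, 128, 30         # frames, batch, features, hidden, labels
    am = nn.Sequential(LinearRecurrentUnit(F, H), nn.Linear(H, V))
    ctc = nn.CTCLoss(blank=0)
    feats = torch.randn(T, B, F)
    targets = torch.randint(1, V, (B, 12))
    log_probs = am(feats).log_softmax(dim=-1)
    loss = ctc(log_probs, targets,
               torch.full((B,), T, dtype=torch.long),
               torch.full((B,), 12, dtype=torch.long))
    loss.backward()
    print(f"CTC loss: {loss.item():.3f}")
```

Because the recurrence is elementwise, the expensive input projections for many frames can be grouped into a single matrix multiplication, which is the kind of multi-step batching the abstract credits with reducing DRAM accesses on an ARM CPU.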