Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition


Abstract

Generation of high-precision sub-phonetic attribute (also known as phonological feature) and phone lattices is a key frontend component for detection-based, bottom-up speech recognition. In this paper we employ deep neural networks (DNNs) to improve detection accuracy over conventional shallow MLPs (multi-layer perceptrons) with one hidden layer. A range of DNN architectures with five to seven hidden layers and up to 2048 hidden units per layer has been explored. Training on the SI84 set and testing on the Nov92 WSJ data, the proposed DNNs achieve significant improvements over the shallow MLPs, producing frame-level attribute estimation accuracies greater than 90% for all 21 attributes tested for the full system. On the phone detection task, we also obtain an excellent frame-level accuracy of 86.6%. With this level of high-precision detection of basic speech units, we have opened the door to a new family of flexible speech recognition system designs for both top-down and bottom-up, lattice-based search strategies and knowledge integration.
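The frame-level accuracies quoted in the abstract can be read as the fraction of frames whose predicted label matches the reference label. The following is a minimal sketch of that metric only, not the authors' evaluation code: the function name `frame_accuracy` and the toy label sequences are hypothetical and are not taken from the paper.

```python
def frame_accuracy(reference, predicted):
    """Return the fraction of frames whose predicted label equals the reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of frames")
    if not reference:
        raise ValueError("at least one frame is required")
    correct = sum(1 for r, p in zip(reference, predicted) if r == p)
    return correct / len(reference)


# Hypothetical toy data, one label per frame (illustrative only, not from the paper).
ref_phones = ["sil", "k", "k", "ae", "ae", "ae", "t", "sil"]
hyp_phones = ["sil", "k", "ae", "ae", "ae", "ae", "t", "t"]
phone_acc = frame_accuracy(ref_phones, hyp_phones)

# A binary attribute detector (assumed here to be "voiced"): 1 = present, 0 = absent.
ref_voiced = [0, 0, 0, 1, 1, 1, 0, 0]
hyp_voiced = [0, 0, 1, 1, 1, 1, 0, 0]
voiced_acc = frame_accuracy(ref_voiced, hyp_voiced)

print(f"phone frame accuracy:  {phone_acc:.3f}")
print(f"voiced frame accuracy: {voiced_acc:.3f}")
```

The same function covers both tasks in this sketch because a phone detector and a binary attribute detector each emit one label per frame; only the label set differs.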


Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 4169-4172 (4 pages)
Conference location: Kyoto (JP)
Authors
Yu, Dong;
Author affiliation

Speech Research Group, Microsoft Research, Redmond, WA, USA

Original format: PDF

Similar literature

1. Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature [J]. Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara. EURASIP Journal on Advances in Signal Processing, 2015, no. 1
2. Exploiting deep neural networks for detection-based speech recognition [J]. Sabato Marco Siniscalchi, Dong Yu, Li Deng. Neurocomputing, 2013, issue apra15
3. Acoustic landmarks contain more information about the phone string than other frames for automatic speech recognition with deep neural network acoustic model [J]. He Di, Lim Boon Pang, Yang Xuesong. The Journal of the Acoustical Society of America, 2018, issue 6aPta1
4. Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition [C]. Dong Yu, Siniscalchi S.M., Li Deng. IEEE International Conference on Acoustics, Speech and Signal Processing, 2011
5. A Face Detection and Recognition System for Color Images Using Neural Networks with Boosting and Deep Learning [D]. Hajiarbabi, Mohammadreza. 2017
6. Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT [O]. Doroteo T. Toledano, María Pilar Fernández-Gallego, Alicia Lozano-Diez. 2012
7. Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition [O]. Dong Yu, Sabato Marco Siniscalchi, Li Deng. 2012
