VOICE ACTIVITY DETECTION METHOD BASED ON STATISTICAL MODEL EMPLOYING DEEP NEURAL NETWORK AND VOICE ACTIVITY DETECTION DEVICE PERFORMING THE SAME
Abstract
The present invention relates to a statistical model-based voice activity detection method employing a deep neural network, and to a voice activity detection device performing the same. More specifically, the method comprises: (1) extracting a feature vector from the input speech signal using the a priori signal-to-noise ratio (SNR), the a posteriori SNR, and the likelihood ratio (LR), each based on variance values of the input speech signal; (2) in a learning step, using the extracted feature vectors to initialize the weights and biases of a deep neural network having a plurality of nonlinear hidden layers, thereby pre-training the deep neural network; (3) in the learning step, optimizing the deep neural network with a gradient-descent-based backpropagation algorithm, using the extracted feature vectors and labels indicating the presence or absence of speech; and (4) in a classification step, classifying the input speech signal as a speech segment or a noise segment by applying a decision function to the output that the trained deep neural network produces from a feature vector obtained by the same feature extraction method.
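Step (1) can be illustrated with a minimal sketch. The abstract does not give formulas, so this assumes the Gaussian statistical model commonly used in statistical model-based voice activity detection: the a posteriori SNR is the noisy power divided by the noise variance, the a priori SNR is taken as its maximum-likelihood estimate, and the LR is kept in the log domain for numerical stability. The function name `extract_features`, its parameters, and the feature ordering are hypothetical.

```python
import math

def extract_features(noisy_power, noise_var, floor=1e-10):
    """Build one frame's feature vector from per-bin power and noise variance.

    Assumed (not stated in the abstract): Gaussian model, ML estimate of the
    a priori SNR, and log-domain likelihood ratio.
    Returns [a priori SNRs..., a posteriori SNRs..., log LRs...].
    """
    if len(noisy_power) != len(noise_var):
        raise ValueError("noisy_power and noise_var must have the same length")
    prio, post, log_lr = [], [], []
    for power, lam in zip(noisy_power, noise_var):
        gamma = power / max(lam, floor)      # a posteriori SNR
        xi = max(gamma - 1.0, 0.0)           # a priori SNR (ML estimate, assumption)
        # log LR under the Gaussian model: gamma * xi / (1 + xi) - log(1 + xi)
        llr = gamma * xi / (1.0 + xi) - math.log1p(xi)
        prio.append(xi)
        post.append(gamma)
        log_lr.append(llr)
    return prio + post + log_lr
```

For example, `extract_features([4.0, 1.0], [1.0, 1.0])` returns a six-element vector for a two-bin frame. A bin whose power equals the noise variance contributes a log LR of zero, meaning the model has no evidence for speech in that bin.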
According to the statistical model-based voice activity detection method and the voice activity detection device performing the same, as proposed in the present invention, a speech signal contaminated by ambient noise is input in the learning step, and a feature vector is extracted using the a priori signal-to-noise ratio (SNR), the a posteriori SNR, and the likelihood ratio (LR), each based on variance values. Using the extracted feature vectors, the weights and biases of a deep neural network having a plurality of nonlinear hidden layers are initialized to pre-train the network, and the network is then optimized with a gradient-descent-based backpropagation algorithm using the extracted feature vectors and labels indicating the presence or absence of speech. In the classification step, the input speech signal is classified as a speech segment or a noise segment by applying a decision function to the output that the trained deep neural network produces from the feature vector. Compared with an approach that does not do so, this models the distribution of the likelihood ratio more effectively, improves voice activity detection performance, and reduces computation time.
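The learning and classification steps can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes sigmoid units, a single sigmoid output, and a cross-entropy loss, and it uses a fixed threshold as the decision function. It also substitutes random initialization for the pre-training step, whose procedure the abstract does not specify. All names (`init_network`, `forward`, `train_step`, `train`, `is_speech`) and hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(sizes, seed=0):
    """Random weights and zero biases (stand-in for the pre-training step)."""
    rng = random.Random(seed)
    return [
        {"W": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         "b": [0.0] * n_out}
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(net, x):
    """Return the activations of every layer, starting with the input."""
    acts = [list(x)]
    for layer in net:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * a for w, a in zip(row, prev)) + b)
                     for row, b in zip(layer["W"], layer["b"])])
    return acts

def train_step(net, x, y, lr=0.5):
    """One gradient-descent update via backpropagation for a single example."""
    acts = forward(net, x)
    delta = [acts[-1][0] - y]  # output error for sigmoid + cross-entropy
    for i in range(len(net) - 1, -1, -1):
        layer, inp = net[i], acts[i]
        # Propagate the error to the layer below before changing the weights.
        below = []
        if i > 0:
            below = [sum(layer["W"][j][k] * delta[j] for j in range(len(delta)))
                     * inp[k] * (1.0 - inp[k]) for k in range(len(inp))]
        for j, d in enumerate(delta):
            for k, a in enumerate(inp):
                layer["W"][j][k] -= lr * d * a
            layer["b"][j] -= lr * d
        delta = below

def train(net, data, epochs=2000, lr=0.5):
    """Fine-tune on (feature_vector, label) pairs; label 1 = speech, 0 = noise."""
    for _ in range(epochs):
        for x, y in data:
            train_step(net, x, y, lr)

def is_speech(net, x, threshold=0.5):
    """Decision function (assumed): threshold the network's output."""
    return forward(net, x)[-1][0] > threshold
```

A typical use would build the network with `init_network([n_features, 4, 4, 1])`, call `train` on labeled feature vectors, and then call `is_speech` frame by frame. The threshold trades missed speech against false alarms and would normally be tuned on held-out data.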