Selection of acoustic features for robust speech recognition has been the subject of research for several years. Algorithms that use feature vectors from multiple frequency bands [9], or that switch between multiple feature streams [10], have been reported in the literature as ways to maintain robustness under different acoustic conditions. Acoustic models built from different feature sets produce different kinds of recognition errors. In this paper, we propose a likelihood-based scheme that combines the acoustic feature vectors from multiple signal processing schemes within the decoding framework, in order to extract maximum benefit from these different acoustic feature vectors and models. The proposed technique is general enough to be applied to other pattern recognition fields, such as OCR and handwriting recognition. The fundamental idea behind this approach is to pick the set of features that classifies a frame of speech accurately, with no a priori information about the phonetic class or acoustic channel that the speech comes from. Two methods of merging any set of acoustic features, such as formant-based features, cepstral feature vectors, PLP features, LDA features, etc.