We previously proposed an incremental speaker adaptation method combined with automatic speaker-change detection for broadcast news transcription, where speakers change frequently and each speaker utters a series of several sentences. In this method, speaker changes are detected using speaker-independent and speaker-adaptive Gaussian mixture models (GMMs). Both phone HMMs and GMMs are incrementally adapted to each speaker by a combination of the MLLR, MAP and VFS methods, using speaker-independent (SI) models as initial models. This paper proposes an improvement in which the initial model for speaker adaptation is selected from a set of models built by speaker clustering. Either cluster-dependent phone HMMs or cluster-dependent GMMs are used to calculate the likelihood for selecting the best initial model. In a broadcast news transcription task, the proposed method significantly reduces the word error rate compared with the method using the SI-HMM as the initial model. Online incremental speaker adaptation results show that the word error rate is reduced by 11.6
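
The likelihood-based selection of an initial model described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs, scores each cluster by average per-frame log-likelihood, and uses hypothetical names (`select_initial_model`, `cluster_a`, `cluster_b`) and toy two-dimensional features in place of real acoustic features.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)


def select_initial_model(frames, cluster_gmms):
    """Return the name of the cluster whose GMM best explains the frames."""
    scores = {name: gmm_avg_loglik(frames, g) for name, g in cluster_gmms.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical cluster-dependent GMMs (toy values, assumed for illustration).
cluster_gmms = {
    "cluster_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "cluster_b": [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])],
}
# Toy feature frames from the current speaker's first utterance.
frames = [[2.9, 3.1], [3.8, 4.2]]

best, scores = select_initial_model(frames, cluster_gmms)
print(best, scores)
```

In such a setup, the models of the selected cluster would replace the SI models as the starting point for incremental adaptation; how the adaptation itself proceeds is outside the scope of this sketch.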