We present a novel technique for separating doctor and patient speech in conversations over a telemedicine network. The mixed speech signal acquired at the doctor's site is first segmented into single-talker speech segments and background using energy and duration thresholds. Each speech segment is then attributed to the doctor or the patient in two steps. In the first step, Gaussian mixture models (GMMs) of the doctor and the patient are used: the doctor's model is trained on his/her enrollment speech, while the patient's model is initialized from a general speaker model and then adapted with the patient's speech. In the second step, a decision tree that uses contextual and confidence features is applied to refine the identification results. Preliminary experiments were performed on three data sets collected in telemedicine. Without adaptation and the decision tree, error rates at the segment level and frame level were 25.44
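The first stage described above, splitting the mixed signal into single-talker segments and background by thresholding energy and duration, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the frame size, hop, energy threshold, and minimum duration are hypothetical values chosen for the example.

```python
import numpy as np

def segment_by_energy(signal, sr, frame_ms=25, hop_ms=10,
                      energy_thresh=0.01, min_dur_ms=200):
    """Split a mono signal into candidate speech segments using simple
    frame-energy and minimum-duration thresholds (illustrative values).

    Returns a list of (start_sample, end_sample) tuples."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # mean squared amplitude per frame
    energies = np.array([
        np.mean(signal[i:i + frame] ** 2)
        for i in range(0, len(signal) - frame + 1, hop)
    ])
    speech = energies > energy_thresh
    # collect runs of high-energy frames; discard runs shorter than min_dur_ms
    segments, start = [], None
    for i, is_speech in enumerate(speech):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            if (i - start) * hop_ms >= min_dur_ms:
                segments.append((start * hop, (i - 1) * hop + frame))
            start = None
    if start is not None and (len(speech) - start) * hop_ms >= min_dur_ms:
        segments.append((start * hop, len(signal)))
    return segments
```

Each resulting segment would then be scored against the doctor and patient GMMs in the identification stage.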