In this paper, motivated by the observation that humans detect emotions during conversation from both speech and facial expressions, an emotion recognition system is proposed that detects emotion from acoustic features, semantic content, and facial expressions during conversation. In the analysis of speech signals, thirty-three acoustic features are extracted from the speech input. After Principal Component Analysis (PCA), 14 principal components are selected for a discriminative representation. In this representation, each principal component is a linear combination of the 33 original acoustic features and forms a feature subspace. Support Vector Machines (SVMs) are adopted to classify the emotional states. In the facial emotion recognition module, the facial image captured by a CCD camera is provided for facial feature extraction, and an SVM model is applied for emotion recognition. Finally, in the text analysis module, all emotional keywords and emotion modification words are manually defined, and their emotion intensity levels are estimated from a collected emotion corpus. The final emotional state is determined from the emotion outputs of these three modules. The experimental results show that the emotion recognition accuracy of the integrated system is higher than that of each of the three individual approaches.
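The acoustic pipeline described above (33 features, PCA reduction to 14 components, SVM classification) can be sketched roughly as follows. This is a minimal illustration with placeholder data and assumed hyperparameters (RBF kernel, class count), not the authors' actual implementation:

```python
# Sketch of the acoustic-feature pipeline: 33 features -> PCA(14) -> SVM.
# Data, labels, and kernel choice here are placeholders/assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 33))    # 200 utterances x 33 acoustic features (synthetic)
y = rng.integers(0, 4, size=200)  # 4 hypothetical emotion classes

# Each PCA component is a linear combination of the 33 original features;
# the SVM then classifies in the 14-dimensional subspace.
clf = make_pipeline(PCA(n_components=14), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.named_steps["pca"].n_components_)  # 14
```

In practice the PCA projection and SVM would be trained on real labeled speech data, and the predicted class from this module would be fused with the outputs of the facial and textual modules.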