Speech summarization, which helps users browse and understand speech information (especially spoken documents), has recently become an active area of research. Many existing machine-learning approaches to speech summarization cast important-sentence selection as a two-class classification problem and have shown empirical success on a wide array of summarization tasks. One common deficiency of these approaches is that their learning criteria are only loosely related to the final evaluation metric. To address this problem, we present a novel probabilistic framework, built on Bayes decision theory, for learning summarization models. Two effective training criteria derived from this framework, namely maximum relevance estimation (MRE) and minimum ranking loss estimation (MRLE), are introduced to characterize the pairwise preference relationships between spoken sentences. Experiments on a broadcast news speech summarization task demonstrate the performance advantages of our summarization methods over existing methods.