Journal: Pattern Recognition Letters

Classification of general audio data for content-based retrieval

Abstract

In this paper, we address the problem of classifying continuous general audio data (GAD) for content-based retrieval, and describe a scheme that classifies audio segments into seven categories: silence, single-speaker speech, music, environmental noise, multiple speakers' speech, simultaneous speech and music, and speech with noise. We evaluated the discrimination capability of 143 classification features in total. Our study shows that cepstral-based features such as Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) provide better classification accuracy than temporal and spectral features. To minimize classification errors near the boundaries between audio segments of different types in general audio data, we also propose a segmentation pooling scheme. This scheme yields classification results that are consistent with human perception. Our classification system achieves over 90% accuracy at a processing speed dozens of times faster than the playback rate.
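The abstract does not say how the segmentation pooling scheme works. One plausible reading is sketched below. This is an illustration under stated assumptions and not the authors' method. It assumes a frame-level classifier has already labeled every frame, and that a separate segmentation step has produced the frame indices where new segments begin. Each segment then receives the majority label of its frames, so isolated misclassifications near a boundary are absorbed by the surrounding segment. The function name `pool_segment_labels` and its inputs are hypothetical.

```python
from collections import Counter


def pool_segment_labels(frame_labels, boundaries):
    """Hypothetical majority-vote pooling of frame labels within segments.

    Assumption (not from the paper): frame_labels holds one class label per
    frame, and boundaries holds frame indices where a new segment starts,
    as produced by some separate segmentation step. Index 0 is implied.

    Returns a list of (start, end, label) tuples, with end exclusive.
    """
    n = len(frame_labels)
    if n == 0:
        return []
    # Drop out-of-range and duplicate boundaries so no segment is empty.
    starts = [0] + sorted({b for b in boundaries if 0 < b < n})
    ends = starts[1:] + [n]
    pooled = []
    for start, end in zip(starts, ends):
        counts = Counter(frame_labels[start:end])
        # Ties go to the label encountered first in the segment.
        label, _ = counts.most_common(1)[0]
        pooled.append((start, end, label))
    return pooled
```

For example, `pool_segment_labels(["speech", "speech", "music", "speech", "music", "music", "noise", "music"], [4])` returns `[(0, 4, "speech"), (4, 8, "music")]`. The stray "music" and "noise" frames are absorbed into their surrounding segments. Majority voting is only one possible pooling rule. Weighted voting by classifier confidence would be an equally reasonable assumption.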