Scalable Discovery of Audio Fingerprint Motifs in Broadcast Streams With Determinantal Point Process Based Motif Clustering

H. Xu; Z. Ou

Abstract

In this paper, we study the scalable discovery of repetitive audio patterns (motifs) in long broadcast streams, where two segments are considered repetitive if their audio fingerprints are close to each other. Because this task involves only limited variability, we can adapt an audio hashing technique, originally proposed for searching for a given music clip in music tracks, and devise a linear-complexity similarity matching method with a new step of repeated interval formation. This is the first contribution of this paper. Because the similarity matching is very fast and therefore coarse, the large number of pairwise matches it generates contains false alarms, which constitute a major source of noise. As a filtering step to reduce this noise, we propose applying subset selection based on determinantal point processes (DPPs) to the original set of pairwise matches. The selected subset of pairwise matches is then subjected to motif clustering. DPP-based subset selection, which has the attractive property of favoring both quality and diversity, is successfully applied to improve motif clustering. This is the second contribution of this paper. The proposed method is thoroughly evaluated on a 9-hour real-world audio stream and is compared with several reference methods. The bootstrap technique is used for the significance test. It is shown that the similarity matching is computationally very efficient (more than 100 times faster than real time), and that the filtering step with DPPs can significantly improve the precision of motif discovery without sacrificing recall.
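To make the "quality and diversity" property of DPP-based subset selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes each pairwise match has a scalar quality score and a small feature vector, builds the usual DPP kernel L[i][j] = q_i * sim(i, j) * q_j, and picks a subset by greedy determinant maximization, a common approximation that the abstract does not confirm the paper uses. The names `build_kernel` and `greedy_dpp_select`, the scores, and the features are all hypothetical.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def build_kernel(quality, features):
    """Assumed kernel: L[i][j] = q_i * cosine(phi_i, phi_j) * q_j."""
    unit = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f))
        unit.append([x / norm for x in f])
    n = len(quality)
    return [
        [quality[i] * quality[j] * sum(a * b for a, b in zip(unit[i], unit[j]))
         for j in range(n)]
        for i in range(n)
    ]


def greedy_dpp_select(kernel, k):
    """Greedily add the item that yields the largest det(L_S), up to k items."""
    selected = []
    while len(selected) < k:
        best_item, best_det = None, 1e-12
        for i in range(len(kernel)):
            if i in selected:
                continue
            idx = selected + [i]
            d = det([[kernel[r][c] for c in idx] for r in idx])
            if d > best_det:
                best_item, best_det = i, d
        if best_item is None:  # every remaining item is redundant
            break
        selected.append(best_item)
    return selected


# Hypothetical pairwise matches: items 0 and 1 are strong but near-duplicates,
# item 2 is strong and different, item 3 is a weak (likely false-alarm) match.
quality = [0.9, 0.85, 0.8, 0.2]
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.7, 0.7]]

selected = greedy_dpp_select(build_kernel(quality, features), 2)
print(selected)  # [0, 2]
```

In this toy input the selection keeps match 0 and match 2: the near-duplicate match 1 is skipped despite its high score, and the low-quality match 3 is skipped despite being fairly distinct, which is the filtering behavior the abstract attributes to the DPP step.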

Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 2016, No. 5, pp. 978-989 (12 pages)
Authors: H. Xu; Z. Ou
Original format: PDF
Language: English
Keywords: Audio motif discovery; audio fingerprinting; determinantal point process; motif clustering
