Computer Speech and Language

Voice biometrics security: Extrapolating false alarm rate via hierarchical Bayesian modeling of speaker verification scores

Abstract

How secure is automatic speaker verification (ASV) technology? More concretely, given a specific target speaker, how likely is it that another person will be falsely accepted as that target? This question may be addressed empirically by studying naturally confusable pairs of speakers within a large enough corpus. To this end, one might expect to find at least some speaker pairs that are indistinguishable from each other in terms of ASV. To a certain extent, this aim is mirrored in standardized ASV evaluation benchmarks, for instance the speaker recognition evaluation (SRE) series organized by the National Institute of Standards and Technology (NIST). Nonetheless, the number of speakers in such evaluation benchmarks arguably represents only a small fraction of all possible human voices, making it challenging to extrapolate performance beyond a given corpus. Furthermore, the impostors used in performance evaluation are usually selected at random. A potentially more meaningful definition of an impostor, at least in the context of security-driven ASV applications, is the closest (most confusable) other speaker to a given target. We put forward a novel performance assessment framework that addresses both the inadequacy of the random-impostor evaluation model and the size limitation of evaluation corpora by assessing ASV security against the closest impostors on arbitrarily large datasets. The framework allows one to predict the safety of a given ASV technology, in its current state, for an arbitrarily large speaker database consisting of virtual (sampled) speakers. As a proof of concept, we analyze the performance of two state-of-the-art ASV systems, based on i-vector and x-vector speaker embeddings (as implemented in the popular Kaldi toolkit), on the recent VoxCeleb 1 and 2 corpora, containing a total of 7365 speakers. We fix the number of target speakers to 1000 and generate up to N = 100,000 virtual impostors sampled from the generative model. The model-based false alarm rates are in reasonable agreement with the empirical false alarm rates and, as predicted, increase substantially (to values of up to 98%) with N = 100,000 impostors. Neither the i-vector nor the x-vector system is immune to the increased false alarm rate at increased impostor database size, as predicted by the model.
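The extrapolation rests on a simple observation: if the non-target (impostor) scores for a given target follow a distribution with CDF F, then the probability that the best of N independent virtual impostors exceeds the decision threshold t is 1 - F(t)^N, which climbs toward 1 as N grows. The sketch below illustrates this mechanism under a deliberately simplified hierarchical Gaussian model of per-target impostor scores; the distributional choices, parameter values, and function names are illustrative assumptions, not the model fitted in the paper, and the printed numbers are not expected to reproduce the paper's results.

    # Minimal sketch (not the authors' implementation): a toy hierarchical model of
    # per-target impostor scores, used to show how the false alarm rate against the
    # *closest* of N virtual impostors grows with N.
    import math
    import numpy as np

    rng = np.random.default_rng(0)

    def norm_cdf(x):
        """Standard normal CDF."""
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    # Assumed hierarchy: per-target impostor-score mean mu_t ~ Normal(MU_0, TAU_0),
    # per-target std sigma_t ~ Gamma(A, B); scores | target ~ Normal(mu_t, sigma_t).
    MU_0, TAU_0 = -5.0, 1.0   # illustrative values
    A, B = 2.0, 0.5           # illustrative values

    def far_closest_impostor(n_targets=1000, n_impostors=100_000, threshold=0.0):
        """Average over targets of P(max of N impostor scores > threshold).

        For i.i.d. scores with CDF F, P(max > t) = 1 - F(t)**N, so the false
        alarm rate against the closest impostor rises toward 1 as N grows.
        """
        mu_t = rng.normal(MU_0, TAU_0, size=n_targets)
        sigma_t = rng.gamma(A, B, size=n_targets)
        p_fa = [1.0 - norm_cdf((threshold - m) / s) ** n_impostors
                for m, s in zip(mu_t, sigma_t)]
        return float(np.mean(p_fa))

    if __name__ == "__main__":
        for n in (1_000, 10_000, 100_000):
            print(f"N = {n:>7}: closest-impostor FAR ≈ {far_closest_impostor(n_impostors=n):.3f}")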