Conference: International Conference on Advances in ICT for Emerging Regions

Voicer: A Crowd Sourcing Tool for Speech Data Collection

Abstract

Speech corpora do not exist for most low-resource languages, and creating one for such a language is challenging and demands significant time and effort. This paper provides an overview of related data collection strategies, highlighting a few issues prevalent in existing approaches. It has three objectives. First, it introduces an open-source tool called “Voicer”, accessible from both handheld devices and computers, that can be used to collect speech data for a specific domain in a short span of time, irrespective of the language. Second, it demonstrates the tool by using it to build a Sinhala speech corpus consisting of 10 hours of speech data for 39 different sentences in the banking domain. Finally, it provides a framework for evaluating a speech corpus, along with lessons learned during data collection, with a view to contributing to future research.