Voicer: A Crowd Sourcing Tool for Speech Data Collection

Abstract

Speech corpora do not exist for most low-resource languages, so creating one for such a language is challenging and takes significant time and effort. This paper gives an overview of related data collection strategies and highlights a few issues prevalent in existing approaches. It has three objectives. First, it introduces “Voicer”, an open-source tool accessible from both handheld devices and computers, which can be used to collect speech data for a specific domain in a short span of time, irrespective of the language. Second, it demonstrates the tool by using it to build a Sinhala speech corpus consisting of 10 hours of speech data for 39 different sentences in the banking domain. Finally, it provides a framework for evaluating a speech corpus, along with lessons learned during data collection, with a view to contributing to future research.

Bibliographic record

Source
International Conference on Advances in ICT for Emerging Regions | 2018 | pp. 174-181 | 8 pages
Authors
Darshana Buddhika; Ranula Liyadipita; Sudeepa Nadeeshan; Hasini Witharana; Sanath Jayasena; Uthayasanker Thayasivam
Original format
PDF
Keywords
Data collection; Tools; Speech recognition; Internet; Buildings; Mobile handsets
