
ACTIVE LEARNING FOR LARGE-SCALE SEMI-SUPERVISED CREATION OF SPEECH RECOGNITION TRAINING CORPORA


Abstract

Techniques are disclosed for generating automatic speech recognition (ASR) training data. According to an embodiment, impactful ASR training corpora are generated efficiently, and the quality or relevance of the ASR training corpora being generated is increased by leveraging knowledge of the ASR system being trained. An example methodology includes: selecting one of a word or phrase, based on knowledge and/or content of said ASR training corpora; presenting a textual representation of said word or phrase; receiving a speech utterance that includes said word or phrase; receiving a transcript for said speech utterance; presenting said transcript for review (to allow for editing, if needed); and storing said transcript and said audio file in an ASR system training database. The selecting may include, for instance, selecting a word or phrase that is under-represented in said database, and/or selecting based upon an n-gram distribution of a language, and/or selecting based upon known areas that tend to incur transcription mistakes.
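The selection step described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the function name `select_prompt`, the `smoothing` parameter, the toy corpus, and the scoring rule (ratio of a word's relative frequency in the language to its smoothed relative frequency in the stored transcripts) are all assumptions made here for illustration. The sketch covers only the "under-represented in said database" and unigram (1-gram) distribution criteria, not the transcription-mistake criterion.

```python
from collections import Counter


def select_prompt(corpus_transcripts, language_freq, smoothing=1.0):
    """Return the candidate word that is most under-represented in the corpus.

    corpus_transcripts: iterable of transcript strings already stored
        in the training database (hypothetical input).
    language_freq: dict mapping each candidate word to its relative
        frequency in the language, i.e. a unigram distribution
        (hypothetical input).
    smoothing: additive constant that keeps unseen words from causing
        a division by zero.
    """
    # Count every whitespace-separated, lower-cased token in the corpus.
    counts = Counter(
        word
        for transcript in corpus_transcripts
        for word in transcript.lower().split()
    )
    total = sum(counts.values()) + smoothing * len(language_freq)

    def deficit(word):
        # Ratio of the frequency expected from the language to the smoothed
        # frequency observed in the corpus; larger means more under-represented.
        observed = (counts[word] + smoothing) / total
        return language_freq[word] / observed

    # max() returns the first maximal item, so ties follow dict order.
    return max(language_freq, key=deficit)


# Toy data, invented for this example.
corpus = ["turn on the light", "turn off the light", "play the music"]
language_freq = {"the": 0.5, "turn": 0.2, "light": 0.15, "weather": 0.15}
prompt = select_prompt(corpus, language_freq)
print(prompt)  # weather
```

In this toy run, "weather" never appears in the stored transcripts, so it has the highest expected-to-observed ratio and would be the text presented to the speaker. A fuller pipeline along the lines of the abstract would then record the utterance, obtain and review its transcript, and store both in the training database.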

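Bibliographic data

  • Publication No.: US2020152175A1
  • Patent type: not stated
  • Publication date: 2020-05-14
  • Original format: PDF
  • Applicant/Assignee: ADOBE INC.
  • Application No.: US201816189238
  • Inventor: FRANCK DERNONCOURT
  • Filing date: 2018-11-13
  • Classification: G10L15/07; G10L15/197; G10L15/26
  • Country: US
  • Date indexed: 2022-08-21 11:24:48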
