ACTIVE LEARNING FOR LARGE-SCALE SEMI-SUPERVISED CREATION OF SPEECH RECOGNITION TRAINING CORPORA
Abstract
Techniques are disclosed for generating automatic speech recognition (ASR) training data. According to an embodiment, impactful ASR training corpora are generated efficiently, and the quality or relevance of the ASR training corpora being generated is increased by leveraging knowledge of the ASR system being trained. An example methodology includes: selecting a word or phrase based on knowledge and/or content of said ASR training corpora; presenting a textual representation of said word or phrase; receiving a speech utterance that includes said word or phrase; receiving a transcript for said speech utterance; presenting said transcript for review (to allow for editing, if needed); and storing said transcript and an audio file of said speech utterance in an ASR system training database. The selecting may include, for instance, selecting a word or phrase that is under-represented in said database, selecting based on an n-gram distribution of a language, and/or selecting based on known areas that tend to incur transcription mistakes.
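The selection and collection steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a hypothetical in-memory `TrainingDatabase`, simplifies the n-gram distribution to unigram (single-word) target frequencies in a hypothetical `target_freq` mapping, and models "known areas that tend to incur transcription mistakes" as a hypothetical `error_prone` word set whose coverage gap is weighted by an assumed `error_boost` factor. The `record`, `transcribe`, and `review` callbacks are hypothetical stand-ins for the prompting, transcription, and human-review stages.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrainingDatabase:
    """Hypothetical in-memory stand-in for the ASR system training database."""
    # Each entry pairs an audio file path with its reviewed transcript.
    entries: list = field(default_factory=list)
    # Per-word coverage counts derived from the stored transcripts.
    word_counts: Counter = field(default_factory=Counter)

    def store(self, audio_path: str, transcript: str) -> None:
        self.entries.append((audio_path, transcript))
        # Update coverage so later selections see the newly stored data.
        self.word_counts.update(transcript.lower().split())


def select_phrase(db, target_freq, error_prone=frozenset(), error_boost=2.0):
    """Pick the word whose share in db lags furthest behind target_freq.

    target_freq: assumed unigram probabilities for the language.
    error_prone: assumed set of words known to incur transcription mistakes;
    their coverage gap is multiplied by error_boost.
    """
    total = sum(db.word_counts.values()) or 1  # avoid dividing by zero

    def deficit(word):
        observed = db.word_counts[word] / total
        gap = max(target_freq[word] - observed, 0.0)  # ignore over-represented words
        return gap * (error_boost if word in error_prone else 1.0)

    return max(target_freq, key=deficit)


def collect_one(db, target_freq, record, transcribe, review, error_prone=frozenset()):
    """Run one select / present / record / transcribe / review / store cycle."""
    phrase = select_phrase(db, target_freq, error_prone)
    print(f"Please say: {phrase}")   # present a textual representation
    audio_path = record(phrase)       # speech utterance containing the phrase
    draft = transcribe(audio_path)    # transcript from an ASR pass or a human
    final = review(draft)             # reviewer may edit the transcript
    db.store(audio_path, final)
    return phrase, final


if __name__ == "__main__":
    target = {"hello": 0.5, "world": 0.3, "zebra": 0.2}
    db = TrainingDatabase()
    db.store("a.wav", "hello hello world")
    # "zebra" has never been recorded, so it has the largest coverage gap.
    print(select_phrase(db, target))
```

Because the coverage counts are updated on every `store`, each cycle steers the next prompt toward whatever the corpus currently lacks, which is the active-learning flavor the title refers to; a production system would presumably count higher-order n-grams and draw error-prone areas from the ASR system's own error statistics.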