Automatic Extraction of Phonetically Rich Sentences from Large Text Corpus of Indian Languages

Abstract

A set of phonetically rich sentences is required to represent the different speech units used in developing Automatic Speech Recognition and Speech Synthesis systems. Selecting such a set from a large text corpus without modifying the characteristics of the corpus is still a difficult task. A major concern in this process is deciding on what basis sentences must be chosen so that they cover all phonetic aspects of the language under study in the smallest possible set. This paper describes a simple process for automatically extracting such a set of sentences from a large text corpus of a given Indian language and presents an algorithm for the process. The process discussed is language independent and works for most Indian languages. The extent of success achieved, in terms of the phonetic richness of the sentences, is also discussed.
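
The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of one common way to approach the problem: a greedy pass that repeatedly picks the sentence contributing the most not-yet-covered units. It is not the authors' method. The names `char_bigrams` and `select_rich_sentences` are hypothetical, and adjacent character pairs are used purely as a stand-in for real phonetic units (phones, diphones, or syllables), which would normally come from a language-specific grapheme-to-phoneme step.

```python
from typing import Callable, Iterable, List, Optional, Set


def char_bigrams(sentence: str) -> Set[str]:
    """Hypothetical stand-in for a phonetic-unit extractor.

    Returns adjacent character pairs inside each word. A real system
    would return phones, diphones, or syllables instead.
    """
    units: Set[str] = set()
    for word in sentence.lower().split():
        units.update(word[i:i + 2] for i in range(len(word) - 1))
    return units


def select_rich_sentences(
    sentences: Iterable[str],
    extract_units: Callable[[str], Set[str]] = char_bigrams,
    max_sentences: Optional[int] = None,
) -> List[str]:
    """Greedily pick sentences that add the most uncovered units.

    Stops when no remaining sentence adds a new unit, or when
    max_sentences (if given) has been reached.
    """
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = [(s, extract_units(s)) for s in sentences]

    while remaining and (max_sentences is None or len(selected) < max_sentences):
        best_index, best_gain = -1, 0
        for i, (_, units) in enumerate(remaining):
            gain = len(units - covered)  # units this sentence would newly cover
            if gain > best_gain:
                best_index, best_gain = i, gain
        if best_index < 0:
            break  # nothing left adds coverage
        sentence, units = remaining.pop(best_index)
        selected.append(sentence)
        covered |= units

    return selected
```

Because the unit extractor is passed in as a parameter, the same selection loop could in principle be reused across languages by swapping in a different extractor, which is one plausible reading of the "language independent" claim rather than a description of the paper's design.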

Bibliographic record

Source: International Conference on Spoken Language Processing; 4-8 October 2004; Jeju (KR) | 2004 | pp. 2885-2888 | 4 pages
Conference location: Jeju (KR)
Authors: Karunesh Arora; Sunita Arora; Kapil Verma; S. S. Agrawal
Author affiliation: Natural Language Processing Division, Centre for Development of Advanced Computing, Noida
Original format: PDF
Text language: English
Chinese Library Classification: Applied linguistics
