Source: Journal of Applied Statistics

Zero-inflated beta distribution applied to word frequency and lexical dispersion in corpus linguistics

Abstract

Corpus linguistics is the study of language as expressed in a body of texts or documents. The relative frequency of a word within a text and the dispersion of the word across the collection of texts provide information about the word's prominence and diffusion, respectively. In practice, people tend to use a relatively small share of the words in a language's inventory, so a large number of words in the lexicon are rarely employed. The zero-inflated beta distribution enables one to model the relative frequency of a word in a text while allowing for texts that do not contain the word at all. In this paper, the expectations of a word's prominence and dispersion are defined under the zero-inflated beta model. Estimates of a word's prominence and dispersion are computed for words in the British National Corpus 1994 (BNC), a 100-million-word collection of written and spoken language covering a wide range of British English. The relationship between a word's prominence and dispersion is discussed, as are measures that are functions of both prominence and dispersion.
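The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses the standard zero-inflated beta form, in which a word's relative frequency Y in a text is 0 with probability pi0 and otherwise follows Beta(a, b), so that E[Y] = (1 - pi0) * a / (a + b). The names `fit_zib`, `zib_mean`, `pi0`, `a`, and `b` are hypothetical, the data are synthetic (not the BNC), and the method-of-moments fit is a simple stand-in rather than the paper's estimator.

```python
import random
from statistics import mean, pvariance


def fit_zib(freqs):
    """Estimate (pi0, a, b) of a zero-inflated beta from per-text relative frequencies.

    Assumption: pi0 is taken as the share of texts with frequency 0, and the
    beta part is fitted to the nonzero values by the method of moments.
    """
    nonzero = [y for y in freqs if y > 0]
    pi0 = 1 - len(nonzero) / len(freqs)  # share of texts lacking the word
    m, v = mean(nonzero), pvariance(nonzero)
    common = m * (1 - m) / v - 1  # method-of-moments estimate of a + b
    return pi0, m * common, (1 - m) * common


def zib_mean(pi0, a, b):
    """Expected relative frequency under the model: (1 - pi0) * a / (a + b)."""
    return (1 - pi0) * a / (a + b)


# Synthetic corpus (assumption): 5000 texts, the word is absent from about
# 40% of them and otherwise has relative frequency drawn from Beta(2, 50).
rng = random.Random(0)
freqs = [0.0 if rng.random() < 0.4 else rng.betavariate(2.0, 50.0)
         for _ in range(5000)]

pi0, a, b = fit_zib(freqs)
print(f"pi0={pi0:.3f}  a={a:.2f}  b={b:.2f}  E[Y]={zib_mean(pi0, a, b):.5f}")
```

With this particular fit, the model mean equals the plain average of the per-text frequencies, zeros included. The quantity 1 - pi0, the share of texts that contain the word, is one simple proxy for how widely a word is spread. The paper's own definitions of prominence and dispersion may differ.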