Venue: 4th Workshop on Cognitive Aspects of the Lexicon

When Frequency Data Meet Dispersion Data in the Extraction of Multi-word Units from a Corpus: A Study of Trigrams in Chinese

Abstract

One of the main approaches to extracting multi-word units is the frequency threshold approach, but its treatment of dispersion data still leaves much to be desired. This study adopts Gries's (2008) dispersion measure to extract trigrams from a Chinese corpus and compares the results with those of the frequency threshold approach. The overlap between the two approaches turns out to be fairly small. This demonstrates both that dispersion data should be taken more seriously and that lexical representations are dynamic in nature. Moreover, the trigrams extracted in the present study can be used in a wide range of Chinese language resources.
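The abstract contrasts a frequency threshold with Gries's (2008) dispersion measure. As a minimal sketch, assuming the measure meant is DP ("deviation of proportions"): the corpus is split into n parts, s_i is part i's share of the corpus, v_i is the share of the item's occurrences that fall in part i, and DP = 0.5 × Σ|v_i − s_i|. DP is 0 for a perfectly even spread and approaches 1 as the item concentrates in a small part of the corpus. The corpus partition, the token unit (segmented words or characters), the part-size definition (here the number of trigram positions per part), the thresholds `min_freq` and `max_dp`, and all function names are hypothetical illustration choices, not the settings used in the study.

```python
from collections import Counter


def trigram_counts(tokens):
    """Count contiguous trigrams (as tuples) in one corpus part."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def gries_dp(part_counts, part_sizes):
    """DP (deviation of proportions) for one item, after Gries (2008).

    part_counts: the item's frequency in each corpus part (total must be > 0)
    part_sizes:  the size of each corpus part
    Returns 0.0 for a perfectly even spread; values near 1 mean the item
    is concentrated in a small part of the corpus.
    """
    total_freq = sum(part_counts)
    total_size = sum(part_sizes)
    diff = 0.0
    for freq, size in zip(part_counts, part_sizes):
        expected = size / total_size  # s_i: this part's share of the corpus
        observed = freq / total_freq  # v_i: share of the item's hits in this part
        diff += abs(observed - expected)
    return diff / 2


def extract(parts, min_freq, max_dp):
    """Return (frequency-threshold set, dispersion-threshold set) of trigrams.

    Hypothetical criteria: total frequency >= min_freq, or DP <= max_dp.
    Part size is taken as the number of trigram positions in each part.
    """
    per_part = [trigram_counts(tokens) for tokens in parts]
    sizes = [max(len(tokens) - 2, 0) for tokens in parts]
    totals = Counter()
    for counts in per_part:
        totals.update(counts)
    by_freq = {t for t, f in totals.items() if f >= min_freq}
    # Counter returns 0 for trigrams absent from a part, so no KeyError here.
    by_dp = {
        t for t in totals
        if gries_dp([counts[t] for counts in per_part], sizes) <= max_dp
    }
    return by_freq, by_dp


if __name__ == "__main__":
    # Toy two-part "corpus" of pre-tokenized text (placeholder letters).
    parts = [
        ["a", "b", "c", "x", "y", "z", "x", "y", "z", "x", "y", "z"],
        ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"],
    ]
    by_freq, by_dp = extract(parts, min_freq=3, max_dp=0.3)
    print("frequency threshold:", by_freq)
    print("dispersion threshold:", by_dp)
    print("overlap:", by_freq & by_dp)
```

On this toy data the two criteria select disjoint sets: `x y z` passes the frequency threshold but occurs in only one part (DP = 0.5), while `a b c` is too rare for the threshold but is spread evenly (DP = 0). This only illustrates in miniature how the criteria can diverge; it does not reproduce the study's data or results.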

Bibliographic details

  • Conference location: Dublin (IE)
  • Author

    Chan-Chia Hsu

  • Affiliation

    Graduate Institute of Linguistics, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617 Taiwan (R.O.C.)

  • Original format: PDF
  • Language: English