Conference: 22nd International Conference on Computational Linguistics

Extending a Thesaurus with Words from Pan-Chinese Sources

Abstract

In this paper, we work on extending a Chinese thesaurus with words distinctly used in various Chinese communities. The acquisition and classification of such region-specific lexical items is an important step toward the larger goal of constructing a Pan-Chinese lexical resource. In particular, we extend a previous study in three respects: (1) to improve automatic classification by removing duplicated words from the thesaurus, (2) to experiment with classifying words at the subclass level and semantic head level, and (3) to further investigate the possible effects of data heterogeneity between the region-specific words and words in the thesaurus on classification performance. Automatic classification was based on the similarity between a target word and individual categories of words in the thesaurus, measured by the cosine function. Experiments were done on 120 target words from four regions. The automatic classification results were evaluated against a gold standard obtained from human judgements. In general, accuracy reached 80% or more with the top 10 (out of 80+) and top 100 (out of 1,300+) candidates considered at the subclass level and semantic head level respectively, provided that the appropriate data sources were used.
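The classification step described in the abstract (ranking thesaurus categories by cosine similarity to a target word, then checking whether the gold-standard category falls within the top N candidates) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the vectors were built, so the sparse count vectors, the category names, and the helper functions `cosine`, `rank_categories`, and `top_n_accuracy` are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_categories(target_vec, category_vecs):
    """Return category names sorted by decreasing similarity to the target word."""
    scored = [(cosine(target_vec, vec), name) for name, vec in category_vecs.items()]
    # Ties are broken alphabetically so the ranking is deterministic.
    return [name for _, name in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


def top_n_accuracy(ranked_by_word, gold, n):
    """Fraction of target words whose gold category is among the top n candidates."""
    hits = sum(1 for word, ranked in ranked_by_word.items() if gold[word] in ranked[:n])
    return hits / len(ranked_by_word)


# Hypothetical toy data: co-occurrence counts standing in for real corpus features.
categories = {
    "transport": Counter({"road": 3, "drive": 2, "ticket": 1}),
    "food": Counter({"eat": 3, "cook": 2, "rice": 1}),
    "finance": Counter({"bank": 3, "money": 2, "ticket": 1}),
}
target = Counter({"road": 2, "ticket": 1})

ranked = rank_categories(target, categories)
print(ranked)
print(top_n_accuracy({"target_word": ranked}, {"target_word": "transport"}, n=1))
```

In the setting the abstract reports, N would be 10 at the subclass level (80+ categories) and 100 at the semantic head level (1,300+ categories); the toy example uses three categories only to keep the ranking easy to follow.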

Bibliographic details

  • Source
  • Conference location: Manchester (GB)
  • Authors

    Oi Yee Kwong; Benjamin K. Tsou

  • Affiliations

    Department of Chinese, Translation and Linguistics, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong; Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong

    Department of Chinese, Translation and Linguistics, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong

  • Conference organizer
  • Original format: PDF
  • Text language: English (eng)
  • Chinese Library Classification: Programming; Software Engineering
  • Keywords
