Method and apparatus for measuring the degree of polysemy in polysemous words

Abstract

A system and apparatus are disclosed for identifying polysemous terms and measuring their degree of polysemy. A polysemy index gives a quantitative measure of how polysemous a word is, so a list of words can be ranked by polysemy index with the most polysemous words at the top. A polysemy evaluation process collects a set of terms occurring near a target term, computes the inter-term distances within that set, and reduces the resulting multi-dimensional distance space to two dimensions. The two-dimensional representation is then converted into radial coordinates, and isotonic/antitonic regression is used to compute how far the resulting distribution deviates from unimodality. This deviation is the polysemy index. A corpus can be preprocessed with polysemy indices to identify words with clearly separated senses, allowing an information retrieval system to return a separate list of documents for each sense of a word. Self-organizing sense disambiguation techniques can also use the polysemy indices to select canonical contexts for the senses identified for a given word: contexts are selected that contain terms falling in radial bins near each peak, and these contexts can then be used to train a classifier.
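The scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes the neighbouring terms have already been placed in two dimensions (for example by multidimensional scaling of the inter-term distances). It reads "radial coordinates" as polar coordinates measured around the target term's own 2D position, and it bins terms by angle. It takes the deviation from unimodality to be the least-squares residual of the best unimodal fit to the normalized angular histogram, computed with pool-adjacent-violators isotonic and antitonic fits. Because angle wraps around, it also minimizes that residual over every rotation of the histogram. The names `polysemy_index`, `rank_by_polysemy`, `origin`, and `n_bins`, the 12-bin default, and the normalization are all hypothetical. The abstract does not specify any of them.

```python
import math


def isotonic_fit(y):
    """Least-squares non-decreasing fit via pool-adjacent-violators (PAVA)."""
    blocks = []  # each block is [sum_of_values, count]
    for v in y:
        blocks.append([v, 1])
        # Pool adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def antitonic_fit(y):
    """Least-squares non-increasing fit: isotonic fit of the reversed sequence."""
    return isotonic_fit(y[::-1])[::-1]


def unimodal_sse(y):
    """Residual sum of squares of the best rise-then-fall (unimodal) fit."""
    best = math.inf
    for k in range(len(y) + 1):
        # Non-decreasing before the split point, non-increasing after it.
        fit = isotonic_fit(y[:k]) + antitonic_fit(y[k:])
        best = min(best, sum((a - b) ** 2 for a, b in zip(y, fit)))
    return best


def polysemy_index(points, origin=(0.0, 0.0), n_bins=12):
    """Hypothetical polysemy index: how far the angular distribution of
    neighbouring terms deviates from unimodality (0.0 means unimodal).

    points: 2D coordinates of terms near the target term (already reduced to 2D).
    origin: ASSUMED to be the target term's own 2D position.
    """
    if not points:
        raise ValueError("need at least one neighbouring term")
    ox, oy = origin
    counts = [0] * n_bins
    for x, y in points:
        # Polar angle in [0, 2*pi), then an equal-width angular bin.
        theta = math.atan2(y - oy, x - ox) % (2 * math.pi)
        counts[min(int(theta / (2 * math.pi) * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    hist = [c / total for c in counts]
    # Angle wraps around, so try every starting bin and keep the best fit.
    return min(unimodal_sse(hist[r:] + hist[:r]) for r in range(n_bins))


def rank_by_polysemy(indices):
    """Return words sorted so the most polysemous (largest index) come first."""
    return sorted(indices, key=indices.get, reverse=True)


# Usage: one tight cluster of neighbours versus two clusters on opposite sides.
one_sense = [(1, 0), (1, 0.1), (1, -0.1), (2, 0), (2, 0.1), (2, -0.1)]
two_senses = [(1, 0), (1, 0.1), (1, -0.1), (-1, 0), (-1, 0.1), (-1, -0.1)]
scores = {"one": polysemy_index(one_sense), "two": polysemy_index(two_senses)}
print(rank_by_polysemy(scores))  # ['two', 'one']
```

Splitting the histogram at every position and fitting each side separately finds the optimal unimodal fit. Any unimodal sequence is non-decreasing up to some point and non-increasing after it, and the two halves can be fitted independently. The bins where the fitted histogram peaks are natural places to look for the "radial bins near each peak" mentioned above. Choosing the bin width is a judgment call this sketch leaves to the caller.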