International Conference on Artificial Intelligence IC-AI'2001 Vol.2, Jun 25-28, 2001, Las Vegas, Nevada, USA

Unsupervised Taxonomy of Large Document Corpora Utilizing Idiomatic Character of Natural Languages

Abstract

This paper presents a novel approach to unsupervised taxonomy. It is based on the idiomatic character of natural languages, rather than on statistical calculations. The term "idiomatic" is used here in two complementary senses: as an intersubjective agreement among the members of a speech community regarding the meaning of a phrase, or as an objective agreement that a particular message is customarily expressed by some particular phrase. The idiomatic character of natural languages makes it extremely likely that, across entire document corpora, similar ideas will be consistently expressed by some particular phrases. This allows the main ideas of a corpus to be faithfully represented by a handful of idiomatic phrases, which can serve as a directory that significantly improves navigation through the underlying corpus.
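
The "directory" idea can be illustrated with a small sketch. The abstract does not say how idiomatic phrases are identified, so the code below is not the paper's method: it substitutes a plain document-count threshold on word n-grams, purely to show what a phrase-to-document directory might look like. The function name `build_phrase_directory`, its parameters `n` and `min_docs`, and the toy corpus are all hypothetical.

```python
import re
from collections import defaultdict


def build_phrase_directory(docs, n=2, min_docs=2):
    """Map each word n-gram that occurs in at least `min_docs` documents
    to the sorted ids of the documents containing it.

    Assumption: a simple recurrence threshold stands in for whatever
    phrase-identification step the paper actually uses.
    """
    directory = defaultdict(set)
    for doc_id, text in docs.items():
        # Lowercase and keep alphabetic tokens only.
        words = re.findall(r"[a-z]+", text.lower())
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            directory[phrase].add(doc_id)
    # Keep only phrases shared by enough documents.
    return {
        phrase: sorted(ids)
        for phrase, ids in directory.items()
        if len(ids) >= min_docs
    }


# Hypothetical toy corpus, invented for illustration.
docs = {
    "d1": "Interest rates rose again as the central bank met.",
    "d2": "The central bank left interest rates unchanged.",
    "d3": "The home team won the cup final.",
}
directory = build_phrase_directory(docs)
```

On this toy corpus the directory groups `d1` and `d2` under phrases such as "central bank" and "interest rates", while `d3` shares no phrase with the others. It also keeps the uninformative "the central", which suggests that a usable version would need some further filtering of function-word phrases.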