Conference: World multi-conference on systemics, cybernetics and informatics

Using Statistical Properties to Enhance Text Categorization

Abstract

Statistical properties extracted from text are useful in many areas, including identifying the author of a text and determining its category. In this paper, language-independent properties of text are studied using two categorized corpora of news articles. The properties are observed not to depend on the corpus or on its size. Several properties are identified that make it possible to minimize the training set for an intelligent categorization system. Other possible applications of such statistics include comparing the information content and the rate of new information across corpora, and enhancing text categorization.
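
The abstract does not name the specific properties studied. Purely as an illustration of the kind of corpus statistic it alludes to, the following sketch computes two simple measures: the unigram entropy of a token stream (one possible reading of "information content") and the fraction of previously unseen tokens per window (one possible reading of "rate of new information"). The function names, the whitespace tokenization, and the choice of measures are all assumptions made here, not the paper's method.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Shannon entropy, in bits per token, of the empirical token distribution.

    Illustrative stand-in for "information content"; not taken from the paper.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def new_token_rate(tokens, window=4):
    """Fraction of tokens in each consecutive window that were never seen before.

    Illustrative stand-in for "rate of new information"; not taken from the paper.
    """
    seen = set()
    rates = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]
        fresh = 0
        for token in chunk:
            if token not in seen:
                seen.add(token)
                fresh += 1
        rates.append(fresh / len(chunk))
    return rates


# Hypothetical toy input; whitespace splitting is an assumption and would
# need replacing for languages that do not delimit words with spaces.
tokens = "a b c d a b c e".split()
print(unigram_entropy(["a", "b", "a", "b"]))
print(new_token_rate(tokens, window=4))
```

Computing the same measures on two corpora, or on growing prefixes of one corpus, is one way to check whether a statistic is stable across corpus and size, which is the kind of observation the abstract reports.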
