Information Processing & Management

Adaptive sampling for thresholding in document filtering and classification

Abstract

Document filtering (DF) and document classification (DC) are often integrated to classify suitable documents into suitable categories. A popular way to achieve integrated DF and DC is to associate each category with a threshold. A document d may be classified into a category c only if its degree of acceptance (DOA) with respect to c is higher than the threshold of c. Therefore, tuning a proper threshold for each category is essential. A threshold that is too high (low) may mislead the classifier into rejecting (accepting) too many documents. Unfortunately, thresholding is often based on the classifier's DOA estimations, which are not always reliable, due to two common phenomena: (1) the DOA estimations made by the classifier are not always correct, and (2) not all documents can be classified without controversy. Unreliable estimations are in effect noise that may mislead the thresholding process. In this paper, we present an adaptive and parameter-free technique, AS4T, to sample reliable DOA estimations for thresholding. AS4T operates by adapting to the classifier's status, without needing to define any parameters. Experimental results show that, by helping to derive more proper thresholds, AS4T may guide various classifiers to achieve significantly better and more stable performance under different circumstances. The contributions are of practical significance for real-world integrated DF and DC. (c) 2004 Elsevier Ltd. All rights reserved.
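
The per-category threshold rule described in the abstract can be illustrated with a minimal Python sketch. It is not AS4T: the abstract does not give the sampling procedure, so the sketch simply tunes each threshold on all validation DOA scores by maximising F1, which is an assumed criterion. The names `classify`, `f1_score`, and `tune_threshold` are hypothetical, and reading an empty result as "filtered out" is likewise an assumption.

```python
from typing import Dict, List, Tuple


def classify(doa: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the categories whose DOA is strictly higher than their threshold.

    An empty list is read here (assumption) as the document being filtered out.
    """
    return sorted(c for c, score in doa.items() if score > thresholds[c])


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(scored: List[Tuple[float, bool]]) -> float:
    """Pick the threshold with the best F1 on (DOA, is_member) validation pairs.

    Assumed criterion; every validation score is used, with no sampling of
    reliable estimations as AS4T would do.
    """
    if not scored:
        raise ValueError("need at least one validation example")
    scores = sorted({s for s, _ in scored})
    # Candidate thresholds: one below every score (accept all) plus each score.
    candidates = [scores[0] - 1.0] + scores
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for s, y in scored if s > t and y)
        fp = sum(1 for s, y in scored if s > t and not y)
        fn = sum(1 for s, y in scored if s <= t and y)
        f = f1_score(tp, fp, fn)
        if f > best_f1:  # strict comparison keeps the lowest best threshold
            best_t, best_f1 = t, f
    return best_t


if __name__ == "__main__":
    # Made-up validation pairs for one category: (DOA, truly belongs?).
    validation = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
    threshold = tune_threshold(validation)
    print(threshold, classify({"sports": 0.7}, {"sports": threshold}))
```

Because every validation score feeds into the tuning, a single wrong DOA estimation can shift the chosen threshold, which is the kind of noise the abstract says AS4T is meant to sample away.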
