Journal of Intelligent Systems

Analysis of the Use of Background Distribution for Naive Bayes Classifiers

Abstract

The naive Bayes classifier is popular because it is easy to train, requires no cross-validation for parameter tuning, and, being a generative model, can easily be extended. Moreover, it was recently shown that word probabilities (a background distribution) estimated from large unlabeled corpora can be used to improve the parameter estimation of naive Bayes. However, previous methods do not explicitly allow one to control how much the background distribution influences the estimation of the naive Bayes parameters. In contrast, we investigate an extension of the graphical model of naive Bayes in which each word is generated either from a background distribution or from a class-specific word distribution. We analyze this model theoretically and show its connection to Jelinek-Mercer smoothing. Experiments on four standard text classification data sets show that the proposed method can outperform, with statistical significance, previous methods that use the same background distribution.
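The word-level mixture described above can be illustrated with a minimal sketch. If each token is drawn from the class-specific distribution with probability λ and from the background distribution with probability 1 − λ, marginalizing that latent choice gives p(w | c) = λ·p_c(w) + (1 − λ)·p_bg(w), the same interpolated form used by Jelinek-Mercer smoothing. This is not the authors' estimation procedure: the sketch assumes a hand-picked fixed weight `lam`, an unsmoothed maximum-likelihood estimate for p_c(w), and add-one smoothing for the background estimate. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def background_distribution(unlabeled_docs, vocab):
    """Estimate p_bg(w) from unlabeled text (assumption: add-one smoothing so every vocab word is > 0)."""
    cnt = Counter(w for doc in unlabeled_docs for w in doc if w in vocab)
    total = sum(cnt.values()) + len(vocab)
    return {w: (cnt[w] + 1) / total for w in vocab}


def class_distributions(docs, labels, vocab):
    """Unsmoothed maximum-likelihood estimate of p_c(w) for each class c."""
    counts = {}
    for doc, label in zip(docs, labels):
        counts.setdefault(label, Counter()).update(w for w in doc if w in vocab)
    dists = {}
    for label, cnt in counts.items():
        total = sum(cnt.values()) or 1  # guard against a class with no in-vocab words
        dists[label] = {w: cnt[w] / total for w in vocab}
    return dists


def class_priors(labels):
    """Relative frequency of each class label."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


def classify(doc, priors, class_dists, p_bg, lam):
    """Return argmax_c of log p(c) + sum_w log(lam * p_c(w) + (1 - lam) * p_bg(w)).

    Each token comes from the class distribution with probability lam and from
    the background with probability 1 - lam (Jelinek-Mercer interpolation).
    Out-of-vocabulary tokens are skipped.
    """
    assert 0.0 <= lam < 1.0, "lam < 1 keeps every mixture probability positive"
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for w in doc:
            if w in p_bg:
                score += math.log(lam * class_dists[c][w] + (1 - lam) * p_bg[w])
        if score > best_score:
            best, best_score = c, score
    return best


# Toy usage (hypothetical data).
vocab = {"goal", "match", "team", "vote", "election", "party"}
unlabeled = [["team", "vote", "party", "match"], ["election", "goal"]]
train_docs = [["goal", "match", "team"], ["vote", "election", "party"]]
train_labels = ["sports", "politics"]

p_bg = background_distribution(unlabeled, vocab)
dists = class_distributions(train_docs, train_labels, vocab)
priors = class_priors(train_labels)
print(classify(["team", "goal", "unknownword"], priors, dists, p_bg, lam=0.7))  # prints "sports"
```

Because `p_bg` is strictly positive for every vocabulary word, any `lam` in [0, 1) keeps each logarithm's argument positive; a larger `lam` trusts the labeled class counts more, while a smaller one leans on the background distribution.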