Journal: Information Processing & Management

A hybrid generative/discriminative approach to text classification with additional information


Abstract

This paper presents a classifier for text data samples consisting of main text and additional components, such as Web pages and technical papers. We focus on multiclass and single-labeled text classification problems and design the classifier based on a hybrid composed of probabilistic generative and discriminative approaches. Our formulation considers individual component generative models and constructs the classifier by combining these trained models based on the maximum entropy principle. We use naive Bayes models as the component generative models for the main text and additional components such as titles, links, and authors, so that we can apply our formulation to document and Web page classification problems. Our experimental results for four test collections confirmed that our hybrid approach effectively combined main text and additional components and thus improved classification performance. (c) 2006 Published by Elsevier Ltd.
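The combination step described in the abstract can be illustrated with a small sketch. It is a simplified illustration and not the authors' implementation: it assumes multinomial naive Bayes with Laplace smoothing for each component, and a log-linear (maximum-entropy style) combination whose weights are fit by plain stochastic gradient ascent on the conditional log-likelihood. The function names (`train_nb`, `fit_weights`, `posterior`), the two components (`main`, `title`), the learning rate, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes for one component: per-class log word probabilities."""
    vocab = {w for d in docs for w in d}
    counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        counts[y].update(d)
    model = {}
    for y in set(labels):
        total = sum(counts[y].values()) + alpha * len(vocab)  # Laplace smoothing
        model[y] = {w: math.log((counts[y][w] + alpha) / total) for w in vocab}
    return model

def nb_loglik(model, doc, y):
    """log P(doc | y) under one component model; unseen words are skipped."""
    return sum(model[y][w] for w in doc if w in model[y])

def posterior(models, weights, bias, sample, classes):
    """P(y | x) proportional to exp(bias_y + sum_j weight_j * log P_j(x_j | y))."""
    scores = [bias[y] + sum(weights[j] * nb_loglik(models[j], sample[j], y)
                            for j in models) for y in classes]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {y: e / z for y, e in zip(classes, exps)}

def fit_weights(models, samples, labels, classes, lr=0.05, epochs=200):
    """Fit combination weights and class biases (assumed training scheme)."""
    weights = {j: 1.0 for j in models}
    bias = {y: 0.0 for y in classes}
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = posterior(models, weights, bias, x, classes)
            for j in models:
                feats = {c: nb_loglik(models[j], x[j], c) for c in classes}
                expected = sum(p[c] * feats[c] for c in classes)
                weights[j] += lr * (feats[y] - expected)  # observed minus expected
            for c in classes:
                bias[c] += lr * ((1.0 if c == y else 0.0) - p[c])
    return weights, bias

# Toy data (hypothetical): each sample has a main text and a title component.
train = [
    {"main": "neural network training gradient".split(), "title": "deep learning".split()},
    {"main": "gradient descent network layers".split(), "title": "learning networks".split()},
    {"main": "protein gene cell sequence".split(), "title": "genome biology".split()},
    {"main": "cell gene expression protein".split(), "title": "biology of cells".split()},
]
labels = ["ml", "ml", "bio", "bio"]
classes = sorted(set(labels))

# One generative model per component, then a discriminatively trained combination.
models = {j: train_nb([s[j] for s in train], labels) for j in ("main", "title")}
weights, bias = fit_weights(models, train, labels, classes)

test_sample = {"main": "gene sequence network".split(), "title": "genome biology".split()}
probs = posterior(models, weights, bias, test_sample, classes)
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

In this sketch the combination weights are fit on the same samples used to train the component models, which tends to overstate how reliable each component is; a held-out or leave-one-out estimate of the component likelihoods would be a more careful choice.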
