Venue: IEEE International Conference on Data Mining Workshops

Centrality-Based Approach for Supervised Term Weighting

Abstract

The sheer volume of text documents has made the manual organization of text data a tedious task. Automatic text classification makes large document collections manageable by organising them automatically into predefined classes. Its effectiveness and efficiency depend largely on how text documents are represented. A text document is usually viewed as a bag of terms (or words) and represented as a vector using the vector space model, in which terms are assumed to be unordered and independent and term frequencies (or weights) serve as the vector components. Graphs are another text representation scheme, one that captures the structure of terms in a document, which is important for natural language. Weighting terms on the basis of a graph representation improves text classification performance. In this paper, we present a novel approach to graph-based supervised term weighting that incorporates information relevant to the classification task by using node centrality in co-occurrence graphs built from the labelled training documents. Our experimental evaluation of the proposed term weighting scheme on four benchmark datasets shows that it consistently outperforms state-of-the-art term weighting methods for text classification.
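The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions: one co-occurrence graph per class built from labelled training documents, normalised degree centrality as the node centrality measure, and a hypothetical weight equal to a term's centrality in one class divided by its summed centrality over all classes. The function names, the `window` parameter, and the weight definition are illustrative choices, not the authors' method.

```python
from collections import defaultdict


def cooccurrence_graph(docs, window=1):
    """Build an undirected co-occurrence graph as {term: set of neighbours}.

    Two terms are linked when they occur at most `window` positions apart.
    """
    graph = defaultdict(set)
    for tokens in docs:
        for i, term in enumerate(tokens):
            graph[term]  # touch the key so isolated terms still become nodes
            for other in tokens[i + 1 : i + 1 + window]:
                if other != term:
                    graph[term].add(other)
                    graph[other].add(term)
    return graph


def degree_centrality(graph):
    """Normalised degree centrality: degree / (number of nodes - 1)."""
    n = len(graph)
    if n < 2:
        return {term: 0.0 for term in graph}
    return {term: len(neigh) / (n - 1) for term, neigh in graph.items()}


def class_centralities(labelled_docs, window=1):
    """Compute per-class term centralities from (tokens, label) pairs."""
    by_class = defaultdict(list)
    for tokens, label in labelled_docs:
        by_class[label].append(tokens)
    return {
        label: degree_centrality(cooccurrence_graph(docs, window))
        for label, docs in by_class.items()
    }


def supervised_weight(term, label, centralities):
    """Hypothetical weight (an assumption, not taken from the paper):
    the term's centrality in `label` divided by its total over all classes."""
    own = centralities[label].get(term, 0.0)
    total = sum(c.get(term, 0.0) for c in centralities.values())
    return own / total if total else 0.0


# Toy labelled training set (made up for illustration only).
train = [
    (["cheap", "pills", "online"], "spam"),
    (["cheap", "pills", "now"], "spam"),
    (["meeting", "notes", "online"], "ham"),
]
centralities = class_centralities(train, window=1)
print(supervised_weight("pills", "spam", centralities))
print(supervised_weight("online", "spam", centralities))
```

In this toy data, "pills" occurs only in the spam graph, so all of its centrality mass falls in that class, whereas "online" appears in both graphs and its weight is split between the two classes. Other centrality measures (closeness, PageRank, and so on) could be substituted for `degree_centrality` without changing the rest of the sketch.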