Centrality-Based Approach for Supervised Term Weighting

Abstract

The huge volume of text documents has made manual organization of text data a tedious task. Automatic text classification makes large document collections easier to handle by organising them automatically into predefined classes. The effectiveness and efficiency of automatic text classification depend largely on the way text documents are represented. A text document is usually viewed as a bag of terms (or words) and represented as a vector using the vector space model, where terms are assumed to be unordered and independent and term frequencies (or weights) are used in the representation. Graphs are another text representation scheme, one that considers the structure of terms in the text document, which is important for natural language. Terms weighted on the basis of a graph representation increase the performance of text classification. In this paper, we present a novel approach for graph-based supervised term weighting which considers information relevant for the classification task using node centrality in the co-occurrence graphs built from the labelled training documents. Our experimental evaluation of the proposed term weighting scheme on four benchmark datasets shows that the scheme has consistently superior performance over the state-of-the-art term weighting methods for text classification.
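The general idea in the abstract (co-occurrence graphs built from labelled training documents, with node centrality used as a class-aware term weight) can be illustrated with a minimal sketch. This is not the authors' scheme: the abstract does not give the formula, so the sliding-window size, the use of degree centrality, and the `term_weight` ratio below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
def cooccurrence_graph(docs, window=2):
    """Undirected co-occurrence graph as an adjacency dict.

    Terms are nodes; an edge links two distinct terms when one appears
    within `window` tokens after the other (assumed window scheme).
    """
    adj = {}
    for tokens in docs:
        for i, term in enumerate(tokens):
            adj.setdefault(term, set())  # isolated terms still become nodes
            for other in tokens[i + 1 : i + 1 + window]:
                if other != term:
                    adj[term].add(other)
                    adj.setdefault(other, set()).add(term)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n <= 1:
        return {term: 0.0 for term in adj}
    return {term: len(neighbours) / (n - 1) for term, neighbours in adj.items()}


def class_centralities(labelled_docs, window=2):
    """Build one co-occurrence graph per class and score its terms."""
    by_class = {}
    for tokens, label in labelled_docs:
        by_class.setdefault(label, []).append(tokens)
    return {
        label: degree_centrality(cooccurrence_graph(docs, window))
        for label, docs in by_class.items()
    }


def term_weight(term, label, centralities):
    """Illustrative supervised weight (an assumption, not the paper's formula):
    the term's centrality in its own class, relative to that value plus its
    mean centrality in the other classes."""
    own = centralities[label].get(term, 0.0)
    others = [c.get(term, 0.0) for l, c in centralities.items() if l != label]
    rest = sum(others) / len(others) if others else 0.0
    total = own + rest
    return own / total if total > 0 else 0.0


# Toy labelled training set (made-up tokens, for demonstration only).
train = [
    (["cheap", "pills", "online"], "spam"),
    (["cheap", "pills", "now"], "spam"),
    (["meeting", "notes", "online"], "ham"),
]
centralities = class_centralities(train)
print(term_weight("cheap", "spam", centralities))
print(term_weight("online", "spam", centralities))
```

In this toy set, "cheap" is central only in the spam graph and so receives a higher spam weight than "online", which is also central in the ham graph. Other centrality measures (closeness, betweenness, PageRank) could be substituted for `degree_centrality` without changing the rest of the sketch.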

Bibliographic details

Source
IEEE International Conference on Data Mining Workshops, 2016, pp. 1261-1268 (8 pages)
Authors
Niloofer Shanavas; Hui Wang; Zhiwei Lin; Glenn Hawe
Original format
PDF
Keywords
Training; Weight measurement; Text mining; Time-frequency analysis; Syntactics; Natural languages

Similar literature

1. Balancing between over-weighting and under-weighting in supervised term weighting [J]. Haibing Wu, Xiaodong Gu, Yiwei Gu. Information Processing & Management, 2017, No. 2.
2. Combining supervised term-weighting metrics for SVM text classification with extended term representation [J]. Haddoud Mounia, Mokhtari Aicha, Lecroq Thierry. Knowledge and Information Systems, 2016, No. 3.
3. Concept-based one-class SVM classifier with supervised term weighting scheme for imbalanced sentiment classification [J]. Khanista Namee, Jantima Polpinij. Engineering and Applied Science Research, 2021, No. 5.
4. Centrality-Based Approach for Supervised Term Weighting [C]. Niloofer Shanavas, Hui Wang, Zhiwei Lin. IEEE International Conference on Data Mining Workshops, 2016.
5. A single document-based term weighting scheme by supporting terms [D]. Cheng, Juan. 2006.
6. Centrality-based pathway enrichment: a systematic approach for finding significant pathways dominated by key genes [O]. Zuguang Gu, Jialin Liu, Kunming Cao. 2012.
7. Credibility Adjusted Term Frequency: A Supervised Term Weighting Scheme for Sentiment Analysis and Text Classification [O]. Yoon Kim, Owen Zhang. 2015.
8. Improve Precategorized Collection Retrieval by Using Supervised Term Weighting Schemes [R]. Zhao, Y., Karypis, G. 2001.
