Term expansion on the categorization of summarized documents

Wen-Feng Hsiao; Te-Min Chang

Abstract

Several studies have used document summaries as feature-vector inputs for text categorization. This kind of approach often performs poorly when the coverage rate of the summarization is low and the resulting feature vectors are oversimplified. In this study, we therefore propose expanding the terms in the summaries with a supervised distributional clustering method to improve categorization performance. In the training stage of our approach, we use the documents' summaries to generate both classifiers (KNN and Naive Bayes) and term clusters (using KL divergence as the dissimilarity measure). In the test stage, we classify a new document by feeding the expanded feature vector of its summary into the generated classifiers. That is, each term in the feature vector is expanded with related terms from the same cluster in order to alleviate the term mismatch problem. Three experiments are conducted accordingly. The results show that the proposed approach effectively resolves the term mismatch problem and improves categorization accuracy. In short, our approach makes the idea of using automatic summarization in place of feature selection in text categorization tasks more practical and feasible.
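The expansion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`kl_divergence`, `expand_terms`), the toy term clusters, and the choice to give each added term a count of 1 are all assumptions made for illustration. The abstract states only that KL divergence serves as the dissimilarity measure for clustering and that terms are expanded with related terms from the same cluster.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    In distributional clustering, p and q would typically be the class
    distributions of two terms (an assumption; the abstract gives no details).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def expand_terms(summary_terms, term_to_cluster, cluster_to_terms):
    """Add every term that shares a cluster with a term in the summary.

    Added terms receive a count of 1 (hypothetical weighting choice).
    """
    expanded = Counter(summary_terms)
    for term in summary_terms:
        cluster = term_to_cluster.get(term)
        if cluster is None:
            continue  # term was never clustered; leave it as-is
        for related in cluster_to_terms[cluster]:
            if related not in expanded:
                expanded[related] = 1
    return expanded


# Toy clusters, invented for illustration only.
cluster_to_terms = {0: ["car", "automobile", "engine"], 1: ["bank", "loan"]}
term_to_cluster = {t: c for c, terms in cluster_to_terms.items() for t in terms}

expanded = expand_terms(["car", "car", "road"], term_to_cluster, cluster_to_terms)
print(sorted(expanded.items()))
```

Here a summary that mentions only "car" also gains "automobile" and "engine", so it can match a training document that used a different word for the same concept. The expanded vector would then be passed to a classifier such as KNN or Naive Bayes.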

Bibliographic details

Source
International Journal of Computer Systems Science & Engineering, 2013, No. 4, pp. 259-268 (10 pages)
Authors
Wen-Feng Hsiao; Te-Min Chang
Affiliations

Department of Information Management, National PingTung Institute of Commerce;

Department of Information Management, National Sun Yat-sen University;

Indexing information
Original format: PDF
Language: English
Keywords
distributional clustering; term expansion; text summarization; document categorization;
