Document-Level Text Classification Using Single-Layer Multisize Filters Convolutional Neural Network

Akhter Muhammad Pervez; Jiangbin Zheng; Naqvi Irfan Raza; Abdelmajeed Mohammed; Mehmood Atif; Sadiq Muhammad Tariq


Abstract

The rapid growth of electronic documents is causing problems such as unstructured data, which makes searching for a relevant document more time-consuming and laborious. Text Document Classification (TDC), in which unstructured documents are organized into pre-defined classes, is of great significance in information processing and retrieval. Among South Asian languages, Urdu is the most popular research language because of its complex morphology, unique features, and lack of linguistic resources such as standard datasets. Compared with short-text tasks such as sentiment analysis, long-text classification needs more time and effort because of its large vocabulary, greater noise, and redundant information. Machine Learning (ML) and Deep Learning (DL) models have been widely used in text processing. Despite the major limitations of ML models, such as learning directed features, they remain the preferred methods for Urdu TDC. To the best of our knowledge, this is the first study of Urdu TDC using a DL model. In this paper, we design a large multi-purpose and multi-format dataset that contains more than ten thousand documents organized into six classes. We use a Single-layer Multisize Filters Convolutional Neural Network (SMFCNN) for classification and compare its performance with sixteen ML baseline models on three imbalanced datasets of various sizes. Further, we analyze the effects of preprocessing methods on SMFCNN performance. SMFCNN outperformed the baseline classifiers and achieved accuracy scores of 95.4%, 91.8%, and 93.3% on the medium, large, and small datasets, respectively. The designed dataset will be publicly and freely available in different formats for future research in Urdu text processing.
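The abstract names the architecture but gives no hyperparameters, so the following is only a minimal sketch of the general idea behind a single-layer, multi-size-filter CNN: one convolutional layer whose filters span different numbers of consecutive tokens, max-over-time pooling per filter, and a softmax layer over six classes. The filter widths (3, 4, 5), the embedding size, the vocabulary size, the class name `MultiSizeFilterCNN`, and the random untrained weights are all assumptions made for illustration; none of them come from the paper.

```python
import math
import random


def conv1d_max(embedded, kernel, bias):
    """Slide one filter over the token axis, apply ReLU, then max-pool over time."""
    width = len(kernel)  # number of consecutive tokens this filter spans
    best = 0.0  # ReLU output is never negative, so 0.0 is a safe floor
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            token_vec = embedded[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], token_vec))
        best = max(best, total)
    return best


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MultiSizeFilterCNN:
    """Hypothetical forward-pass-only model with randomly initialised weights."""

    def __init__(self, vocab_size, embed_dim, filter_sizes,
                 filters_per_size, num_classes, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.embedding = rand_matrix(vocab_size, embed_dim)
        # One convolutional layer holding filters of several widths (assumed sizes).
        self.filters = [(rand_matrix(size, embed_dim), 0.0)
                        for size in filter_sizes
                        for _ in range(filters_per_size)]
        self.out_weights = rand_matrix(num_classes, len(self.filters))
        self.out_bias = [0.0] * num_classes

    def features(self, token_ids):
        """Return one max-pooled activation per filter, concatenated."""
        embedded = [self.embedding[t] for t in token_ids]
        return [conv1d_max(embedded, k, b) for k, b in self.filters]

    def predict_proba(self, token_ids):
        """Return class probabilities for one document given as token ids."""
        feats = self.features(token_ids)
        scores = [sum(w * f for w, f in zip(row, feats)) + b
                  for row, b in zip(self.out_weights, self.out_bias)]
        return softmax(scores)


# Six classes matches the dataset described in the abstract; the rest is assumed.
model = MultiSizeFilterCNN(vocab_size=50, embed_dim=8, filter_sizes=(3, 4, 5),
                           filters_per_size=2, num_classes=6)
probs = model.predict_proba([4, 17, 8, 23, 42, 1, 9])
print(len(probs), round(sum(probs), 6))
```

Because the weights are random, the probabilities carry no meaning. The point is the data flow: every filter width sees the same embedded document, and pooling reduces each filter to a single feature regardless of document length, which is what lets one layer handle long documents.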


Bibliographic details

Source
Quality Control, Transactions | 2020, Issue 2020 | pp. 42689-42707 | 19 pages
Authors
Akhter Muhammad Pervez; Jiangbin Zheng; Naqvi Irfan Raza; Abdelmajeed Mohammed; Mehmood Atif; Sadiq Muhammad Tariq
Author affiliations

Northwestern Polytech Univ Sch Software & Microelect Xian 710072 Peoples R China;

Northwestern Polytech Univ Sch Software & Microelect Xian 710072 Peoples R China;

Northwestern Polytech Univ Sch Software & Microelect Xian 710072 Peoples R China;

Northwestern Polytech Univ Sch Comp Sci & Technol Xian 710072 Peoples R China;

Xidian Univ Sch Artificial Intelligence Xian 710071 Peoples R China;

Northwestern Polytech Univ Sch Automat Xian 710072 Peoples R China;

Original format: PDF
Text language: English
Keywords
Convolutional neural network; deep learning; machine learning; natural language processing; text document classification; Urdu text classification;


Similar literature

1. Combining attention-based bidirectional gated recurrent neural network and two-dimensional convolutional neural network for document-level sentiment classification [J]. Liu Fagui, Zheng Jingzhong, Zheng Lailei, Neurocomputing. 2020, Issue Jana2

2. A discourse-aware neural network-based text model for document-level text classification [J]. Lee Kangwook, Han Sanggyu, Myaeng Sung-Hyon, Journal of Information Science. 2018, Issue 6

3. Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification [J]. Banerjee Imon, Ling Yuan, Chen Matthew C., Artificial Intelligence in Medicine. 2019, Issue JUNa

4. Filter pruning of Convolutional Neural Networks for text classification: A case study of cancer pathology report comprehension [C]. Hong-Jun Yoon, Sarah Robinson, J. Blair Christian, IEEE EMBS International Conference on Biomedical and Health Informatics. 2018

5. Investigating Noise Robustness of Convolutional Neural Networks for Image Classification Using Gabor Filters [D]. Jeong, Sangwon. 2020

6. Automated design of a convolutional neural network with multi-scale filters for cost-efficient seismic data classification [O]. Zhi Geng, Yanfei Wang

7. Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification [O]. Imon Banerjee, Yuan Ling, Matthew C. Chen, 2019
