Conference: International Symposium on Communications and Information Technologies

A deep learning-based RNNs model for automatic security audit of short messages


Abstract

Traditional text classification methods usually follow the same process: a sentence is treated as a bag of words (BOW) and transformed into a sentence feature vector, which is then classified by a method such as maximum entropy (ME), Naive Bayes (NB), or support vector machines (SVM). When applied to text classification, however, these methods often fail to produce ideal results. The main reason is that the semantic relations between words are very important for text categorization, yet the traditional methods cannot capture them. Sentiment classification, a special case of text classification, is a binary classification task (positive or negative). Inspired by sentiment analysis, we use a novel deep learning-based recurrent neural networks (RNNs) model for the automatic security audit of short messages from prisons, classifying each message as secure or non-secure. In this paper, the features of short messages are extracted by word2vec, which captures word order information, and each sentence is mapped to a feature vector. In particular, words with similar meanings are mapped to nearby positions in the vector space, and the resulting vectors are then classified by RNNs. RNNs are now widely used, and their network structure makes them well suited to processing sequence data. We preprocess the short messages, extract typical features from existing secure and non-secure short messages via word2vec, and classify the messages with RNNs that accept a fixed-size vector as input and produce a fixed-size vector as output. The experimental results show that the RNNs model achieves an average accuracy of 92.7%, which is higher than that of SVM.
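
The contrast between a bag-of-words vector and a recurrent model can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings are random stand-ins for trained word2vec vectors, the RNN weights are untrained, and the names `make_embeddings`, `bag_of_words`, and `TinyRNN` are hypothetical. It only shows that two messages with the same words in a different order share one BOW vector, while a recurrent hidden state depends on the order in which the word vectors arrive.

```python
import math
import random
from collections import Counter


def make_embeddings(vocab, dim, seed=0):
    """Random toy vectors standing in for word2vec output (assumption: not trained)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


def bag_of_words(tokens, vocab):
    """Count vector over the vocabulary; word order is discarded."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


class TinyRNN:
    """Forward pass of a simple (Elman) RNN with random, untrained weights."""

    def __init__(self, input_dim, hidden_dim, seed=1):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_xh = mat(hidden_dim, input_dim)   # input -> hidden
        self.w_hh = mat(hidden_dim, hidden_dim)  # hidden -> hidden (recurrence)
        self.w_hy = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]  # hidden -> output

    def final_state(self, vectors):
        """Consume the word vectors one by one and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        for x in vectors:
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_xh[j], x))
                    + sum(w * hi for w, hi in zip(self.w_hh[j], h))
                )
                for j in range(self.hidden_dim)
            ]
        return h

    def score(self, vectors):
        """Sigmoid of a linear readout: a value in (0, 1) for a two-class decision."""
        z = sum(w * hi for w, hi in zip(self.w_hy, self.final_state(vectors)))
        return 1.0 / (1.0 + math.exp(-z))


vocab = ["guard", "pays", "the", "visitor"]
msg_a = "the visitor pays the guard".split()
msg_b = "the guard pays the visitor".split()

emb = make_embeddings(vocab, dim=4)
rnn = TinyRNN(input_dim=4, hidden_dim=3)

bow_a, bow_b = bag_of_words(msg_a, vocab), bag_of_words(msg_b, vocab)
seq_a, seq_b = [emb[w] for w in msg_a], [emb[w] for w in msg_b]
state_a, state_b = rnn.final_state(seq_a), rnn.final_state(seq_b)

print("same BOW vector:", bow_a == bow_b)
print("same RNN state: ", state_a == state_b)
```

In a real pipeline the embeddings would come from a word2vec model trained on the message corpus and the RNN weights would be learned from labelled secure and non-secure messages; the scores produced here carry no meaning.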
