Automated big security text pruning and classification

Abstract

Many security-related big data problems, including document, traffic, and system log analysis, require analysis of unstructured text. Consider the task of analyzing company documents for secure storage. Some documents may be too sensitive to put on a public cloud and require private storage with the associated backup overhead; some may be safe on the cloud in encrypted form; and some may be sufficiently non-sensitive to be stored on the cloud in plain text, avoiding encryption and decryption overhead. Making such categorizations autonomously can significantly strengthen data security, organization, and storage efficiency. In this paper, we analyze several baseline machine-learning-based security risk assessment algorithms and develop techniques that improve upon them. In particular, we examine document sensitivity labeling, in which each paragraph of a document is assigned one of three levels of security risk. For evaluation, we use real sensitive texts from documents leaked by the WikiLeaks organization. We improve upon the baseline models by using probabilistic topic modeling via Latent Dirichlet Allocation (LDA) to identify samples from impure subtopics in the training set, prior to training a logistic regression classifier.
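The abstract describes the pipeline only at a high level: fit an LDA topic model to the training paragraphs, flag samples that fall in label-impure subtopics, then train a logistic regression classifier on what remains. The sketch below is one hypothetical reading of that pipeline, not the authors' implementation. The purity measure (the share of a topic's paragraphs that carry its majority label), the `threshold` cutoff, the hyperparameters, the label names, and all function names (`lda_gibbs`, `topic_purity`, `prune`, `train_logreg`, `prune_and_train`) are assumptions. To stay dependency-free it uses a small collapsed Gibbs sampler for LDA and multinomial logistic regression trained by batch gradient descent rather than a library.

```python
# Hypothetical sketch: function names, the purity rule, and hyperparameters are
# assumptions for illustration, not the paper's published method.
import math
import random
from collections import Counter, defaultdict


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns each document's dominant topic."""
    rng = random.Random(seed)
    n_vocab = len({w for doc in docs for w in doc})
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]  # random init
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove this token's current assignment
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k  # add the token back under its resampled topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [max(range(n_topics), key=lambda t: counts[t]) for counts in ndk]


def topic_purity(topics, labels):
    """Share of each topic's samples that carry that topic's majority label."""
    by_topic = defaultdict(list)
    for t, y in zip(topics, labels):
        by_topic[t].append(y)
    return {t: Counter(ys).most_common(1)[0][1] / len(ys) for t, ys in by_topic.items()}


def prune(topics, labels, threshold=0.8):
    """Indices of samples whose dominant topic is at least `threshold` pure."""
    purity = topic_purity(topics, labels)
    return [i for i, t in enumerate(topics) if purity[t] >= threshold]


def train_logreg(docs, labels, classes, epochs=200, lr=0.5):
    """Multinomial logistic regression on bag-of-words counts (batch gradient descent)."""
    index = {w: j for j, w in enumerate(sorted({w for doc in docs for w in doc}))}
    n_feat = len(index) + 1  # last feature is a constant bias term
    weights = [[0.0] * n_feat for _ in classes]

    def features(doc):
        x = [0.0] * n_feat
        for w in doc:
            if w in index:  # words unseen in training are ignored
                x[index[w]] += 1.0
        x[-1] = 1.0
        return x

    def probs(x):
        scores = [sum(wj * xj for wj, xj in zip(row, x)) for row in weights]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    data = [(features(doc), classes.index(y)) for doc, y in zip(docs, labels)]
    for _ in range(epochs):
        grad = [[0.0] * n_feat for _ in classes]
        for x, y in data:
            p = probs(x)
            for c in range(len(classes)):
                err = p[c] - (1.0 if c == y else 0.0)  # softmax cross-entropy gradient
                for j, xj in enumerate(x):
                    grad[c][j] += err * xj
        for c in range(len(classes)):
            for j in range(n_feat):
                weights[c][j] -= lr * grad[c][j] / len(data)

    def predict(doc):
        p = probs(features(doc))
        return classes[p.index(max(p))]

    return predict


def prune_and_train(paragraphs, labels, classes, n_topics=3, threshold=0.8):
    """Hypothetical pipeline: LDA topics, drop impure-topic samples, fit logistic regression."""
    docs = [p.lower().split() for p in paragraphs]
    keep = prune(lda_gibbs(docs, n_topics), labels, threshold)
    if not keep:  # everything was pruned; fall back to the full training set
        keep = list(range(len(docs)))
    model = train_logreg([docs[i] for i in keep], [labels[i] for i in keep], classes)
    return (lambda text: model(text.lower().split())), keep
```

For example, `model, kept = prune_and_train(paragraphs, labels, ["low", "medium", "high"])` returns a classifier for raw paragraphs together with the indices of the training samples that survived pruning; the three label names are placeholders for the paper's risk levels. Dropping every sample from an impure topic is the bluntest pruning rule; a gentler variant would drop only the minority-label samples within each impure topic.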

Bibliographic Details

Source: IEEE International Congress on Big Data, 2016, pp. 3629-3637 (9 pages)
Authors: Khudran Alzhrani; Ethan M. Rudd; C. Edward Chow; Terrance E. Boult
Original format: PDF
Keywords: Security; Cloud computing; Training; Big data; Sensitivity; Privacy; Companies

Similar Literature

1. Multidimensional Text Warehousing for Automated Text Classification [J]. Jiyun Kim, Han-joon Kim. Journal of Information Technology Research, 2018, No. 2.

2. Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users [J]. Shatkay Hagit, Pan Fengxia, Rzhetsky Andrey. Bioinformatics, 2008, No. 18.

3. Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users [J]. Hagit Shatkay, Fengxia Pan, Andrey Rzhetsky. Bioinformatics, 2008, No. 18.

4. Automated Big Security Text Pruning and Classification [C]. Khudran Alzhrani, Ethan M. Rudd, C. Edward Chow. IEEE International Conference on Big Data, 2016.

5. Towards Automating Big Texts Security Classification [D]. Alzhrani, Khudran Maeed. 2018.

6. Multi-dimensional classification of biomedical text: Toward automated practical provision of high-utility text to diverse users [O]. Hagit Shatkay, Fengxia Pan, Andrey Rzhetsky.

7. Automated U.S Diplomatic Cables Security Classification: Topic Model Pruning vs. Classification Based on Clusters [O]. Alzhrani, Khudran; Rudd, Ethan M.; Chow, C. Edward. 2017.

8. Security Classification Using Automated Learning (SCALE): Optimizing Statistical Natural Language Processing Techniques to Assign Security Labels to Unstructured Text [R]. Brown, J. D.; Charlebois, D. 2010.
