Word N-Gram Based Classification for Data Leakage Prevention


Abstract

Revealing sensitive data to unauthorised personnel is a serious problem for many organizations and can lead to devastating consequences. Traditionally, data leaks were prevented through firewalls, VPNs and IDSs, but without much consideration of the sensitivity of the data. In recent years, new technologies such as data leakage prevention systems (DLPs) have been developed, specifically either to identify and protect sensitive data or to monitor and detect sensitive data leakage. One of the most popular approaches used in DLPs is content analysis, where the content of exchanged documents, stored data or even network traffic is monitored for sensitive data. The contents of documents are examined mainly using text analysis and text clustering methods. Moreover, text analysis can be performed using methods such as pattern recognition, style variation and N-gram frequency. In this paper, we investigate the use of N-grams for data classification purposes. Our method uses N-gram frequency to classify documents in order to detect and prevent leakage of sensitive data. We have studied the effectiveness of N-grams in measuring the similarity between regular documents and existing classified documents.
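The abstract does not spell out how N-gram frequencies are turned into a similarity score, so the following is only a minimal sketch of one plausible reading. It assumes ranked word N-gram profiles compared with an out-of-place rank distance (the measure commonly associated with Cavnar and Trenkle's N-gram text categorization), which may differ from the authors' actual matching distance. The function names, the bigram default, `top_k` and `max_penalty` are all hypothetical choices made for illustration.

```python
from collections import Counter
import re


def word_ngrams(text, n=2):
    """Return the word n-grams of text as a list of tuples."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def profile(text, n=2, top_k=300):
    """Map each of the top_k most frequent n-grams to its rank (0 = most frequent)."""
    counts = Counter(word_ngrams(text, n))
    # Sort by descending count, then by n-gram, so ties rank deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return {gram: rank for rank, (gram, _) in enumerate(ranked)}


def out_of_place(doc_profile, category_profile, max_penalty=300):
    """Sum of rank differences; n-grams absent from the category cost max_penalty."""
    return sum(
        abs(rank - category_profile[gram]) if gram in category_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def classify(text, category_texts, n=2):
    """Return the category whose profile is closest to the document, plus all distances."""
    doc = profile(text, n)
    distances = {
        name: out_of_place(doc, profile(sample, n))
        for name, sample in category_texts.items()
    }
    return min(distances, key=distances.get), distances


# Toy category samples, invented purely for illustration.
categories = {
    "sensitive": "the merger agreement is confidential the merger agreement must not be shared",
    "public": "the weather is sunny today the weather is mild",
}
label, distances = classify("draft of the merger agreement is confidential", categories)
print(label, distances)
```

In a DLP setting, a document whose closest profile belongs to a sensitive category would be flagged before it leaves the organization. A smaller distance means a closer match, and the fixed `max_penalty` keeps scores comparable across category profiles of different sizes.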


Bibliographic details

Source
IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 2013, 8 pages
Authors
Alneyadi Sultan; Sithirasenan Elankayer; Muthukkumarasamy Vallipuram
Original format: PDF
Chinese Library Classification: Security and confidentiality
Keywords
Data leakage prevention; N-gram profiles; N-grams; matching distance


Similar literature


1. Improving Bag-of-Visual-Words model using visual n-grams for human action classification [J]. Hernandez-Garcia Ruber, Ramos-Cozar Julian, Guil Nicolas. Expert Systems with Applications. 2018, issue feba

2. How Data Classification Helps Improve Data Leakage Prevention [J]. Vishal Bindra. PC Quest. 2014, issue juna

3. Dealing with Out-of-vocabulary Words and Filled Pauses in Word N-gram Based Speech Recognition System [J]. Atsuhiko Kai, Yoshifumi Hirose, Seiichi Nakagawa. 情報処理学会論文誌. 1999, issue 4

4. Word N-Gram Based Classification for Data Leakage Prevention [C]. Alneyadi Sultan, Sithirasenan Elankayer, Muthukkumarasamy Vallipuram. 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. 2013

5. Maximal Information Leakage Based Data-Driven Models for Privacy in Data Disclosure Applications [D]. Xiao, Tianrui. 2019

6. Words prediction based on N-gram model for free-text entry in electronic health records [O]. Azita Yazdani, Reza Safdari, Ali Golkar. 2019

7. Word N-gram Based Classification for Data Leakage Prevention [O]. Sultan Alneyadi, Elankayer Sithirasenan, Vallipuram Muthukkumarasamy. 2016

8. Preliminary Statistical Investigation into the Impact of an N-Gram Analysis Approach Based on Word Syntactic Categories Toward Text Author Classification [R]. Diab, M., Schuster, J., Bock, P. 2000
