Word N-Gram Based Classification for Data Leakage Prevention

Abstract

Revealing sensitive data to unauthorised personnel is a serious problem for many organizations and can lead to devastating consequences. Traditionally, data leaks were prevented through firewalls, VPNs and IDS, but without much consideration of the sensitivity of the data. In recent years, new technologies such as data leakage prevention systems (DLPs) have been developed specifically to either identify and protect sensitive data or monitor and detect sensitive data leakage. One of the most popular approaches used in DLPs is content analysis, where the content of exchanged documents, stored data or even network traffic is monitored for sensitive data. Document contents are examined mainly using text analysis and text clustering methods. Text analysis, in turn, can be performed using methods such as pattern recognition, style variation and N-gram frequency. In this paper, we investigate the use of N-grams for data classification. Our method uses N-gram frequency to classify documents in order to detect and prevent leakage of sensitive data. We have studied the effectiveness of N-grams in measuring the similarity between regular documents and existing classified documents.
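The idea of comparing a document against existing classified documents by word N-gram frequency can be illustrated with a minimal sketch. The abstract does not specify the tokenisation, the value of N, the similarity measure or any decision threshold, so everything below is an assumption made for illustration: lowercase alphanumeric tokens, bigrams (`n=2`), cosine similarity over raw N-gram counts, and a hypothetical `threshold` of 0.2. The function names `word_ngrams`, `cosine_similarity` and `classify` are likewise hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from math import sqrt


def word_ngrams(text, n=2):
    """Count word N-grams in a text (assumed tokenisation: lowercase alphanumerics)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Slide a window of n tokens over the text; a short text yields an empty Counter.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two N-gram frequency profiles (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document, labelled_profiles, n=2, threshold=0.2):
    """Return (label, score) of the most similar reference profile.

    The label is None when no profile reaches the (hypothetical) threshold.
    """
    profile = word_ngrams(document, n)
    best_label, best_score = None, 0.0
    for label, reference in labelled_profiles.items():
        score = cosine_similarity(profile, reference)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Reference profiles built from already classified documents (made-up text).
profiles = {
    "secret": word_ngrams("project falcon budget forecast is confidential"),
    "public": word_ngrams("the cafeteria menu for this week"),
}
label, score = classify("leaked: project falcon budget forecast numbers", profiles)
```

In a DLP setting, a document whose best match is a sensitive category would be flagged or blocked before leaving the organisation; the threshold trades false alarms against missed leaks and would need tuning on real data.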
