Journal: ACM Transactions on Management Information Systems

Using Word Embeddings to Deter Intellectual Property Theft through Automated Generation of Fake Documents


Abstract

Theft of intellectual property is a growing problem, one that is exacerbated by the fact that a successful compromise of an enterprise might only become known months after the hack. A recent solution called FORGE addresses this problem by automatically generating N "fake" versions of any real document, so that the attacker has to determine which of the N + 1 documents that they have exfiltrated from a compromised network is real. In this article, we remove two major drawbacks in FORGE. (i) FORGE requires ontologies in order to generate fake documents; however, in the real world, ontologies, especially good ontologies, are infrequently available. The WE-FORGE system proposed in this article completely eliminates the need for ontologies by using distance metrics on word embeddings instead. (ii) FORGE generates fake documents by first identifying "target" concepts in the original document and then substituting "replacement" concepts for them. However, we will show that this can lead to sub-optimal results (e.g., because target concepts are selected without knowing the availability and/or quality of the replacement concepts, they can sometimes lead to poor results). Our WE-FORGE system addresses this problem in two possible ways by performing a joint optimization to select concepts and replacements simultaneously. We conduct a human study involving both computer science and chemistry documents and show that WE-FORGE successfully deceives adversaries.
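
The abstract does not specify the embedding model, the distance metric, or the optimization objective, so the following is only a minimal sketch of the general idea, not the WE-FORGE algorithm itself. It assumes hand-made two-dimensional toy vectors, cosine distance, a hypothetical distance band [lo, hi] for acceptable replacements, and a simple greedy ranking of (concept, replacement) pairs standing in for the joint optimization. All names (EMBEDDINGS, candidate_replacements, make_fake) are illustrative.

```python
import math

# Toy, hand-made 2-D "embeddings" (an assumption for illustration only);
# a real system would use vectors learned from a corpus.
EMBEDDINGS = {
    "benzene": (1.0, 0.0),
    "toluene": (0.9, 0.1),
    "xylene": (0.8, 0.3),
    "ethanol": (0.1, 1.0),
    "methanol": (0.2, 0.9),
    "compiler": (-1.0, 0.2),
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def candidate_replacements(word, lo, hi):
    """List (distance, other_word) pairs whose distance from `word` is in [lo, hi]."""
    base = EMBEDDINGS[word]
    found = []
    for other, vec in EMBEDDINGS.items():
        if other == word:
            continue
        d = cosine_distance(base, vec)
        if lo <= d <= hi:
            found.append((d, other))
    return found


def make_fake(tokens, k, lo, hi):
    """Replace up to k concepts, choosing concept and replacement together.

    Every (concept, replacement) pair is scored first, so a concept with no
    acceptable replacement is never selected as a target.
    """
    pairs = []
    for tok in set(tokens):
        if tok in EMBEDDINGS:
            for d, repl in candidate_replacements(tok, lo, hi):
                pairs.append((d, tok, repl))
    pairs.sort()  # greedy stand-in for the optimization: closest pairs first

    chosen = {}
    for _, tok, repl in pairs:
        if tok not in chosen and len(chosen) < k:
            chosen[tok] = repl
    return [chosen.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    doc = ["mix", "benzene", "with", "ethanol"]
    print(make_fake(doc, k=2, lo=0.001, hi=0.1))
```

Scoring pairs before committing to targets is the point of the sketch: a word such as "compiler", which has no neighbor inside the band, is left untouched instead of being picked first and then paired with a poor substitute.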
