IEEE Intl Conf on Ubiquitous Computing & Communications

URL2Vec: URL Modeling with Character Embeddings for Fast and Accurate Phishing Website Detection

Abstract

A deep learning-based approach to phishing detection is proposed. Specifically, URLs are treated as documents and the characters in them as words, so that word2vec-style embedding learning can be applied directly. Character embeddings can therefore be learned from a corpus of URLs in an unsupervised manner. We then combine the character embeddings with the structure of each URL to obtain a vector representation of the URL. In particular, a URL is partitioned into the following five sections: URL protocol, sub-domain name, domain name, domain suffix, and URL path. To identify phishing URLs, existing classification algorithms can be applied directly to these vector representations, which avoids the laborious work of designing effective features manually and empirically. For evaluation, we collect a large-scale dataset, the 1 Million Phishing Detection Dataset (1M-PD), which has been released for public use. Extensive experiments on two real-world datasets show the effectiveness of the proposed approach, which achieves an accuracy of 99.69% on the 1M-PD dataset, with a false positive rate of 0.40% and a true positive rate of 99.79%. The proposed approach also classifies each URL in 32 ms on average on a personal computer, which is much faster than existing approaches and can be considered real-time.
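
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the sections are split or how character embeddings are combined, so the choices below are assumptions. The function names (`split_url`, `toy_char_vector`, `section_vector`, `url_vector`) and the dimension `DIM` are hypothetical, the suffix is naively taken to be the last host label, the query string is folded into the path, and a seeded random vector stands in for a trained word2vec character embedding. Each section is averaged over its characters and the five section vectors are concatenated.

```python
import random
from urllib.parse import urlsplit

DIM = 8  # hypothetical embedding size, chosen only for illustration
SECTIONS = ("protocol", "subdomain", "domain", "suffix", "path")


def split_url(url: str) -> dict:
    """Partition a URL into the five sections named in the abstract."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    # Assumption: the suffix is the last label only. Multi-label suffixes
    # such as "co.uk" would need a lookup in the Public Suffix List.
    if len(labels) >= 2:
        suffix, domain = labels[-1], labels[-2]
        subdomain = ".".join(labels[:-2])
    else:
        suffix, domain, subdomain = "", host, ""
    # Assumption: the query string is treated as part of the path section.
    path = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "protocol": parts.scheme,
        "subdomain": subdomain,
        "domain": domain,
        "suffix": suffix,
        "path": path,
    }


def toy_char_vector(ch: str, dim: int = DIM) -> list:
    """Stand-in for a learned character embedding (NOT trained word2vec)."""
    rng = random.Random(ord(ch))  # deterministic per character
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def section_vector(text: str, dim: int = DIM) -> list:
    """Average the character vectors of one section; zeros if it is empty."""
    if not text:
        return [0.0] * dim
    vecs = [toy_char_vector(c, dim) for c in text]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def url_vector(url: str, dim: int = DIM) -> list:
    """Concatenate the five section vectors into one fixed-length vector."""
    sections = split_url(url)
    out = []
    for name in SECTIONS:
        out.extend(section_vector(sections[name], dim))
    return out


if __name__ == "__main__":
    example = "http://login.example.com/secure/update?id=1"
    print(split_url(example))
    print(len(url_vector(example)))
```

The resulting fixed-length vector (5 × `DIM` values) is the kind of input an off-the-shelf classifier could consume. In a real setting the toy vectors would be replaced by embeddings trained on a URL corpus, with each URL supplied as a sequence of single-character tokens.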
