URL2Vec: URL Modeling with Character Embeddings for Fast and Accurate Phishing Website Detection

Abstract

A deep learning-based approach to phishing detection is proposed. Specifically, websites' URLs and the characters in these URLs are mapped to documents and words, respectively, in the context of word2vec-based word embedding learning. Consequently, character embeddings can be learned from a corpus of URLs in an unsupervised manner. Furthermore, we combine the character embeddings with the structure of URLs to obtain vector representations of the URLs. In particular, a URL is partitioned into the following five sections: URL protocol, sub-domain name, domain name, domain suffix, and URL path. To identify phishing URLs, existing classification algorithms can be applied directly to the vector representations of the URLs, avoiding the laborious work of designing effective features manually and empirically. For evaluation, we collect a large-scale dataset, the 1 Million Phishing Detection Dataset (1M-PD), which has been released for public use. Extensive experiments conducted on two real-world datasets show the effectiveness of the proposed approach, which achieves an accuracy of 99.69%, with a false positive rate of 0.40% and a true positive rate of 99.79%, on the 1M-PD dataset. In particular, the proposed approach classifies each URL in 32 ms on average on an ordinary personal computer, which is much faster than existing approaches and can be considered real-time.
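The five-part partition and the "URL as document, character as word" mapping described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation. The function names `partition_url` and `to_char_document` are hypothetical, and the domain suffix is assumed to be the last dot-separated label of the host, which is wrong for multi-label suffixes such as "co.uk" and for IP-address hosts.

```python
from urllib.parse import urlsplit


def partition_url(url):
    """Split a URL into the five sections named in the abstract.

    Assumption (not from the paper): the domain suffix is the last
    dot-separated label of the host and the domain name is the label
    before it. A public suffix list would be needed to do this properly.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []

    if len(labels) >= 2:
        suffix = labels[-1]
        domain = labels[-2]
    else:
        # Single-label host (e.g. "localhost") or no host at all.
        suffix = ""
        domain = host
    subdomain = ".".join(labels[:-2])

    # Treat the query string as part of the path section (an assumption).
    path = parts.path
    if parts.query:
        path += "?" + parts.query

    return {
        "protocol": parts.scheme,
        "subdomain": subdomain,
        "domain": domain,
        "suffix": suffix,
        "path": path,
    }


def to_char_document(url):
    """Turn a URL into a list of single-character 'words'."""
    return list(url)


if __name__ == "__main__":
    example = "http://login.example.com/secure/index.html?id=1"
    print(partition_url(example))
    print(to_char_document("a.b"))
```

A corpus of such character lists, one per URL, is the kind of input a word2vec-style trainer expects. How the resulting character embeddings are combined with the five sections into a single URL vector is not specified in the abstract and is left out here.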

Bibliographic record

Source
IEEE Intl Conf on Ubiquitous Computing & Communications | 2018 | 596p | 8 pages
Authors
Huaping Yuan; Zhenguo Yang; Xu Chen; Yukun Li; Wenyin Liu
Original format
PDF
Chinese Library Classification
Computing technology, computer technology
Keywords
Phishing; Uniform resource locators; Feature extraction; Task analysis; Electronic mail; Protocols; Machine learning

Similar literature

1. Classification Model Based on URL and Content Feature Approach for Detection Phishing Website in Indonesia [J]. Febry Eka Purwiantono, Aris Tjahyanto. Journal of Theoretical and Applied Information Technology. 2017, No. 17
2. CatchPhish: detection of phishing websites by inspecting URLs [J]. Rao Routhu Srinivasa, Vaishnavi Tatti, Pais Alwyn Roshan. Journal of Ambient Intelligence and Humanized Computing. 2020, No. 2
3. Websites Phishing Detection Using URLs Tokens as a Discriminating Features [J]. Ammar Yahya Daeef, R. Badlishah Ahmad, Yasmin Yacob. Journal of Engineering & Applied Sciences. 2017, No. 3
4. URL2Vec: URL Modeling with Character Embeddings for Fast and Accurate Phishing Website Detection [C]. Huaping Yuan, Zhenguo Yang, Xu Chen, 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications. 2018
5. URL-based Phishing Detection using Entropy of Non-Alphanumeric Characters [D]. Eint Sandi Aung. 2019
6. Improving the phishing website detection using empirical analysis of Function Tree and its variants [O]. Abdullateef O. Balogun, Kayode S. Adewole, Muiz O. Raheem, 2021
7. An Effective Phishing Detection Model Based on Character Level Convolutional Neural Network from URL [O]. Ali Aljofey, Qingshan Jiang, Qiang Qu, 2020
8. Modeling Content from Human-Verified Blacklists for Accurate Zero-Hour Phish Detection [R]. 2009
