Detecting Malicious URLs Based on Machine Learning Algorithms and Word Embeddings



Abstract

Relying on appropriate features is essential in classification models for malware detection, for various important reasons, such as dealing with class imbalance, detecting zero-day malware samples, or preventing attackers from reverse engineering the classification process and changing nonessential feature values to avoid detection. In this paper, we propose a method that combines word embeddings with “classical”, domain-engineered features to obtain reliable classification models for malicious URL detection. Additionally, we explore several traditional techniques for addressing class imbalance, such as synthetic oversampling and cost-sensitive learning, as well as several classification techniques. We find that the best overall results are obtained with a cost-sensitive neural network, which achieves a precision that exceeds 99% and an accuracy above 90% while maintaining a recall above 89%. We have analyzed the importance of the proposed features and found that, while word embeddings produce better results than bi-gram based features, domain-specific features are necessary for obtaining high precision in detecting malicious URLs.
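The abstract names three ingredients: word-embedding features, domain-engineered features, and cost-sensitive learning evaluated by precision, recall, and accuracy. The following Python sketch illustrates how such pieces could fit together. It is not the authors' implementation. The toy embedding table, the four lexical features, and every function name are assumptions made for illustration, and the class weights use a common inverse-frequency heuristic rather than anything the abstract specifies.

```python
import re
from urllib.parse import urlparse

# Hypothetical 3-dimensional toy embeddings (an assumption). A real system
# would learn them from a corpus of URL tokens instead of hard-coding them.
TOY_EMBEDDINGS = {
    "login": [0.9, 0.1, 0.0],
    "secure": [0.8, 0.2, 0.1],
    "example": [0.1, 0.7, 0.2],
    "com": [0.0, 0.5, 0.5],
}
EMBEDDING_DIM = 3


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def embedding_features(url):
    """Average the embeddings of known tokens (zeros if none are known)."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokenize(url) if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def lexical_features(url):
    """Illustrative hand-engineered features, not the paper's feature set."""
    host = urlparse(url).hostname or ""
    return [
        float(len(url)),                         # total URL length
        float(host.count(".")),                  # dots in the host name
        float(sum(ch.isdigit() for ch in url)),  # digit count
        1.0 if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) else 0.0,  # IPv4 host
    ]


def combined_features(url):
    """Concatenate embedding-based and domain-engineered features."""
    return embedding_features(url) + lexical_features(url)


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Compute precision, recall, and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy
```

In this sketch, `class_weights([0, 0, 0, 1])` gives the minority class a weight of 2.0 and the majority class a weight of about 0.67. A cost-sensitive learner could use such weights to scale each sample's contribution to the loss, so that errors on the rare class cost more. The metric helper also shows how a classifier can combine very high precision with lower recall: it raises few false alarms but misses some malicious URLs.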


Bibliographic details

Source
International Conference on Intelligent Computer Communication and Processing, 2020, pp. 187-193 (7 pages)
Authors
Andrei Crişan; Gabriel Florea; Lorand Halasz; Camelia Lemnaru; Ciprian Oprisa
Original format
PDF
Keywords
Uniform resource locators; Feature extraction; Blacklisting; Security; Malware; Navigation; Machine learning algorithms


Similar literature


1. URLdeepDetect: A Deep Learning Approach for Detecting Malicious URLs Using Semantic Vector Models [J]. Sara Afzal, Muhammad Asim, Abdul Rehman Javed. Journal of Network and Systems Management, 2021, No. 3.
2. Detecting malicious URLs using binary classification through adaboost algorithm [J]. Firoz Khan, Jinesh Ahamed, Seifedine Kadry. International Journal of Electrical and Computer Engineering, 2020, No. 1.
3. Malicious URL detection with feature extraction based on machine learning [J]. Baojiang Cui, Shanshan He, Xi Yao. International Journal of High Performance Computing and Networking, 2018, No. 2.
4. Detecting Malware, Malicious URLs and Virus Using Machine Learning and Signature Matching [C]. Jatin Acharya, Anshul Chuadhary, Anish Chhabria. International Conference for Emerging Technology, 2021.
5. Learning to detect malicious URLs [D]. Ma, Justin Tung. 2010.
6. ISOMAP and machine learning algorithms for the construction of embedded functional connectivity networks of anatomically separated brain regions from resting state fMRI data of patients with Schizophrenia [O]. Ioannis K Gallos, Kostakis Gkiatis, George K Matsopoulos. 2021.
7. Detecting Malicious URLs via a Keyword-Based Convolutional Gated-Recurrent-Unit Neural Network [O]. Wenchuan Yang, Wen Zuo, Baojiang Cui. 2019.
