Conference: International Conference on Intelligent Computer Communication and Processing

Detecting Malicious URLs Based on Machine Learning Algorithms and Word Embeddings


Abstract

Relying on appropriate features is essential in classification models for malware detection, for several important reasons: dealing with class imbalance, detecting zero-day malware samples, and preventing attackers from reverse engineering the classification process and changing nonessential feature values to avoid detection. In this paper, we propose a method that combines word embeddings with “classical”, domain-engineered features to obtain reliable classification models for malicious URL detection. Additionally, we explore several traditional techniques for addressing class imbalance, such as synthetic oversampling and cost-sensitive learning, along with several classification techniques. We find that the best overall results are obtained with a cost-sensitive neural network, which achieves a precision exceeding 99% and an accuracy above 90% while maintaining a recall above 89%. We also analyzed the importance of the proposed features and found that, while word embeddings produce better results than bi-gram based features, domain-specific features are necessary for obtaining high precision in detecting malicious URLs.
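To make the terms in the abstract concrete, the following is a minimal sketch of three of its ingredients: hand-engineered lexical URL features, class weights for cost-sensitive learning, and the reported metrics (precision, recall, accuracy). It is not the paper's implementation. The feature names, the `lexical_features` and `balanced_class_weights` helpers, and the inverse-frequency weighting scheme are all assumptions chosen for illustration; the abstract does not list the actual features or cost matrix.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Hypothetical domain-engineered features (not the paper's feature set)."""
    # Add a scheme if missing so urlparse can locate the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "has_at_sign": "@" in url,
    }


def balanced_class_weights(labels) -> dict:
    """Assumed cost scheme: weight = n_samples / (n_classes * class_count).

    Rare classes get larger weights, so misclassifying them costs more
    when the weights scale the per-sample training loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Compute the three metrics quoted in the abstract for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy
```

In a full pipeline, the feature dictionary would be concatenated with an embedding of the URL tokens, and the class weights would multiply each sample's loss during training. High precision with lower recall, as reported in the abstract, means that few benign URLs are flagged while some malicious ones are missed.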
