International Journal on Semantic Web and Information Systems

Internet Data Analysis Methodology for Cyberterrorism Vocabulary Detection, Combining Techniques of Big Data Analytics, NLP and Semantic Web


Abstract

This article presents a methodology for analyzing Internet data that combines Big Data analytics, natural language processing (NLP), and Semantic Web techniques to extract knowledge from large volumes of web content. To test the effectiveness of the proposed method, webpages about cyberterrorism were analyzed as a case study. The procedure implements a genetic strategy in parallel that integrates a crawler, which locates and downloads information from the web, with vocabulary retrieval based on NLP techniques (tokenization, stop-word removal, term frequency (TF), and TF-IDF), stemming, and synonym handling. To extract knowledge, a dataset was built by describing a linguistic corpus with semantic ontologies that capture the characteristics of cyberterrorism; this dataset was analyzed with the Random Forests (parallel), Boosting, SVM, neural network, k-NN, and Bayes algorithms. The results show 95.62% accuracy in detecting cyberterrorism vocabulary, validated through cross-validation, and parallel processing achieved a 576% time saving.
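The abstract names the vocabulary-retrieval steps (tokenization, stop-word removal, stemming, TF and TF-IDF) but gives no implementation. The following is a minimal standard-library Python sketch of those steps under stated assumptions: `STOP_WORDS` is a tiny hypothetical sample, `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, and IDF is taken as log(N/df). It illustrates the technique, not the authors' actual tooling or weighting scheme.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is", "are", "by", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Hypothetical suffix stripper; a stand-in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ers", "er", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem each remaining token."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tf(tokens):
    """Term frequency: occurrences of each term divided by document length."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def tfidf(docs):
    """Return one {term: tf * idf} dict per document, with idf = log(N / df)."""
    processed = [preprocess(d) for d in docs]
    n_docs = len(processed)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in processed for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [{term: w * idf[term] for term, w in tf(tokens).items()} for tokens in processed]


# Toy documents invented for illustration; not from the paper's corpus.
docs = [
    "Attackers launched a ransomware attack on the network",
    "The network outage affected hospitals",
    "Hackers are recruiting for attacks",
]
weights = tfidf(docs)
print(sorted(weights[0].items()))
```

With these toy documents, the stem `attack` occurs in two of three documents, so its IDF is log(3/2) and its TF-IDF in the first document is (2/5)·log(1.5) ≈ 0.162; a term present in every document would receive weight 0, which is why TF-IDF down-weights vocabulary that does not discriminate between pages.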
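The abstract reports accuracy validated through cross-validation across six classifiers, but it does not state the number of folds or the feature representation. As a hedged sketch of such an evaluation loop, the following standard-library Python trains a multinomial Naive Bayes classifier (one concrete reading of the "Bayes" algorithm the abstract lists; add-one smoothing is an assumption) and reports mean k-fold accuracy. The token lists, the labels `cyberterror`/`other`, and `k=3` are invented for illustration and are not the authors' data or settings.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for t in tokens:
            # Counter returns 0 for unseen words; +1 smoothing avoids log(0).
            score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(docs, labels, k=3, seed=0):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    if not 2 <= k <= len(docs):
        raise ValueError("k must be between 2 and the number of documents")
    indices = list(range(len(docs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        model = train_nb([docs[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


# Toy pre-tokenized documents and labels (hypothetical).
docs = [
    ["bomb", "attack", "network"],
    ["attack", "hack", "malware"],
    ["recruit", "propaganda", "attack"],
    ["recipe", "cake", "oven"],
    ["garden", "flower", "oven"],
    ["football", "match", "goal"],
]
labels = ["cyberterror"] * 3 + ["other"] * 3
print(k_fold_accuracy(docs, labels, k=3))
```

The fold logic is independent of the model, so any of the other classifiers the abstract lists could replace `train_nb`/`predict_nb` without changing how accuracy is averaged across folds.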
