Journal: Complexity

Application of Data Mining Technology on Surveillance Report Data of HIV/AIDS High-Risk Group in Urumqi from 2009 to 2015

Abstract

Objective. Urumqi is one of the key areas of HIV/AIDS infection in Xinjiang and in China. The AIDS epidemic is spreading from high-risk groups to the general population, and the situation remains very serious. The goal of this study was to build HIV infection identification models with four data mining algorithms and to compare their predictive performance.

Method. The sentinel surveillance data for three high-risk groups (injecting drug users (IDU), men who have sex with men (MSM), and female sex workers (FSW)) in Urumqi from 2009 to 2015 included demographic characteristics, sexual behavior, and serological test results. Using age, marital status, education level, and other variables as input variables and HIV infection status as the output variable, we built four prediction models for each of the three datasets. We evaluated classification performance with the confusion matrix, accuracy, sensitivity, specificity, precision, recall, and the area under the receiver operating characteristic (ROC) curve (AUC), and analyzed the importance of the predictor variables.

Results. The random forest algorithm achieved the best results, with diagnostic accuracies of 94.4821% on the MSM dataset, 97.5136% on the FSW dataset, and 94.6375% on the IDU dataset. The k-nearest neighbors algorithm ranked second, with diagnostic accuracies of 91.5258% on the MSM dataset, 96.3083% on the FSW dataset, and 90.8287% on the IDU dataset, followed by the support vector machine (94.0182%, 98.0369%, and 91.3571%, respectively). The decision tree algorithm performed worst of the four, with diagnostic accuracies of 79.1761% on the MSM dataset, 87.0283% on the FSW dataset, and 74.3879% on the IDU dataset.

Conclusions. Data mining technology, as a new method for assisting disease screening and diagnosis, can help medical personnel screen for and diagnose AIDS rapidly from large volumes of information.
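The abstract names its evaluation metrics without defining them. The sketch below (Python, standard library only) shows one common way to compute them for a binary HIV-status classifier. It assumes label 1 = HIV-positive, a 0.5 decision threshold on model scores, and the rank-based (Mann-Whitney) formulation of AUC. The function names and the toy labels and scores are hypothetical and for illustration only; they are not the study's code or data.

```python
from typing import Sequence


def _safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (e.g. no predicted positives)."""
    return num / den if den else 0.0


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, where 1 = HIV-positive (assumed encoding)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0: a missed infection
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Threshold-based metrics derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _safe_div(tp, tp + fn)  # sensitivity and recall are the same quantity
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": _safe_div(tn, tn + fp),
        "precision": _safe_div(tp, tp + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up labels and model scores (not data from the study).
    y_true = [1, 0, 1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.1, 0.8]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))  # 0.9375
```

In a setting like this study's, `scores` would be each trained model's predicted probability of infection. Accuracy, sensitivity, specificity, and precision all depend on the chosen threshold, whereas AUC summarizes how well a model ranks positives above negatives across all thresholds.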