Journal: JMIR Mental Health

Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model

Abstract

Background: Traditional offline assessment of suicide probability is time consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media is more efficient and can reach otherwise hidden individuals, yet little research has focused on this specific field.

Objective: This study applied two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying microblog users in China with high suicide probability from profile and linguistic features extracted from Internet-based data.

Methods: A total of 909 Chinese microblog users completed an Internet survey. Users scoring one SD above the sample mean on the total Suicide Probability Scale (SPS) score were labeled as high-risk for overall suicide probability; likewise, users scoring one SD above the sample mean on each of the four subscales were labeled as high-risk on that dimension. Profile and linguistic features were fed into the two machine learning algorithms (SLR and RF) to train models that identify high-risk individuals in overall suicide probability and in each of its four dimensions. Models were trained and tested with 5-fold cross-validation, with both the training set and the test set generated from the whole sample by stratified random sampling. Three classic performance metrics (precision, recall, and F1 measure) and a specifically defined metric, "Screening Efficiency", were used to evaluate model effectiveness.

Results: Classification performance was generally comparable between SLR and RF. With the best-performing classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. Precision of the models was generally below 30%.

Conclusions: Individuals in China with high suicide probability are recognizable from profile and text-based information in microblogs. Although there is still much room to improve the performance of the classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side by side with expert scrutiny to increase efficiency in large-scale surveillance of suicide probability from online social media.
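The labeling rule, the stratified 5-fold split, and the three classic metrics described in the Methods can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the use of the sample SD and a strict "greater than" comparison are assumptions the abstract does not settle, and Screening Efficiency is left out because the abstract does not give its definition.

```python
import random
import statistics


def label_high_risk(scores):
    """Flag scores more than one SD above the sample mean.

    Assumption: sample SD (statistics.stdev) and a strict '>' comparison;
    the abstract does not specify either choice.
    """
    threshold = statistics.mean(scores) + statistics.stdev(scores)
    return [score > threshold for score in scores]


def stratified_folds(labels, k=5, seed=0):
    """Assign every index to one of k folds, keeping class ratios similar.

    Each class is shuffled separately and dealt round-robin into the folds,
    so every fold gets roughly the same share of high-risk labels.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (True, False):
        indices = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for boolean labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In a cross-validation loop, each fold in turn would serve as the test set while the remaining four form the training set. Because precision divides by all flagged users and recall divides by all truly high-risk users, a model can retrieve most high-risk users (high recall) while most of the users it flags are not high-risk (low precision), which is the pattern the Results report.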
