Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model

Li Guan, MS; Bibo Hao, MS; Qijin Cheng, PhD; Paul SF Yip, PhD; Tingshao Zhu, PhD

Abstract

Background: Traditional offline assessment of suicide probability is time-consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media is efficient and can reach otherwise hidden individuals, yet little research has focused on this specific field.

Objective: This study applied two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying microblog users in China with high suicide probability through profile and linguistic features extracted from Internet-based data.

Methods: A total of 909 Chinese microblog users completed an Internet survey. Those scoring one SD above the mean of the total Suicide Probability Scale (SPS) score, and those scoring one SD above the mean on each of the four subscale scores in the participant sample, were labeled as high-risk individuals, respectively. Profile and linguistic features were fed into two machine learning algorithms (SLR and RF) to train models that aim to identify high-risk individuals in general suicide probability and in its four dimensions. Models were trained and then tested by 5-fold cross-validation, in which both the training set and the test set were generated from the whole sample by stratified random sampling. Three classic performance metrics (Precision, Recall, and F1 measure) and a specifically defined metric, "Screening Efficiency," were adopted to evaluate model effectiveness.

Results: Classification performance was generally comparable between SLR and RF. With the best-performing classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. Precision of the models was generally below 30%.

Conclusions: Individuals in China with high suicide probability are recognizable from profile and text-based information in microblogs. Although there is still much room to improve the performance of the classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side by side with expert scrutiny to increase efficiency in large-scale surveillance of suicide probability from online social media.
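The labeling rule, the stratified 5-fold split, and the three classic metrics described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the use of the sample standard deviation and of "greater than or equal to" at the cutoff are assumptions, and the example scores are made up. "Screening Efficiency" is left out because the abstract does not define it.

```python
import random
from statistics import mean, stdev


def label_high_risk(scores):
    """Label a participant 1 (high risk) when the score is at least one SD
    above the sample mean, else 0. Sample SD and >= are assumptions."""
    cutoff = mean(scores) + stdev(scores)
    return [1 if s >= cutoff else 0 for s in scores]


def stratified_folds(labels, k=5, seed=0):
    """Assign every index to one of k folds, dealing the indices of each
    class round-robin so the class ratio stays similar across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def precision_recall_f1(y_true, y_pred):
    """Precision, Recall, and F1 for the positive (high-risk) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up questionnaire scores for ten hypothetical participants.
example_scores = [10, 12, 11, 13, 12, 30, 11, 12, 10, 29]
example_labels = label_high_risk(example_scores)
example_folds = stratified_folds(example_labels, k=2)
```

In each round of cross-validation, one fold would serve as the test set and the remaining folds as the training set. The metrics also make the reported trade-off concrete: Recall above 70% with Precision below 30% means most labeled high-risk users are retrieved, but fewer than three in ten flagged users actually carry the high-risk label.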


Bibliographic record

Source: JMIR Mental Health, 2015, Issue 2, 1 page
Original format: PDF
CLC classification: Medicine and health
