Machine Learning based Dataset for Finding Suicidal Ideation on Twitter

Abstract

Suicidal ideation is a major health issue today, and suicide is one of the major causes of death in many countries [9] [14]. Automatically identifying people who express suicidal ideation on social media is a major concern, and many researchers are working in this direction [10] [11]. Risk factors associated with suicidal ideation include anxiety, depression, and other mental disorders [13] [15]. A number of methods have been developed to prevent deaths by suicide. With the advent of social networking sites, people have started expressing their feelings on social media more than to someone in person [6] [12], and text classification has been reported to be a successful method for helping to prevent suicides [8].

This article describes a dataset of Twitter users expressing suicidal ideation. The data were extracted through the public Application Programming Interface (API) provided by Twitter, using its access key and access token. Features/keywords related to suicidal ideation, shown in Table 2, were used to identify such users; these keywords were gathered from various web forums and from papers published in previous years [7]. The raw data comprise around 14202 tweets, part of which is shown in Table 3, with fields such as user__id, user__name, created_at, text, user__screen_name, user__friends_count, user__listed_count, user__favourites_count, user__followers_count, user__statuses_count, user__created_at, and user_location. A sample of 1897 tweets was then extracted based on the selected keywords, and only the text and class fields were retained as input to the algorithms, as shown in Table 4. The class is binary: 0 (non-suicidal) or 1 (suicidal), depending on whether the tweet relates to suicidal ideation. The labels were assigned manually by a human annotator and a psychiatric expert, as shown in Table 4.

In the final step, the tweets are preprocessed based on the semantics of the recognized keywords. New columns, one per keyword, are derived from the text field and added to the table, and the table is converted to binary values: a keyword column is set to 1 if that keyword occurs in the tweet and to 0 if it does not. The resulting dataset therefore contains only binary (0 or 1) values, as given in Table 5 [1], and consists of 1897 tweets and 34 features. Several machine learning algorithms (Multinomial Naïve Bayes, Bernoulli Naïve Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, Voting Ensemble, and AdaBoost) are then applied to this dataset to test it and to measure accuracy, recall, and precision.
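The keyword encoding described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' code. The three keywords are hypothetical stand-ins for the 34 features of Table 2 (not reproduced here), case-insensitive substring matching is an assumed matching rule, and the names KEYWORDS, encode_tweet, and build_feature_table are invented for this example.

```python
import re

# Hypothetical keyword list; the paper's actual features are in its Table 2.
KEYWORDS = ["suicide", "kill myself", "want to die"]


def encode_tweet(text, keywords=KEYWORDS):
    """Return one 0/1 flag per keyword: 1 if the keyword occurs in the tweet."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    normalized = re.sub(r"\s+", " ", text.lower())
    return [1 if kw in normalized else 0 for kw in keywords]


def build_feature_table(tweets, keywords=KEYWORDS):
    """Encode every tweet into a row of binary keyword indicators."""
    return [encode_tweet(tweet, keywords) for tweet in tweets]


# Made-up example texts, not taken from the dataset.
tweets = [
    "I want to die, nothing helps anymore",
    "Great game last night!",
    "Reading about suicide prevention resources",
]
features = build_feature_table(tweets)
for tweet, row in zip(tweets, features):
    print(row, tweet)
```

The third example contains a keyword without expressing ideation, which suggests why keyword flags alone cannot supply the class label and why the source relies on manual annotation for it.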
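For reference, the three measures named in the abstract can be computed from binary labels as in the sketch below. The label vectors are made-up placeholders, not results from the paper, and the function name evaluate is hypothetical. Treating class 1 (suicidal) as the positive class is an assumption.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary 0/1 labels.

    Class 1 is treated as the positive class (an assumption here).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Report 0.0 when a denominator is zero instead of dividing by zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Placeholder labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In a screening setting, recall on the positive class is often the measure watched most closely, since a missed positive case is usually considered costlier than a false alarm.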

Bibliographic Record

Source
International Conference on Intelligent Communication Technologies and Virtual Mobile Networks | 2021 | pp. 823-828 | 6 pages
Authors
Akshma Chadha; Baijnath Kaushik
Original format: PDF
Keywords
Support vector machines; Social networking (online); Text recognition; Blogs; Text categorization; Regression tree analysis; Testing

Similar Literature

1. Suicidal Ideation Detection: A Review of Machine Learning Methods and Applications [J]. Ji Shaoxiong, Pan Shirui, Li Xue. IEEE Transactions on Computational Social Systems, 2021, No. 1.
2. A machine learning approach predicts future risk to suicidal ideation from social media data [J]. Arunima Roy, Katerina Nikolitch, Rachel McGinn. npj Digital Medicine, 2020, No. 1.
3. Prediction of Suicidal Ideation among Korean Adults Using Machine Learning: A Cross-Sectional Study [J]. Bumjo Oh, Je-Yeon Yun, Eun Chong Yeo. Psychiatry Investigation, 2020, No. 4.
4. A Machine Learning based Depression Analysis and Suicidal Ideation Detection System using Questionnaires and Twitter [C]. Swati Jain, Suraj Prakash Narayan, Rupesh Kumar Dewang. IEEE Students Conference on Engineering and Systems, 2019.
5. Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions [D]. Bloodgood, Michael. 2009.
6. Factors associated with suicidal ideation and suicidal attempts among adolescent students in Nepal: Findings from Global School-based Students Health Survey [O]. Achyut Raj Pandey, Bihungum Bista, Raja Ram Dhungana. 2012.
7. Detection of Suicidal Ideation on Twitter using Machine Learning Ensemble Approaches [O]. Syed Tanzeel Rabani, Qamar Rayees Khan, Akib Mohi UD Din Khanday. 2020.
