Conference: International Conference of the Cross-Language Evaluation Forum for European Languages

TwitCID: A Collection of Data Sets for Studies on Information Diffusion on Social Networks

Abstract

Online social networks play a crucial role in spreading information at very large scale, and modeling information propagation on these networks has attracted considerable attention from researchers. However, none of the data sets used in past work have been made available to the research community, even though they would be very useful for comparative studies. In this paper, we describe and release a collection of five data sets totaling 18 million tweets, designed to evaluate methods for modeling information spread, covering both general information and brand marketing information. In addition to the tweet IDs and a script that retrieves each full tweet as JSON from the Twitter API, we release the values of 29 features extracted for these data sets. The features are user-based, content-based, and temporal-based. Finally, we provide the results of information diffusion prediction models (80% accuracy), which could serve as strong baselines for this research topic.
