Conference: International Conference of the Cross-Language Evaluation Forum for European Languages

TwitCID: A Collection of Data Sets for Studies on Information Diffusion on Social Networks

Abstract

Online social networks play a crucial role in spreading information at very large scale, and modeling information propagation on these networks has attracted considerable attention from researchers. However, none of the data sets used in past work has been made available to the research community, even though they would be very useful for comparative studies. In this paper, we describe and release a collection of five tweet data sets totaling 18 million tweets, designed for evaluating methods that model information spread for both general information and brand marketing information. In addition to tweet IDs and a script that retrieves each full tweet as JSON from the Twitter API, we release the values of 29 features extracted for these data sets. The features are user-based, content-based, and temporal. Finally, we report the results of information diffusion prediction models (80% accuracy), which could serve as strong baselines for this research topic.
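To make the three feature families concrete, the following is a minimal sketch of how a few user, content, and temporal features might be derived from one retrieved tweet in JSON form. It is an assumption for illustration, not the authors' extraction script. The function name `extract_features` and the seven features shown are hypothetical and are not the 29 features released with the data sets. The field names (`user.followers_count`, `entities.hashtags`, `created_at`, and so on) follow the Twitter API v1.1 tweet object.

```python
from datetime import datetime

# Timestamp layout used by the Twitter API v1.1 "created_at" field,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_features(tweet: dict) -> dict:
    """Return a few illustrative features for one tweet (hypothetical set)."""
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    text = tweet.get("text", "")
    created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)

    return {
        # User-based features
        "followers_count": user.get("followers_count", 0),
        "friends_count": user.get("friends_count", 0),
        # Content-based features
        "text_length": len(text),
        "hashtag_count": len(entities.get("hashtags", [])),
        "url_count": len(entities.get("urls", [])),
        # Temporal features
        "hour_of_day": created.hour,
        "day_of_week": created.weekday(),  # Monday is 0
    }
```

Feature dictionaries of this kind can be stacked into a table, one row per tweet, and passed to any standard classifier to predict whether a tweet will spread.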