Mitigating the Impact of Data Sampling on Social Media Analysis and Mining

Kuai Xu; Feng Wang; Haiyan Wang; Yufang Wang; Ying Zhang


Abstract

The last decade has witnessed explosive growth in the users and content of online social media. Due to the unprecedented scale and cascading power of the underlying social networks, social media has created a new paradigm in which any user can share information, broadcast breaking news, and report real-time events from anywhere at any time. Many popular social media sites, including Twitter, provide streaming data services through standard APIs to the broad research and developer communities. Given the sheer data volume, rapid velocity, and feature variety of online social media, these sites often supply only a sampled set of streaming data rather than the full data set, in order to reduce the resource costs of computation, storage, and network bandwidth. In light of the substantial impact of sampling on the Twitter data stream, this article explores a combination of spectral clustering, locality-sensitive hashing (LSH), latent Dirichlet allocation (LDA) topic modeling, and differential equation modeling to mitigate the impact of sampling on social media data analysis, in particular on detecting real-world events and predicting information diffusion. Our extensive experiments demonstrate that the proposed method effectively detects emerging events in real time and accurately predicts the cascading patterns of these events from the 1% sampled Twitter data stream. To the best of our knowledge, this article is the first effort to introduce a systematic methodology to study and mitigate the impact of data sampling on social media analysis and mining.
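As a rough illustration of one component named in the abstract, the sketch below shows how locality-sensitive hashing can group near-duplicate tweets without comparing every pair. The abstract does not say which hash family the authors use, so MinHash with banding is an assumption here, and the function names, parameters (`num_hashes`, `bands`), and sample tweets are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase a tweet and return its set of whitespace-separated tokens."""
    return set(text.lower().split())


def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=64):
    """MinHash signature: the minimum hash of the (non-empty) set per seed."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_candidate_pairs(signatures, bands=16):
    """Return id pairs whose signatures agree on at least one whole band.

    Assumes the signature length is a multiple of `bands`.
    """
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample tweets, not taken from the paper's data set.
tweets = [
    "Strong earthquake hits the city center tonight",
    "strong earthquake hits the city tonight",
    "football final kicks off this weekend",
]
signatures = {i: minhash_signature(tokenize(t)) for i, t in enumerate(tweets)}
candidates = lsh_candidate_pairs(signatures)
print(candidates)
```

Only the candidate pairs returned by the banding step would need a full similarity check, which is what makes this kind of hashing attractive for high-volume streams. How the paper combines LSH with spectral clustering and LDA is not described in the abstract and is not modeled here.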


Bibliographic Information

Source
Computational Social Systems, IEEE Transactions on | 2020, Issue 2 | pp. 546-555 | 10 pages
Authors
Kuai Xu; Feng Wang; Haiyan Wang; Yufang Wang; Ying Zhang
Author Affiliations

School of Mathematical and Natural Sciences Arizona State University Glendale AZ USA;

School of Mathematical and Natural Sciences Arizona State University Glendale AZ USA;

School of Mathematical and Natural Sciences Arizona State University Glenda;

Original Format: PDF
Language: English
Keywords
Twitter; Real-time systems; Earthquakes; Clustering algorithms; Data mining; Analytical models;


Similar Literature

1. Social Media Big Data Mining and Spatio-Temporal Analysis on Public Emotions for Disaster Mitigation [J]. Tengfei Yang, Jibo Xie, Guoqing Li, ISPRS International Journal of Geo-Information. 2019, Issue 1
2. A Proposed Model for Consumer-Based Brand Equity Analysis on Social Media Using Data Mining and Social Network Analysis [J]. Eduardo Nogueira, Denise F. Tsunoda, Journal of Relationship Marketing. 2018, Issue 2
3. Evidence of the impacts of metal mining and the effectiveness of mining mitigation measures on social–ecological systems in Arctic and boreal regions: a systematic map protocol [J]. Neal R. Haddaway, Steven J. Cooke, Pamela Lesser, Environmental Evidence. 2019, Issue 1
4. The Impact of Sampling on Big Data Analysis of Social Media: A Case Study on Flu and Ebola [C]. Kuai Xu, Feng Wang, Xiaohua Jia, IEEE Global Communications Conference. 2015
5. Social user mining: User profiling of social media network based on multimedia data mining [D]. Eltaher, Mohammed Ali. 2015
6. Redundancy in electronic health record corpora: analysis impact on text mining performance and mitigation strategies [O]. Raphael Cohen, Michael Elhadad, Noémie Elhadad. 2013
7. Mapping the predicted and potential impacts of metal mining and its mitigation measures in Arctic and boreal regions using environmental and social impact assessments: a systematic map protocol [O]. Biljana Macura, Neal R. Haddaway, Pamela Lesser, 2019
