Journal: ACM Transactions on Internet Technology

Integrating Social and Auxiliary Semantics for Multifaceted Topic Modeling in Twitter



Abstract

Microblogging platforms, such as Twitter, have already played an important role in recent cultural, social and political events. Discovering latent topics from social streams is therefore important for many downstream applications, such as clustering, classification or recommendation. However, traditional topic models that rely on the bag-of-words assumption are insufficient to uncover the rich semantics and temporal aspects of topics in Twitter. In particular, microblog content is often influenced by external information sources, such as Web documents linked from Twitter posts, and often focuses on specific entities, such as people or organizations. These external sources provide useful semantics to understand microblogs and we generally refer to these semantics as auxiliary semantics. In this article, we address the mentioned issues and propose a unified framework for Multifaceted Topic Modeling from Twitter streams. We first extract social semantics from Twitter by modeling the social chatter associated with hashtags. We further extract terms and named entities from linked Web documents to serve as auxiliary semantics during topic modeling. The Multifaceted Topic Model (MfTM) is then proposed to jointly model latent semantics among the social terms from Twitter, auxiliary terms from the linked Web documents and named entities. Moreover, we capture the temporal characteristics of each topic. An efficient online inference method for MfTM is developed, which enables our model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We evaluate each aspect of our framework and show its utility in the context of tweet clustering.
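
The abstract does not say how the three kinds of terms are represented before inference. As a rough illustration only, the sketch below pools tweets by hashtag and keeps one term counter per facet (social terms, auxiliary terms from linked Web documents, named entities). It is not MfTM itself and makes no claim about the authors' implementation: the function names, the dictionary keys `text`, `linked_text` and `entities`, and the crude tokenizer are all assumptions made for this example, and named entities are assumed to have been extracted upstream.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")
TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (deliberately crude)."""
    return TOKEN.findall(text.lower())


def build_facets(tweets):
    """Pool tweets by hashtag and count terms separately for each facet.

    Each tweet is a dict with hypothetical keys (assumptions, not from the paper):
      'text'        - the tweet text
      'linked_text' - text of the linked Web document ('' if there is none)
      'entities'    - named entities, assumed to be extracted upstream
    Returns {hashtag: {'social': Counter, 'auxiliary': Counter, 'entity': Counter}}.
    """
    docs = defaultdict(
        lambda: {"social": Counter(), "auxiliary": Counter(), "entity": Counter()}
    )
    for tweet in tweets:
        tags = [tag.lower() for tag in HASHTAG.findall(tweet["text"])]
        # Remove URLs and hashtags so they are not counted as social terms.
        plain = HASHTAG.sub(" ", URL.sub(" ", tweet["text"]))
        social = tokenize(plain)
        auxiliary = tokenize(tweet.get("linked_text", ""))
        entities = tweet.get("entities", [])
        for tag in tags:
            docs[tag]["social"].update(social)
            docs[tag]["auxiliary"].update(auxiliary)
            docs[tag]["entity"].update(entities)
    return dict(docs)
```

Keeping the facets in separate counters, rather than one merged bag of words, is what would let a downstream model treat each kind of term with its own vocabulary. A tweet with several hashtags contributes to each hashtag's pseudo-document; that is a simplification chosen here for brevity.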
