Journal: Journal of intelligent & fuzzy systems: Applications in Engineering and Technology

Knowledge discovery in virtual community texts: Clustering virtual communities

Abstract

Automatic knowledge discovery from texts (KDT) is proving to be a promising method for businesses today to deal with the overload of textual information. In this paper, we first explore the possibilities for KDT to enhance communication in virtual communities, and then we present a practical case study with real-life Internet data. The problem in the case study is to manage the very successful virtual communities known as 'clubs' of the largest Dutch Internet Service Provider. Anyone can start a club about any subject, which has resulted in over 10,000 active clubs today. At the beginning, the founder assigns the club to a predefined category. This often results in illogical or inconsistent placements, which means that interesting clubs may be hard to locate for potential new members. The ISP is therefore looking for an automated way to categorize clubs in a logical and consistent manner. The method used is the so-called bag-of-words approach, previously applied mostly to scientific texts and structured documents. Each club is described by a vector of word occurrences of all communications within that club. Latent Semantic Indexing (LSI) is applied to reduce dimensionality prior to clustering. Clustering is done by the Within Groups Clustering method using a cosine distance measure appropriate for texts. The results show that KDT and the LSI method can be applied successfully to clustering the very volatile and unstructured textual communication on the Internet.
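
The pipeline outlined in the abstract (bag-of-words vectors, LSI, then clustering on cosine distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the four toy "club" texts, the function names, whitespace tokenization, raw unweighted counts, and the choice of k = 2 are all assumptions made for illustration. The clustering step uses plain average-linkage agglomeration as a stand-in, which is not necessarily the Within Groups Clustering method named in the abstract.

```python
import numpy as np

def bag_of_words(docs):
    """Build a document-by-term count matrix (whitespace tokenization, assumed)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            counts[i, index[w]] += 1
    return counts, vocab

def lsi(counts, k):
    """Project documents onto the top-k singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine_distance_matrix(x):
    """Pairwise cosine distance (1 - cosine similarity) between rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)
    return 1.0 - unit @ unit.T

def agglomerate(dist, n_clusters):
    """Average-linkage agglomeration on a precomputed distance matrix.

    Stand-in for the clustering step; not the paper's exact method.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    labels = np.empty(len(dist), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical toy data: each string stands for all text posted in one club.
clubs = [
    "football match goal team football",
    "team goal league football match",
    "recipe cooking pasta sauce recipe oven",
    "pasta sauce cooking dinner recipe",
]

counts, vocab = bag_of_words(clubs)
reduced = lsi(counts, k=2)
labels = agglomerate(cosine_distance_matrix(reduced), n_clusters=2)
print(labels)
```

On this toy input the two football clubs and the two cooking clubs land in separate clusters. With thousands of clubs, a sparse term matrix, term weighting, and a library clustering routine would be more practical than the quadratic loop used here.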