Exploiting Tag and Word Correlations for Improved Webpage Clustering

Abstract

Automatic clustering of webpages helps a number of information retrieval tasks, such as improving user interfaces, collection clustering, and introducing diversity in search results. Typically, webpage clustering algorithms use only features extracted from the page text. However, the advent of social-bookmarking websites, such as StumbleUpon and Delicious, has led to a huge amount of user-generated content, such as the tag information associated with webpages. In this paper, we present a subspace-based feature extraction approach that leverages tag information to complement the page contents of a webpage and extract highly discriminative features, with the goal of improved clustering performance. In our approach, we consider page text and tags as two separate views of the data, and learn a shared subspace that maximizes the correlation between the two views. Any clustering algorithm can then be applied in this subspace. We compare our subspace-based approach with a number of baselines that use tag information in various other ways, and show that the subspace-based approach leads to improved performance on the webpage clustering task. Although our results here are on the webpage clustering task, the same approach can be used for webpage classification as well. Finally, we suggest possible future work for leveraging tag information in webpage clustering, especially when tag information is available for only a small number of webpages rather than for all of them.
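
The two-view idea in the abstract can be illustrated with a small sketch. The abstract does not name the algorithm used to learn the shared subspace; the code below assumes regularized linear canonical correlation analysis (CCA), one standard way to find projections that maximize the correlation between two views, followed by plain k-means. The function names (make_toy_views, cca_projection, kmeans), the regularization value, and the synthetic "word" and "tag" features are all hypothetical and chosen only for illustration; this is not the authors' implementation or data.

```python
import numpy as np

def make_toy_views(n=200, seed=0):
    """Synthetic stand-in for (page-text, tag) features; not real webpage data."""
    rng = np.random.default_rng(seed)
    z = np.arange(n) % 2  # hidden cluster label shared by both views
    X = np.outer(z, [2.0, 2.0, 0.0, 0.0, 0.0, 0.0]) + 0.3 * rng.standard_normal((n, 6))
    X[:, 5] += 3.0 * rng.standard_normal(n)  # high-variance word feature unrelated to tags
    Y = np.outer(z, [2.0, 0.0, 2.0, 0.0]) + 0.3 * rng.standard_normal((n, 4))
    return X, Y, z

def cca_projection(X, Y, k, reg=1e-3):
    """Project view X onto the top-k directions most correlated with view Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance matrices.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each view with its Cholesky factor, then take the SVD of the
    # whitened cross-covariance; singular values are canonical correlations.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, _ = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :k])  # map back from whitened coordinates
    return Xc @ Wx, s[:k]

def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; any clustering method could be used here instead."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center in place if its cluster is empty
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

X, Y, z = make_toy_views()
Z, corrs = cca_projection(X, Y, k=1)
labels = kmeans(Z, k=2)
agree = np.mean(labels == z)
print("top canonical correlation:", round(float(corrs[0]), 3))
print("cluster purity:", max(agree, 1 - agree))
```

In this toy setup the sixth word feature has large variance but no counterpart in the tag view, so a correlation-maximizing projection tends to ignore it, which is the intuition behind using tags to pick out discriminative text features.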

Bibliographic Record

Source: 2010, pp. 3-11 (9 pages)
Conference location: Toronto (CA)
Authors: Anusua Trivedi; Piyush Rai; Scott L. DuVall; Hal Daume III
Author affiliations:

School of Computing, University of Utah, Salt Lake City, Utah, USA

School of Computing, University of Utah, Salt Lake City, Utah, USA

VA SLC Healthcare System, University of Utah, Salt Lake City, Utah, USA

Dept. of Computer Science, University of Maryland, College Park, Maryland, USA

Original format: PDF
Language: English
CLC classification: Computing technology, computer technology
Keywords: social tagging; webpage clustering

Similar Literature

1. Semantic tag recommendation based on associated words exploiting the interwiki links of Wikipedia [J]. Hyun-Ki Hong, Gun-Woo Kim, Dong-Ho Lee. Journal of Information Science. 2018, No. 3
2. Multiple Hypergraph Clustering of Web Images by Mining Word2Image Correlations [J]. 吴飞 (Fei Wu), 韩亚洪 (Ya-Hong Han), 庄越挺 (Yue-Ting Zhuang). Journal of Computer Science and Technology (English edition). 2010, No. 004
3. Multiple Hypergraph Clustering of Web Images by Mining Word2Image Correlations [J]. Fei Wu, Ya-Hong Han, Yue-Ting Zhuang. Journal of Computer Science and Technology (English edition). 2010, No. 004
4. Exploiting Tag and Word Correlations for Improved Webpage Clustering [C]. Anusua Trivedi, Piyush Rai, Scott L. DuVall. International workshop on search and mining user-generated contents. 2010
5. Improving Web retrieval by mining the HTML tags for keywords and exploring the hyperlink structures of Web pages [D]. Quevedo-Torrero, Jesus Ubaldo. 2004
6. Improving prediction of burial state of residues by exploiting correlation among residues [O]. Hai’e Gong, Haicang Zhang, Jianwei Zhu. 2017
7. Exploiting Tag and Word Correlations for Improved Webpage Clustering [O]. Anusua Trivedi, Piyush Rai, Hal Daumé III. 2010
8. Testing for Contagion Using Correlations: Some Words of Caution [R]. Dungey, M., Zhumabekova, D. 2001
