
Clustering search engine suggests by integrating a topic model and word embeddings

Abstract

This paper addresses the problem of giving an overview of the knowledge associated with a given query keyword. In particular, we focus on the concerns of people who search for Web pages with that keyword. We collect the Web search information needs for a query keyword through search engine suggests. For a single query keyword we collect up to around 1,000 suggests, many of which are redundant. We cluster these redundant suggests based on a topic model. However, one limitation of topic-model-based clustering is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this problem, we additionally apply the word embedding technique to the Web pages used to train the topic model, together with the text of the entire Japanese version of Wikipedia. We then measure the word-embedding-based similarity between search engine suggests and further classify the suggests within a single topic into finer-grained subtopics based on that similarity. Evaluation results show that the proposed approach performs well in the task of subtopic clustering of search engine suggests.
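
The second stage described in the abstract, splitting the suggests of one topic into subtopics by embedding similarity, might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how a suggest is turned into a vector or which grouping rule is used, so the averaging of word vectors, the single-link threshold grouping, the threshold value, the function names, and the two-dimensional toy embeddings are all assumptions made for the example. Whitespace tokenization is also an assumption; the Japanese data in the paper would need a morphological analyzer.

```python
import math
from collections import defaultdict


def mean_vector(words, embeddings):
    """Average the vectors of the words that have one; None if no word does."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def split_into_subtopics(suggests, embeddings, threshold=0.9):
    """Group suggests of one topic: pairs with similarity >= threshold are linked.

    Assumed grouping rule (single-link via union-find), not taken from the paper.
    """
    vectors = [mean_vector(s.split(), embeddings) for s in suggests]
    parent = list(range(len(suggests)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(suggests)):
        for j in range(i + 1, len(suggests)):
            if vectors[i] is None or vectors[j] is None:
                continue  # a suggest without any known word stays on its own
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i, s in enumerate(suggests):
        groups[find(i)].append(s)
    return list(groups.values())


# Hypothetical toy embeddings and suggests, invented for illustration only.
toy_embeddings = {
    "coffee": [0.5, 0.5],
    "price": [1.0, 0.0],
    "cost": [0.9, 0.1],
    "recipe": [0.0, 1.0],
    "cooking": [0.1, 0.9],
}
topic_suggests = ["coffee price", "coffee cost", "coffee recipe", "coffee cooking"]
subtopics = split_into_subtopics(topic_suggests, toy_embeddings)
```

With these toy vectors, the two price-related suggests end up in one subtopic and the two cooking-related suggests in another. In practice the embeddings would come from a model trained on the corpus the abstract mentions, and the threshold would need tuning against labeled data.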
