Conference: 11th IEEE International Conference on Data Mining

Cross Domain Random Walk for Query Intent Pattern Mining from Search Engine Log

Abstract

Understanding users' search intents from their short, condensed queries has attracted much attention in both academia and industry. Search intents are generally assumed to be associated with query patterns such as "MobileName price", where "MobileName" can be any named entity denoting a mobile phone model and the pattern indicates that the user intends to buy a mobile phone. However, discovering query intent patterns for general search is challenging, mainly because it is difficult to collect sufficient training data for learning query patterns across a large number of searchable domains. In this work, we propose the Cross Domain Random Walk (CDRW) algorithm, a semi-supervised method for discovering query intent patterns across different domains from search engine click-through log data. Starting from a few manually tagged seed queries in one or more independent domains, CDRW uses query patterns as a bridge and propagates transition probabilities across domains to collect query intent patterns in the different domains. It rests on the assumption that "users who have similar intent in different but similar domains will have high probability to share similar query patterns across domains". Unlike classical random walk algorithms, CDRW walks across different domains to disseminate the shared knowledge in a transfer learning manner. Extensive experimental results on real log data from a commercial search engine validate the effectiveness and efficiency of the proposed algorithm.
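The idea of using query patterns as a bridge between domains can be illustrated with a small sketch. This is not the CDRW algorithm itself, whose details the abstract does not give. It is a simplified label propagation over a query-pattern bipartite graph, with the intent labels of seed queries held fixed. The entity dictionaries, the generic `<E>` slot, the toy queries, the intent names, and the function names `to_pattern` and `propagate` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical entity dictionaries for two domains (illustrative only).
ENTITIES = {
    "mobile": {"iphone", "galaxy"},
    "camera": {"canon", "nikon"},
}


def to_pattern(query, domain):
    """Replace known entity tokens with a generic slot, e.g. 'iphone price' -> '<E> price'."""
    return " ".join("<E>" if tok in ENTITIES[domain] else tok for tok in query.split())


def propagate(queries, seeds, steps=5):
    """Spread seed intents over a query-pattern bipartite graph.

    queries: list of (query, domain) pairs; seeds: {query: intent}.
    Returns (pattern scores, query scores), each {name: {intent: score}}.
    """
    pattern_of = {q: to_pattern(q, d) for q, d in queries}
    members = defaultdict(list)
    for q, p in pattern_of.items():
        members[p].append(q)
    intents = sorted(set(seeds.values()))
    # Seed queries start with all mass on their tagged intent; others start at zero.
    q_score = {q: {i: 1.0 if seeds.get(q) == i else 0.0 for i in intents}
               for q in pattern_of}
    p_score = {}
    for _ in range(steps):
        # Query -> pattern: a pattern averages the scores of its queries.
        for p, qs in members.items():
            p_score[p] = {i: sum(q_score[q][i] for q in qs) / len(qs) for i in intents}
        # Pattern -> query: untagged queries inherit their pattern's scores,
        # while seed queries keep their manual tags.
        for q, p in pattern_of.items():
            if q not in seeds:
                q_score[q] = dict(p_score[p])
    return p_score, q_score


# Toy data: seeds exist only in the "mobile" domain.
queries = [
    ("iphone price", "mobile"),
    ("galaxy price", "mobile"),
    ("iphone review", "mobile"),
    ("canon price", "camera"),
    ("nikon review", "camera"),
]
seeds = {"iphone price": "buy", "iphone review": "research"}

p_score, q_score = propagate(queries, seeds)
for q, _ in queries:
    print(q, q_score[q])
```

In this toy setting the camera query "canon price" has no seed of its own, yet it ends up scoring higher for "buy" than for "research" because it shares the pattern "<E> price" with a tagged mobile query. A realistic system would also need click-through information and per-domain transition probabilities, which this sketch omits.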