Conference: IEEE International Congress on Big Data

Improving the quality of semantic relationships extracted from massive user behavioral data


Abstract

As the ability to store and process massive amounts of user behavioral data increases, new approaches continue to arise for leveraging the wisdom of the crowds to gain insights that were previously very challenging to discover by text mining alone. For example, through collaborative filtering, we can learn previously hidden relationships between items based upon users' interactions with them, and we can also perform ontology mining to learn which keywords are semantically related to other keywords based upon how they are used together by similar users, as recorded in search engine query logs. The biggest challenge to this collaborative filtering approach is the variety of noise and outliers present in the underlying user behavioral data. In this paper, we propose a novel approach to improve the quality of semantic relationships extracted from user behavioral data. Our approach utilizes millions of documents indexed into an inverted index in order to detect and remove noise and outliers.
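
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that candidate keyword pairs are mined by counting how many users searched for both keywords, and that a pair counts as noise when the two keywords never appear together in any indexed document. The function names, the thresholds `min_users` and `min_docs`, the whitespace tokenizer, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mine_related_keywords(query_log, min_users=2):
    """Count, for each keyword pair, how many distinct users searched both.

    query_log is an iterable of (user_id, keyword) tuples.
    Pairs shared by fewer than min_users users are dropped (assumed threshold).
    """
    keywords_by_user = defaultdict(set)
    for user_id, keyword in query_log:
        keywords_by_user[user_id].add(keyword)

    pair_users = defaultdict(int)
    for keywords in keywords_by_user.values():
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(keywords), 2):
            pair_users[(a, b)] += 1
    return {pair: n for pair, n in pair_users.items() if n >= min_users}


def build_inverted_index(documents):
    """Map each lowercased, whitespace-separated term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def filter_by_document_support(pairs, index, min_docs=1):
    """Keep only pairs whose keywords co-occur in at least min_docs documents."""
    kept = {}
    for (a, b), n_users in pairs.items():
        shared_docs = index.get(a, set()) & index.get(b, set())
        if len(shared_docs) >= min_docs:
            kept[(a, b)] = n_users
    return kept


# Toy data (invented for illustration): "pizza" co-occurs with "java" in the
# query log, but no document supports that relationship.
query_log = [
    ("u1", "java"), ("u1", "hadoop"), ("u1", "pizza"),
    ("u2", "java"), ("u2", "hadoop"), ("u2", "pizza"),
    ("u3", "java"), ("u3", "hadoop"),
]
documents = {
    "d1": "Java developer with Hadoop experience",
    "d2": "Pizza chef wanted",
    "d3": "Senior Java engineer",
}

candidates = mine_related_keywords(query_log)
index = build_inverted_index(documents)
related = filter_by_document_support(candidates, index)
print(related)
```

In this toy run, the pairs involving "pizza" pass the user-behavior threshold but are discarded because no document contains both keywords, while the "hadoop"/"java" pair is kept. A real system would likely replace the raw counts with a statistical relatedness score and use a production search index instead of an in-memory dictionary.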
