Signal Processing and Communications Applications Conference

Big data feature selection and projection for gender prediction based on user web behaviour

Abstract

Predicting a visitor's gender and other demographic information helps in presenting appropriate content to the user. In this paper, we perform gender prediction based on Turkish users' web log data. Our methods use three different sets of features: the URLs (Uniform Resource Locators), the textual contents, and the DMOZ (from directory.mozilla.org) hierarchies of the pages visited by each user. Since the input dataset is sparse and high-dimensional, we first apply Information Gain and Chi-square based feature selection, using a MapReduce based approach to compute these feature relevance measures. We also apply the stochastic singular value decomposition (SSVD) feature projection method. We present gender classification results, based on these feature selection and projection methods, using the Logistic Regression classifier. Applying the Logistic Regression classifier to the selected URL features gives the best performance.
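
The abstract does not describe how the relevance measures are implemented. As an illustration only, the following is a minimal sketch of how Chi-square and Information Gain scores could be computed in a map/reduce style, assuming binary (visited / not visited) features and two gender labels. The toy log, the function names, and the `__total__` sentinel key are hypothetical and are not taken from the paper; Python's built-in `map` and `functools.reduce` stand in for a real MapReduce framework.

```python
import math
from collections import Counter
from functools import reduce

# Hypothetical toy log: (set of URL features a user visited, gender label).
LOGS = [
    ({"sports.example/news", "cars.example"}, "M"),
    ({"sports.example/news", "tech.example"}, "M"),
    ({"fashion.example", "recipes.example"}, "F"),
    ({"fashion.example", "tech.example"}, "F"),
]


def map_user(record):
    """Map step: emit a count of 1 per (feature, label) pair, plus a per-label user count."""
    features, label = record
    counts = Counter({(f, label): 1 for f in features})
    counts[("__total__", label)] = 1  # sentinel key, an assumption of this sketch
    return counts


def reduce_counts(a, b):
    """Reduce step: sum partial counts; addition is associative, so order is irrelevant."""
    return a + b


def entropy(class_counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(class_counts)
    return sum(-(c / total) * math.log2(c / total) for c in class_counts if c > 0)


def score_features(counts, labels=("M", "F")):
    """Return {feature: (chi_square, information_gain)} from aggregated counts."""
    totals = [counts[("__total__", label)] for label in labels]
    n = sum(totals)
    features = {f for f, _ in counts if f != "__total__"}
    scores = {}
    for f in features:
        present = [counts[(f, label)] for label in labels]
        absent = [t - p for t, p in zip(totals, present)]
        rows = [row for row in (present, absent) if sum(row) > 0]
        # Chi-square over the 2 x 2 contingency table (feature presence x label).
        chi2 = 0.0
        for row in rows:
            for observed, col_sum in zip(row, totals):
                expected = sum(row) * col_sum / n
                if expected > 0:
                    chi2 += (observed - expected) ** 2 / expected
        # Information gain: label entropy minus entropy conditioned on the feature.
        conditional = sum(sum(row) / n * entropy(row) for row in rows)
        scores[f] = (chi2, entropy(totals) - conditional)
    return scores


counts = reduce(reduce_counts, map(map_user, LOGS))
scores = score_features(counts)

# Rank features by Chi-square, highest first.
for name, (chi2, ig) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: chi2={chi2:.3f} ig={ig:.3f}")
```

In this sketch the map step only needs one user's record at a time, and the reduce step only sums counters, so the counting can be distributed; the scoring step then runs once over the aggregated counts. A feature visited by both genders equally often scores zero on both measures, while one visited by a single gender scores highest.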
