Journal: Expert Systems with Applications

Latent association rule cluster based model to extract topics for classification and recommendation applications

Abstract

The quality of any text mining technique depends heavily on the features used to represent the document collection. A classical form of document representation is the vector space model (VSM), in which documents are represented as vectors of weights that correspond to the features of the documents. The bag-of-words model is the most popular VSM approach because of its simplicity and general applicability, but it does not capture term dependency and has high dimensionality. Several models for document representation have been proposed in the literature to capture the dependency among terms. Among them, the topic model representation is one of the most interesting approaches, since it describes the collection of documents in a way that reveals their internal structure and the interrelationships therein, and also provides dimensionality reduction. However, even for topic models, efficiently extracting information about the relations among terms for document representation remains a major research challenge. To address this issue, we propose the latent association rule cluster based model (LARCM). The LARCM is a non-probabilistic topic model that uses association rule clustering to build a low-dimensional document representation in which each feature (i.e., topic) comprises information about relations among terms. We evaluated the interpretability of the topics obtained with our proposed model against those provided by the traditional Latent Dirichlet Allocation (LDA) model and by the LDA model using a document representation that includes correlated terms (i.e., bag-of-related-words). The experimental results indicated that the LARCM provides topics with better interpretability than the LDA models. Additionally, we used the topics obtained by the LARCM in two different applications: text classification and page recommendation. For text classification, the topics were used to improve the document collection representation. For page recommendation, the topics were used as contextual information in context-aware recommender systems. The results show that the topics provided by the LARCM can be used to improve both applications.
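
The abstract does not say how rules are mined, how they are clustered, or how features are weighted. The toy sketch below is therefore not the authors' LARCM procedure. It only illustrates the general idea of turning clusters of term association rules into low-dimensional document features. Every name and choice in it is an assumption made for illustration: the function names `mine_rules`, `cluster_rules` and `represent`, the support and confidence thresholds, the restriction to pairwise rules, the grouping of rules by shared terms, and the coverage-based feature weight.

```python
from collections import Counter
from itertools import combinations


def mine_rules(docs, min_support=0.5, min_confidence=0.6):
    """Mine pairwise rules a -> b, treating each document as a set of terms."""
    n = len(docs)
    term_sets = [set(d.split()) for d in docs]
    term_count = Counter(t for s in term_sets for t in s)
    pair_count = Counter(
        pair for s in term_sets for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), c in pair_count.items():
        if c / n < min_support:  # support of the pair {a, b}
            continue
        for x, y in ((a, b), (b, a)):
            if c / term_count[x] >= min_confidence:  # confidence of x -> y
                rules.append((x, y))
    return sorted(rules)


def cluster_rules(rules):
    """Group rules that share a term (connected components, an assumed stand-in
    for whatever rule clustering the model actually uses)."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in rules:
        parent[find(a)] = find(b)
    clusters = {}
    for rule in rules:
        clusters.setdefault(find(rule[0]), []).append(rule)
    return sorted(clusters.values())


def represent(docs, clusters):
    """One feature per rule cluster: the fraction of the cluster's rules
    whose two terms both occur in the document."""
    vectors = []
    for d in docs:
        terms = set(d.split())
        vectors.append([
            sum(a in terms and b in terms for a, b in rules) / len(rules)
            for rules in clusters
        ])
    return vectors


docs = [
    "topic model latent",
    "topic model cluster",
    "page recommendation context",
    "page recommendation user",
]
rules = mine_rules(docs)
clusters = cluster_rules(rules)
vectors = represent(docs, clusters)
for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

In this toy collection the eight distinct terms collapse into two cluster features, which shows the kind of dimensionality reduction the abstract refers to, with each feature built from co-occurrence relations rather than single terms.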
