Latent Dirichlet Allocation (LDA) for improving the topic modeling of the Official Bulletin of the Spanish State (BOE)

Journal: Procedia Computer Science


Abstract

Since the birth of the Internet, most people have had free access to a vast number of information sources. Every day many web pages are created and new content is uploaded and shared. Never in history have humans been better informed, yet also so uninformed, owing to the sheer amount of information that can be accessed. When we look for something in any search engine, the results are too numerous to read and filter one by one. Recommender systems (RS) were created to help us discriminate and filter this information according to our preferences. This contribution analyses the RS of the official agency of publications in Spain (BOE), known as "Mi BOE". The way this RS works was analysed, and all the metadata of the published documents were examined to determine the coverage of the system. The results of our analysis show that more than 89% of the documents cannot be recommended because they are poorly described at the documentary level: some of their key metadata fields are empty. This contribution therefore proposes a method to label documents automatically, based on Latent Dirichlet Allocation (LDA). With this approach the system could, from a theoretical point of view, recommend more than twice as many documents as it does now: 23% after applying the approach versus 11% before.
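The idea of filling empty metadata with LDA-derived labels and then measuring coverage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe their pipeline, so the collapsed Gibbs sampler, the hyperparameters, the hypothetical `subjects` field, and the four toy records below are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (assumed setup, not the paper's)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # total tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def label_documents(docs, n_topics, top_words=3, **kwargs):
    """Label each document with the top words of its dominant topic."""
    doc_topic, topic_word = lda_gibbs(docs, n_topics, **kwargs)
    labels = []
    for counts in doc_topic:
        k = max(range(n_topics), key=counts.__getitem__)
        labels.append([w for w, c in topic_word[k].most_common(top_words) if c > 0])
    return labels


def coverage(records):
    """Fraction of records whose (hypothetical) 'subjects' field is non-empty."""
    return sum(1 for r in records if r["subjects"]) / len(records)


# Invented toy records, not BOE data: only the first one has subject metadata.
records = [
    {"text": "tax income tax rate fiscal", "subjects": ["taxation"]},
    {"text": "grant university research grant scholarship", "subjects": []},
    {"text": "fiscal tax budget income", "subjects": []},
    {"text": "research scholarship university grant", "subjects": []},
]
before = coverage(records)
docs = [r["text"].split() for r in records]
labels = label_documents(docs, n_topics=2)
for record, label in zip(records, labels):
    if not record["subjects"]:
        record["subjects"] = label  # fill only the empty metadata field
after = coverage(records)
print(f"coverage before: {before:.0%}, after: {after:.0%}")
```

In this toy case every document receives a label, so coverage rises from one document in four to all four. The figures reported in the abstract (11% versus 23%) come from the real BOE corpus and are not reproduced by this sketch.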
