DATA-PARALLEL PARAMETER ESTIMATION OF THE LATENT DIRICHLET ALLOCATION MODEL BY GREEDY GIBBS SAMPLING

Abstract

A novel data-parallel algorithm is presented for topic modeling on highly parallel hardware architectures. The algorithm is a Markov chain Monte Carlo (MCMC) algorithm used to estimate the parameters of the latent Dirichlet allocation (LDA) topic model. It is based on a highly parallel, partially collapsed Gibbs sampler, but replaces a stochastic step that draws from a distribution with an optimization step that computes the mean of that distribution directly and deterministically. The algorithm is correct and statistically performant, and it is faster than state-of-the-art algorithms because it can exploit massive amounts of parallelism when run on a highly parallel architecture, such as a GPU. Furthermore, the partially collapsed Gibbs sampler converges about as fast as the collapsed Gibbs sampler and identifies solutions that are as good as, or even better than, those found by the collapsed Gibbs sampler.
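The central idea, replacing a random draw with the mean of the distribution it would have been drawn from, can be illustrated with a small sketch. The abstract does not say which draw is replaced or how the sampler is structured, so everything below is an assumption made for illustration: the sketch uses a simple uncollapsed formulation (not the partially collapsed sampler the abstract describes), replaces the Dirichlet draws for both the per-document topic proportions (`theta`) and the per-topic word distributions (`phi`) with Dirichlet means, and uses hypothetical names (`dirichlet_mean`, `greedy_gibbs_lda`) and hyperparameter defaults.

```python
import random

def dirichlet_mean(counts, prior):
    """Mean of Dirichlet(counts + prior): a deterministic stand-in for a draw."""
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]

def greedy_gibbs_lda(docs, num_topics, vocab_size,
                     alpha=0.1, beta=0.01, iters=50, seed=0):
    """Illustrative greedy Gibbs loop for LDA (assumed structure, not the patented method).

    docs is a list of documents, each a list of integer word ids in [0, vocab_size).
    """
    rng = random.Random(seed)
    # z[d][i] is the topic assigned to token i of document d (random start).
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    theta, phi = [], []
    for _ in range(iters):
        # Rebuild the count tables from the current topic assignments.
        doc_topic = [[0] * num_topics for _ in docs]
        topic_word = [[0] * vocab_size for _ in range(num_topics)]
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                doc_topic[d][z[d][i]] += 1
                topic_word[z[d][i]][w] += 1
        # Greedy step: take Dirichlet means instead of sampling from the Dirichlets.
        theta = [dirichlet_mean(row, alpha) for row in doc_topic]
        phi = [dirichlet_mean(row, beta) for row in topic_word]
        # Stochastic step: given theta and phi, every token is resampled
        # independently of the others, so this loop nest is data-parallel
        # in principle (for example, one GPU thread per token).
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                weights = [theta[d][k] * phi[k][w] for k in range(num_topics)]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
    return theta, phi, z
```

A call such as `greedy_gibbs_lda([[0, 1, 0, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4)` returns the estimated `theta` and `phi` together with the final topic assignments. Because the token updates depend only on `theta` and `phi`, not on each other, the inner resampling loop is the part that would map onto parallel hardware.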