...
Journal: Multimedia Tools and Applications

Multimedia text classification algorithm using potential Dirichlet distribution in mobile cloud computing environment

Abstract

To address the inaccurate description of news-content features and user-interest features in mobile cloud computing, this paper proposes a multimedia text classification algorithm based on a multi-label latent Dirichlet allocation model. The algorithm builds on the traditional latent Dirichlet allocation (LDA) model and assumes a linear relationship between user tags and latent topics. A relation matrix is therefore introduced into the LDA model to describe the correspondence between tags and topics, so that the tag-word probability distribution can be inferred from the topic-word probability distribution. The algorithm first learns the tag-word probability table by Gibbs sampling, then infers the probability distribution of a new document over the tags from the model parameters, and thereby predicts the multiple tags that correspond to the document. To handle massive data, the algorithm is also parallelized. Because its main bottleneck is the serial nature of global-variable updates and communication, the core idea of the parallelization is that, when training on massive text collections, delayed global updates and asynchronous communication do not affect the final training results. Experiments show that the proposed algorithm greatly improves training efficiency, and its classification accuracy is higher than that of the naive Bayes and support vector machine (SVM) classifiers reported in other literature. The average classification accuracy reaches about 95%, and the algorithm can serve as a general parallel framework for supervised LDA algorithms.
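The abstract describes the model only in outline. What follows is a minimal Python sketch, under stated assumptions, of the pieces it names: a collapsed Gibbs sampler for plain LDA (the base model, not the authors' multi-label variant), a tag-topic relation matrix `R` that turns topic-word distributions into tag-word distributions through the assumed linear relationship, and a tag posterior for a new document. The function names, the hyperparameters `alpha` and `beta`, the row normalisation of `R`, the uniform tag prior, the single-tag scoring rule, and the toy corpus are all hypothetical choices for illustration. The abstract does not say how `R` is learned, so here it is simply given.

```python
import math
import random

# Illustrative sketch only: plain LDA plus an ASSUMED linear tag-topic relation matrix.
# This is not the authors' implementation.


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA; returns topic-word probabilities phi[k][w]."""
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # word counts per topic
    n_k = [0] * num_topics                                 # total tokens per topic
    z = []                                                 # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
            for k in range(num_topics)]


def tag_word_distributions(phi, relation):
    """Assumed linear tag-topic link: psi[l][w] = sum_k R[l][k] * phi[k][w], rows of R normalised to 1."""
    psi = []
    for row in relation:
        total = sum(row)
        weights = [r / total for r in row]
        psi.append([sum(weights[k] * phi[k][w] for k in range(len(phi)))
                    for w in range(len(phi[0]))])
    return psi


def tag_posterior(doc, psi):
    """p(tag | doc), assuming (for illustration) a uniform tag prior and one tag generating every word."""
    logs = [sum(math.log(p[w]) for w in doc) for p in psi]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy corpus: word ids 0-2 co-occur, as do word ids 3-5.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
phi = gibbs_lda(docs, num_topics=2, vocab_size=6)
R = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical relation matrix: tag l tied to topic l
psi = tag_word_distributions(phi, R)
post = tag_posterior([0, 1, 2], psi)
print(post)
```

For multi-label prediction, one plausible rule (again an assumption) is to return every tag whose posterior exceeds a threshold rather than only the arg-max. Because LDA topics are exchangeable, a real system would have to estimate `R` together with the topics instead of fixing it by hand as the toy example does.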
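The parallelization idea, that training tolerates stale global state, can be illustrated with a sketch in the spirit of approximate distributed LDA (AD-LDA). That is a known technique swapped in here because the abstract gives no details of the authors' scheme. Documents are split into shards, each hypothetical worker runs a Gibbs sweep against a snapshot of the global topic-word counts taken at the start of the round, and the workers' count changes are merged into the global table only when the round ends. For simplicity the workers run one after another in a single process. They still never see each other's updates within a round, which is the same staleness a truly parallel implementation has to tolerate. `sweep_shard`, `delayed_update_lda`, the round count, and the hyperparameters are illustrative assumptions.

```python
import random

# Hypothetical sketch of delayed global updates in the spirit of AD-LDA;
# not the authors' implementation.


def sweep_shard(shard, z, n_dk, n_kw, n_k, num_topics, vocab_size, alpha, beta, rng):
    """One Gibbs sweep over a worker's documents against its local (stale) copies of n_kw and n_k."""
    for d, doc in enumerate(shard):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] -= 1
            n_kw[k][w] -= 1
            n_k[k] -= 1
            weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                       for t in range(num_topics)]
            k = rng.choices(range(num_topics), weights=weights)[0]
            z[d][i] = k
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1


def delayed_update_lda(docs, num_topics, vocab_size, num_workers=2, rounds=50,
                       alpha=0.1, beta=0.01, seed=0):
    """Sharded LDA where the global topic-word counts are only updated once per round."""
    rng = random.Random(seed)
    shards = [docs[p::num_workers] for p in range(num_workers)]
    z = [[[rng.randrange(num_topics) for _ in doc] for doc in shard] for shard in shards]
    n_dk = [[[0] * num_topics for _ in shard] for shard in shards]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # global topic-word counts
    for p, shard in enumerate(shards):
        for d, doc in enumerate(shard):
            for w, k in zip(doc, z[p][d]):
                n_dk[p][d][k] += 1
                n_kw[k][w] += 1
    worker_rngs = [random.Random(seed + 1 + p) for p in range(num_workers)]
    for _ in range(rounds):
        n_k = [sum(row) for row in n_kw]
        deltas = []
        for p in range(num_workers):
            # Each worker samples against a snapshot taken at the start of the round and
            # never sees the other workers' changes until the round ends.
            local_kw = [row[:] for row in n_kw]
            local_k = n_k[:]
            sweep_shard(shards[p], z[p], n_dk[p], local_kw, local_k,
                        num_topics, vocab_size, alpha, beta, worker_rngs[p])
            deltas.append([[local_kw[k][w] - n_kw[k][w] for w in range(vocab_size)]
                           for k in range(num_topics)])
        # Delayed global update: merge every worker's changes once, after the round.
        for delta in deltas:
            for k in range(num_topics):
                for w in range(vocab_size):
                    n_kw[k][w] += delta[k][w]
    return n_kw


docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
counts = delayed_update_lda(docs, num_topics=2, vocab_size=6)
print(counts)
```

Because each worker only moves its own tokens, the merged deltas keep the global table consistent with every token's current topic; what changes is that workers sample from slightly outdated counts. The abstract's claim is that on massive corpora this staleness does not affect the final training results. In a real deployment the loop over workers would run in separate processes or machines, and a fully asynchronous variant would let each worker push its delta and pull a fresh snapshot independently instead of waiting for the round to end.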