A Patent Document Classification Method Based on a Word-Weighted LDA Model

孙伟; 刘文静; 葛丽阁; 余璇

Abstract

Traditional topic models used for text classification select feature words mainly from statistically high-frequency words. In patent documents, however, most domain-specific terms are drowned out by high-frequency words, so topic models classify patents with low accuracy. To address this, we propose a supervised, word-weighted LDA topic model for patent document classification. Starting from the co-occurrence relationship between domain-specific terms and high-frequency words, the KeyGraph algorithm selects keywords with stronger feature-representation ability, and a mutual information function computes the weight of each keyword to build a domain-term dictionary. On this basis, a supervised LDA model is constructed, the word weights are incorporated into the LDA model, and Gibbs sampling is used for parameter estimation. In classification experiments on patent documents, the model improves average classification accuracy by 4.62%, 3.74%, and 3.26% over the LDA model and two of its variants, respectively. The results indicate that the highly discriminative domain-specific terms selected by the model are more strongly related to the topics, and that both classification efficiency and accuracy improve markedly.
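The abstract does not give the exact mutual information function used to weight keywords. As an illustration only, the sketch below assumes one common formulation: the mutual information between a keyword's presence in a document and the document's class label. The function name `mutual_information_weights` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information_weights(docs, labels, keywords):
    """Return {keyword: MI(keyword presence; class label)} in nats.

    Assumed formulation (not confirmed by the paper): documents are token
    lists, and each keyword is scored by how much its presence or absence
    tells us about the class label.
    """
    n = len(docs)
    class_counts = Counter(labels)
    doc_sets = [set(tokens) for tokens in docs]
    weights = {}
    for word in keywords:
        # Joint counts over (keyword present?, class label).
        joint = Counter()
        for tokens, label in zip(doc_sets, labels):
            joint[(word in tokens, label)] += 1
        present = sum(c for (flag, _), c in joint.items() if flag)
        marginal = {True: present, False: n - present}
        mi = 0.0
        # Only observed (presence, label) pairs contribute; unseen pairs
        # have zero joint probability and add nothing to the sum.
        for (flag, label), count in joint.items():
            p_joint = count / n
            p_flag = marginal[flag] / n
            p_label = class_counts[label] / n
            mi += p_joint * math.log(p_joint / (p_flag * p_label))
        weights[word] = mi
    return weights
```

Under this assumption, a keyword that appears in every document of one class and in no document of any other class receives the largest weight, while a word spread evenly across classes receives a weight near zero. Such weights could then scale each keyword's contribution to the topic-word counts during Gibbs sampling, although the paper's precise update rule is not stated in the abstract.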

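Bibliographic details

Source
《计算机技术与发展》 (Computer Technology and Development) | 2019, No. 3 | pp. 23-29 | 7 pages
Authors
孙伟; 刘文静; 葛丽阁; 余璇
Affiliation (all four authors)
College of Information Engineering, Shanghai Maritime University, Shanghai 201306
Original format PDF
Language Chinese
CLC classification Artificial intelligence theory
Keywords
weighted model; LDA; KeyGraph algorithm; patent document classification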

