Improving the Performance of Multivariate Bernoulli Model based Documents Clustering Algorithms using Transformation Techniques | Science Publications

N. Raju; P. Pitchandi

Abstract

Problem statement: Document clustering is one of the most important areas of data mining and is currently the subject of significant global research, since it underpins web intelligence, web mining, web search engine design and related applications. Generative models based on the multivariate Bernoulli and multinomial distributions have been widely used for text classification.

Approach: This study explores the suitability of a multivariate Bernoulli model based probabilistic algorithm for text clustering. In a multivariate Bernoulli model, a document is represented as a binary vector over the space of words, where 1 indicates that a word occurs in the document and 0 that it does not. The number of occurrences is not considered, so word frequency information is lost. In this work, we propose an FFT based transformation technique for improving the clustering performance of the multivariate Bernoulli model based probabilistic algorithm. The transformation converts the actual term frequency count data into a time domain signal, so that the weight of each word's frequency is distributed throughout each row of records. If the multivariate Bernoulli model is then applied to the values less than zero and greater than zero, performance increases, since there is no information loss in this kind of data representation.

Results: Bernoulli model based clustering and an improved version of it are implemented and evaluated using suitable metrics, and the results are shown.

Conclusion: The transformation technique improves the performance of multivariate Bernoulli model based document clustering significantly.
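
The abstract does not give the exact transformation, so the following is only a minimal sketch of one possible reading: each document's term count row is passed through a discrete Fourier transform, and the real part of each output value is binarized by its sign before being handed to a Bernoulli model. The function names, the use of the real part, the treatment of zero as 0, and the toy count rows are all assumptions made for illustration, not the authors' implementation. A direct DFT from the standard library is used here in place of an FFT routine; it returns the same values, only more slowly.

```python
import cmath


def presence_vector(counts):
    """Standard multivariate Bernoulli input: 1 if the word occurs, else 0."""
    return [1 if c > 0 else 0 for c in counts]


def dft_real(counts):
    """Real part of the discrete Fourier transform of one term count row.

    Direct O(n^2) evaluation; an FFT computes the same values faster.
    """
    n = len(counts)
    return [
        sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, c in enumerate(counts)).real
        for k in range(n)
    ]


def sign_vector(values, eps=1e-9):
    """Binarize by sign: 1 if the value is greater than zero, else 0.

    Assumption: values at (or numerically near) zero are mapped to 0.
    """
    return [1 if v > eps else 0 for v in values]


# Hypothetical count rows: the same words occur in both documents,
# but with different frequencies.
doc_a = [5, 1, 0, 1]
doc_b = [1, 5, 0, 1]

print("presence:", presence_vector(doc_a), presence_vector(doc_b))
print("sign of transform:",
      sign_vector(dft_real(doc_a)), sign_vector(dft_real(doc_b)))
```

In this toy case the two rows have identical presence vectors, so a plain Bernoulli representation cannot tell them apart, while their sign vectors after the transform differ. Whether that difference helps clustering on real corpora is what the paper's evaluation addresses; the sketch makes no claim about it.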

Bibliographic details

Source: Journal of computer sciences, 2011, Issue 5
Authors: N. Raju; P. Pitchandi
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Similar literature

1. Improving the Performance of Multivariate Bernoulli Model based Documents Clustering Algorithms using Transformation Techniques [J]. Perumal Pitchandi, Nedunchezhian Raju. Journal of computer sciences. 2011, Issue 5
2. Markov-Modulated Bernoulli-Based Performance Analysis for Gentle BLUE and BLUE Algorithms under Bursty and Correlated Traffic [J]. Adeeb Alsaaidah, Hussein Abdel-Jaber, Mohd Fadzli. Journal of computer sciences. 2016, Issue 6
3. Improving the Performance of Machine Learning Based Multi Attribute Face Recognition Algorithm Using Wavelet Based Image Decomposition Technique [J]. M. Rajaram, S. Sakthivel. Journal of computer sciences. 2011, Issue 3
4. A model based data normalization technique for improving performance of engine misfire detection algorithms [C]. Institute of Electrical and Electronics Engineers Electro/Information Technology Conference. 2004
5. Model-based clustering algorithms, performance and application [D]. Liu, Jun. 2000
6. Multivariate Analysis Models Based on Full Spectra Range and Effective Wavelengths Using Different Transformation Techniques for Rapid Estimation of Leaf Nitrogen Concentration in Winter Wheat [O]. Lantao Li, Di Lin, Jin Wang. 2020
7. Improving the Performance of Multivariate Bernoulli Model based Documents Clustering Algorithms using Transformation Techniques [O]. P. Pitchandi, N. Raju. 2011
8. Algorithms and Models Based on Projective Transformations in Spatial Location, Regional Planning, and Central Place Theory [R]. Lindgren, C. E. S. 1969
