Journal: International Journal of Artificial Life Research

MHLM Majority Voting Based Hybrid Learning Model for Multi-Document Summarization

Abstract

Multi-document text summarization is an active research area because of the abundance of data on the World Wide Web (WWW). Retrieving the relevant content from this mass of data is time-consuming and tedious for users, and numerous techniques have been proposed to deliver the relevant information in the form of a summary. Accordingly, this article presents the majority voting based hybrid learning model (MHLM) for multi-document summarization. First, the documents are pre-processed, and four features (title-based, sentence length, numerical data, and TF-IDF) are extracted for every sentence. The feature set is then passed to the proposed MHLM classifier, which combines Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Neural Network (NN) classifiers to evaluate the significance of each sentence. Each classifier produces a significance score based on the four extracted features. The majority voting model then selects the significant sentences from these scores and builds the summary for the user, reducing redundancy and improving the quality of the summary while keeping it close to the original documents. In experiments on the DUC 2002 data set, the proposed MHLM attains a precision and recall of 0.94, an F-measure of 0.93, and a ROUGE-1 score of 0.6324.

Keywords: Hybrid Learning Model; K-Nearest Neighbors; Multi-Document Summarization; Neural Network; Support Vector Machine.
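The pipeline described in the abstract (per-sentence features, three classifiers, majority vote) can be illustrated with a minimal Python sketch. The abstract does not define the four features precisely, so the formulas below are assumptions chosen for illustration, and the function names (`sentence_features`, `majority_vote`, `summarize`) are hypothetical. The SVM, KNN, and NN classifiers are not implemented here; their outputs are represented as plain 0/1 votes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_features(sentence, title, all_sentences):
    """Return (title, length, numeric, tfidf) scores for one sentence.

    All four definitions are illustrative assumptions, not the paper's.
    """
    tokens = tokenize(sentence)
    title_tokens = set(tokenize(title))
    docs = [set(tokenize(s)) for s in all_sentences]
    n = len(docs)

    # Title feature: fraction of title words that appear in the sentence.
    title_score = len(title_tokens & set(tokens)) / len(title_tokens) if title_tokens else 0.0

    # Length feature: token count relative to the longest sentence.
    longest = max((len(tokenize(s)) for s in all_sentences), default=0)
    length_score = len(tokens) / longest if longest else 0.0

    # Numerical feature: fraction of tokens that consist only of digits.
    numeric_score = sum(t.isdigit() for t in tokens) / len(tokens) if tokens else 0.0

    # TF-IDF feature: mean smoothed TF-IDF weight, treating each sentence as a document.
    counts = Counter(tokens)
    total = 0.0
    for term, count in counts.items():
        df = sum(term in d for d in docs)
        total += (count / len(tokens)) * (math.log((1 + n) / (1 + df)) + 1)
    tfidf_score = total / len(counts) if counts else 0.0

    return title_score, length_score, numeric_score, tfidf_score


def majority_vote(votes):
    """Return True when more than half of the classifiers vote 'significant' (1)."""
    return sum(votes) * 2 > len(votes)


def summarize(sentences, votes_per_sentence):
    """Keep, in original order, the sentences that win the majority vote."""
    return [s for s, votes in zip(sentences, votes_per_sentence) if majority_vote(votes)]
```

In a fuller implementation, each inner list passed to `summarize` would hold the decisions of the trained SVM, KNN, and NN models for one sentence. With three voters, a sentence enters the summary when at least two of them mark it significant.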