Journal: International journal of artificial life research

MHLM Majority Voting Based Hybrid Learning Model for Multi-Document Summarization



Abstract

Text summarization from multiple documents is an active research area, as data on the World Wide Web (WWW) is found in abundance. Retrieving the relevant content from this mass of data is time-consuming and tedious for users, and numerous techniques have been proposed to deliver relevant information to users in the form of a summary. Accordingly, this article presents the majority voting based hybrid learning model (MHLM) for multi-document summarization. First, the multiple documents are pre-processed, and four features (title-based, sentence length, numerical data, and TF-IDF) are extracted for each individual sentence of the document. Then, the feature set is passed to the proposed MHLM classifier, which combines Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Neural Network (NN) classifiers to evaluate the significance of the sentences in the document. Each classifier produces a significance score based on the four features extracted from the sentences. The majority voting model then selects the significant sentences according to these scores and builds the summary for the user, thereby reducing redundancy and increasing the quality of the summary with respect to the original documents. Experiments on the DUC 2002 data set analyze the effectiveness of the proposed MHLM, which attains precision and recall of 0.94, an F-measure of 0.93, and a ROUGE-1 score of 0.6324.

Keywords: Hybrid Learning Model; K-Nearest Neighbors; Multi-Document Summarization; Neural Network; Support Vector Machine.
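The pipeline the abstract describes (four sentence features, three base classifiers, a majority vote over their significance labels) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the feature normalizations, the threshold-based stand-ins for the trained SVM/KNN/NN classifiers, and the toy TF-IDF dictionary are all assumptions.

```python
# Hypothetical sketch of an MHLM-style pipeline: four sentence features
# feed three base classifiers, and a majority vote selects summary
# sentences. Thresholds and stand-in classifiers are illustrative only.
from collections import Counter

def sentence_features(sentence, title, tfidf):
    """Four features per sentence: title overlap, length, numeric data, TF-IDF."""
    words = sentence.lower().split()
    title_words = set(title.lower().split())
    title_overlap = len(title_words & set(words)) / max(len(title_words), 1)
    length = min(len(words) / 25.0, 1.0)          # normalized sentence length
    numeric = 1.0 if any(w.strip('.,%').isdigit() for w in words) else 0.0
    tfidf_score = sum(tfidf.get(w, 0.0) for w in words) / max(len(words), 1)
    return [title_overlap, length, numeric, tfidf_score]

def majority_vote(votes):
    """A sentence is kept if most base classifiers label it significant."""
    return Counter(votes).most_common(1)[0][0]

# Stand-ins for the trained SVM / KNN / NN classifiers: each maps a
# feature vector to 1 (significant) or 0 (not) by thresholding a score.
classifiers = [
    lambda f: int(f[0] + f[3] > 0.3),   # "SVM": title overlap + TF-IDF
    lambda f: int(f[1] + f[2] > 0.8),   # "KNN": length + numeric data
    lambda f: int(sum(f) > 1.0),        # "NN": all four features
]

def summarize(sentences, title, tfidf):
    """Keep the sentences that a majority of the classifiers mark significant."""
    summary = []
    for s in sentences:
        f = sentence_features(s, title, tfidf)
        votes = [clf(f) for clf in classifiers]
        if majority_vote(votes) == 1:
            summary.append(s)
    return summary
```

Because each classifier votes from a different view of the features, a single noisy scorer cannot force a sentence into (or out of) the summary on its own, which is the redundancy-reducing effect the abstract attributes to the voting step.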


