ELSA: A Multilingual Document Summarization Algorithm Based on Frequent Itemsets and Latent Semantic Analysis

Cagliero Luca; Garza Paolo; Baralis Elena

Abstract

Sentence-based summarization aims at extracting concise summaries of collections of textual documents. Summaries consist of a worthwhile subset of document sentences. The most effective multilingual strategies rely on either Latent Semantic Analysis (LSA) or frequent itemset mining. LSA-based summarizers pick the document sentences that cover the most important concepts. Concepts are modeled as combinations of single-document terms and are derived from a term-by-sentence matrix by exploiting Singular Value Decomposition (SVD). Itemset-based summarizers pick the sentences that contain the largest number of frequent itemsets, which represent combinations of frequently co-occurring terms. The main drawbacks of existing approaches are (i) the inability of LSA to consider the correlation between combinations of multiple-document terms and the underlying concepts, (ii) the inherent redundancy of frequent itemsets, because similar itemsets may be related to the same concept, and (iii) the inability of itemset-based summarizers to correlate itemsets with the underlying document concepts. To overcome the limitations of both approaches, we propose a new summarization approach that exploits frequent itemsets to describe all of the latent concepts covered by the documents under analysis, and LSA to reduce the potentially redundant set of itemsets to a compact set of uncorrelated concepts. The summarizer selects the sentences that cover the latent concepts with minimal redundancy. We tested the summarization algorithm on both multilingual and English-language benchmark document collections. The proposed approach performed significantly better than both itemset- and LSA-based summarizers, and better than most of the other state-of-the-art approaches.
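
The abstract does not spell out ELSA's exact itemset weighting or sentence-selection rule, so the Python sketch below only illustrates the general idea under stated assumptions; it is not the authors' implementation. It treats each sentence as a transaction, mines frequent itemsets by brute force (not Apriori or FP-growth), builds a binary itemset-by-sentence matrix (the keywords suggest ELSA uses weighted itemsets instead), derives latent concepts as the top right singular vectors of that matrix via power iteration with deflation, and then, for each concept in order of decreasing singular value, picks the not-yet-selected sentence with the largest absolute weight. The names `tokenize`, `frequent_itemsets`, `top_concepts`, `summarize`, and the `STOPWORDS` list are hypothetical.

```python
import math
import re
from itertools import combinations

# Hypothetical stopword list, just large enough for the toy example below.
STOPWORDS = {"a", "an", "and", "be", "is", "may", "of", "on", "the"}


def tokenize(sentence):
    """Lowercase a sentence and return its set of non-stopword alphabetic tokens."""
    return {t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS}


def frequent_itemsets(transactions, min_support, max_len=2):
    """Brute-force frequent itemset mining; each sentence is one transaction.

    Only individually frequent terms are combined, because every subset of a
    frequent itemset must itself be frequent.
    """
    def support(items):
        return sum(1 for t in transactions if items <= t)

    vocab = sorted(set().union(*transactions))
    frequent_terms = [w for w in vocab if support({w}) >= min_support]
    result = []
    for size in range(1, max_len + 1):
        for combo in combinations(frequent_terms, size):
            itemset = frozenset(combo)
            if support(itemset) >= min_support:
                result.append(itemset)
    return result


def top_concepts(gram, k, iters=300, eps=1e-9):
    """Top-k (singular value, sentence weights) pairs from gram = A^T A.

    Power iteration with deflation on the symmetric matrix A^T A: each
    eigenvector is a right singular vector of A (one weight per sentence)
    and the square root of its eigenvalue is the singular value.
    """
    n = len(gram)
    g = [row[:] for row in gram]
    concepts = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed, non-uniform start vector
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < eps:
                return concepts  # remaining matrix is numerically zero
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam < eps:
            break
        concepts.append((math.sqrt(lam), v))
        # Deflate so the next power iteration converges to the next concept.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return concepts


def summarize(sentences, k=2, min_support=2, max_len=2):
    """Return indices of up to k sentences, one per latent concept."""
    transactions = [tokenize(s) for s in sentences]
    itemsets = frequent_itemsets(transactions, min_support, max_len)
    # Binary itemset-by-sentence matrix: a[i][j] = 1 if itemset i occurs in sentence j.
    a = [[1.0 if it <= t else 0.0 for t in transactions] for it in itemsets]
    n = len(sentences)
    gram = [[sum(row[j] * row[l] for row in a) for l in range(n)] for j in range(n)]
    chosen = []
    for _sigma, weights in top_concepts(gram, k):  # strongest concept first
        candidates = [j for j in range(n) if j not in chosen]
        if not candidates:
            break
        chosen.append(max(candidates, key=lambda j: abs(weights[j])))
    return chosen


sentences = [
    "Frequent itemsets describe document concepts.",
    "Mining frequent itemsets is fast.",
    "Latent semantic analysis relies on SVD.",
    "SVD exposes latent structure.",
    "Frequent itemsets may be redundant.",
]
summary = summarize(sentences, k=2)
print([sentences[j] for j in summary])
```

On this toy input the frequent itemsets fall into two groups ({frequent}, {itemsets}, {frequent, itemsets} and {latent}, {svd}, {latent, svd}), which yield two concepts with singular values 3 and √6, so the sketch should return one sentence about itemsets followed by one about LSA. Because the SVD step folds the three overlapping itemsets of each group into a single concept, choosing one sentence per concept sidesteps the itemset redundancy the abstract lists as drawback (ii).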

Bibliographic details

Source
ACM Transactions on Information Systems | 2019, Issue 2 | pp. 21.1-21.33 | 33 pages
Authors
Cagliero Luca; Garza Paolo; Baralis Elena
Affiliations
Politecn Torino, Dipartimento Automat & Informat, Corso Duca Abruzzi 24, I-10129 Turin, Italy (all three authors)
Indexed in
Science Citation Index (SCI); Engineering Index (EI)
Original format
PDF
Language
English
Keywords
Multilingual summarization; text mining; frequent weighted itemset mining

Similar documents

1. ELSA: a multilingual document summarization algorithm based on frequent itemsets and latent semantic analysis [J]. Epaminondas Kapetanios. Computing Reviews, 2021, no. 1.
2. ELSA: a multilingual document summarization algorithm based on frequent itemsets and latent semantic analysis [J]. M. Sohel Rahman. Computing Reviews, 2020, no. 5.
3. Applying Latent Semantic Indexing in Frequent Itemset Mining for Document Relation Discovery [C]. Thanaruk Theeramunkong, Kritsada Sriphaew, Manabu Okumura. Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2008.
4. Frequent Itemset Hiding Algorithm Using Frequent Pattern Tree Approach [D]. Alnatsheh, Rami. 2012.
5. An index-based algorithm for fast on-line query processing of latent semantic analysis [O]. Mingxi Zhang, Pohan Li, Wei Wang.
6. A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm [O]. Naresh Kumar Nagwani, Dr. Shrish Verma. 2011.
