Similarity measures and diversity rankings for query-focused sentence extraction.

Abstract

Query-focused sentence extraction generally refers to an extractive approach that selects a set of sentences responding to a specific information need. It is one of the major approaches employed in multi-document summarization, focused summarization, and complex question answering. The major advantage of most extractive methods over natural language processing (NLP) intensive methods is that they are relatively simple and theoretically sound, drawing upon several supervised and unsupervised learning techniques, and often produce equally strong empirical performance. Many research areas, including information retrieval and text mining, have recently moved toward extractive query-focused sentence generation, as its outputs have great potential to support everyday information-seeking activities. In particular, as more information is created and stored online, extractive summarization systems can quickly draw on ubiquitous resources, such as Google search results and social media, to extract summaries that answer users' queries.

This thesis explores how the performance of sentence extraction tasks can be improved to create higher-quality outputs. Specifically, two major areas are investigated. First, we examine the issue of natural language variation, which affects the similarity judgment of sentences. As sentences are much shorter than documents, they generally contain fewer words. Moreover, the notion of similarity between sentences differs from that between documents, as sentences tend to be very specific in meaning. Thus many document-level similarity measures are unlikely to perform well at the sentence level. In this work, we address these issues in two application domains. First, we present a hybrid method, utilizing both unsupervised and supervised techniques, to compute the similarity of interrogative sentences for factoid question reuse. Next, we propose a novel structural similarity measure based on sentence semantics for paraphrase identification and textual entailment recognition tasks. The empirical evaluations suggest the effectiveness of the proposed methods in improving the accuracy of sentence similarity judgments.

Furthermore, we examine the effects of the proposed similarity measure in two specific sentence extraction tasks: focused summarization and complex question answering. In conjunction with the proposed similarity measure, we also explore the issues of novelty, redundancy, and diversity in sentence extraction. To that end, we present a novel approach to promote diversity in extracted sets of sentences based on the negative endorsement principle. Negative-signed edges are employed to represent a redundancy relation between sentence nodes in a graph. Then, sentences are reranked according to the long-term negative endorsements from a random walk. Additionally, we propose a unified centrality ranking and diversity ranking based on the aforementioned principle. The results from a comprehensive evaluation confirm that the proposed methods perform competitively compared to many state-of-the-art methods.
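
The abstract does not give the details of the negative endorsement ranking, so the following is only a rough illustrative sketch of the general idea, not the thesis's actual algorithm. It assumes bag-of-words cosine similarity, a PageRank-style damped random walk for centrality, and a second walk over "negative" edges that point from a higher-ranked sentence to each lower-ranked sentence it is redundant with. The function names (`cosine`, `random_walk`, `diversity_rerank`) and the parameters `redundancy_threshold` and `penalty` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def random_walk(weights, damping=0.85, iters=100):
    """Long-run visit scores of a damped random walk.

    weights[i][j] is the weight of the edge i -> j. Nodes without
    outgoing edges spread their mass uniformly over all nodes.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for i, row in enumerate(weights):
            total = sum(row)
            for j, w in enumerate(row):
                if total == 0:
                    nxt[j] += damping * scores[i] / n
                elif w:
                    nxt[j] += damping * scores[i] * w / total
        scores = nxt
    return scores


def diversity_rerank(sentences, redundancy_threshold=0.5, penalty=1.0):
    """Return sentence indices ordered by centrality minus negative endorsement.

    Assumption (not from the source): sentence i negatively endorses
    sentence j when the two are similar enough to count as redundant
    and i currently outranks j (ties broken by original order).
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]

    # Positive endorsements: centrality from a walk over the similarity graph.
    centrality = random_walk(sim)

    def outranks(i, j):
        return (centrality[i], -i) > (centrality[j], -j)

    # Negative-signed edges: a redundancy relation between sentence nodes.
    negative_edges = [
        [sim[i][j] if sim[i][j] >= redundancy_threshold and outranks(i, j) else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    negative = random_walk(negative_edges)

    final = [centrality[i] - penalty * negative[i] for i in range(n)]
    return sorted(range(n), key=lambda i: final[i], reverse=True)


sentences = [
    "graph ranking selects central sentences",
    "graph ranking selects central sentences",
    "diversity ranking removes redundant sentences",
]
ranking = diversity_rerank(sentences)
```

In this toy input the second sentence is an exact copy of the first, so it receives a negative endorsement and falls below the third, only partially overlapping, sentence. The threshold and penalty would need tuning on real data, and a query-focused system would also fold query relevance into the walk, which this sketch leaves out.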

Bibliographic record

Author: Achananuparp, Palakorn
Affiliation: Drexel University
Degree-granting institution: Drexel University
Discipline: Computer Science
Degree: Ph.D.
Year: 2010
Pages: 167
Format: PDF
Language: English
