Keyword Extraction and Clustering for Document Recommendation in Conversations

Habibi Maryam; Popescu-Belis Andrei

Abstract

This paper addresses the problem of keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents, which can be recommended to participants. However, even a short fragment contains a variety of words, which are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. Therefore, it is difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function which favors diversity in the keyword set, to match the potential diversity of topics and reduce ASR noise. Then, we propose a method to derive multiple topically separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations.
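
The diversity-favoring keyword selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reward form (a topic-weighted sum of concave powers of topic coverage), the exponent `lam`, the toy values in `WORD_TOPICS` and `TOPIC_WEIGHTS`, and the helper names `reward`, `greedy_keywords`, and `topical_queries` are all assumptions made for illustration. Grouping keywords by their dominant topic is likewise only a simplified stand-in for deriving topically separated queries.

```python
from typing import Dict, List

# Hypothetical toy inputs (not from the paper): p(topic | word) for each
# candidate word over two topics, and the weight of each topic in the
# conversation fragment. In practice a topic model would supply these.
WORD_TOPICS: Dict[str, List[float]] = {
    "car": [0.95, 0.05],
    "engine": [0.95, 0.05],
    "wheel": [0.95, 0.05],
    "salary": [0.05, 0.95],
}
TOPIC_WEIGHTS: List[float] = [0.6, 0.4]


def reward(selected: List[str], word_topics: Dict[str, List[float]],
           topic_weights: List[float], lam: float = 0.5) -> float:
    """Assumed reward: sum over topics of weight * (topic coverage) ** lam.

    With 0 < lam < 1 the power is concave, so each additional word from an
    already covered topic contributes less (diminishing returns).
    """
    total = 0.0
    for z, beta in enumerate(topic_weights):
        coverage = sum(word_topics[w][z] for w in selected)
        total += beta * coverage ** lam
    return total


def greedy_keywords(word_topics: Dict[str, List[float]],
                    topic_weights: List[float], k: int,
                    lam: float = 0.5) -> List[str]:
    """Greedily add the word with the largest marginal gain in reward."""
    selected: List[str] = []
    candidates = set(word_topics)
    while candidates and len(selected) < k:
        base = reward(selected, word_topics, topic_weights, lam)
        best = max(
            sorted(candidates),  # sorted so that ties break alphabetically
            key=lambda w: reward(selected + [w], word_topics,
                                 topic_weights, lam) - base,
        )
        selected.append(best)
        candidates.remove(best)
    return selected


def topical_queries(selected: List[str],
                    word_topics: Dict[str, List[float]]) -> Dict[int, List[str]]:
    """Group keywords by dominant topic; each group acts as one query."""
    queries: Dict[int, List[str]] = {}
    for w in selected:
        probs = word_topics[w]
        dominant = max(range(len(probs)), key=lambda z: probs[z])
        queries.setdefault(dominant, []).append(w)
    return queries


if __name__ == "__main__":
    keywords = greedy_keywords(WORD_TOPICS, TOPIC_WEIGHTS, k=2)
    print(keywords)
    print(topical_queries(keywords, WORD_TOPICS))
```

On this toy input, `greedy_keywords(WORD_TOPICS, TOPIC_WEIGHTS, k=2)` selects `car` and then `salary`: once the dominant topic is covered, the concave reward makes a word from the second topic more valuable than another word from the first. A ranking that scored words independently would instead return two words from the same topic.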

Bibliographic record

Source
Audio, Speech, and Language Processing, IEEE/ACM Transactions on | 2015, issue 4 | pp. 746-759 | 14 pages
Authors
Habibi Maryam; Popescu-Belis Andrei
Author affiliation

Idiap Research Institute, Centre du Parc, Idiap Research Institute and École Polytechnique Fédérale de Lausanne (EPFL), 1920 Martigny, Switzerland;

Original format: PDF
Text language: English
Keywords
Data mining; Encyclopedias; IEEE transactions; Information retrieval; Speech; Speech processing; Document recommendation; information retrieval; keyword extraction; meeting analysis; topic modeling;
