Aspect and Entity Extraction from Opinion Documents.

Abstract

Opinion mining has been an active research area in Web mining and natural language processing (NLP) in recent years. This thesis presents a comprehensive study of aspect and entity extraction from opinion documents for opinion mining. We first introduce the aspect-based opinion mining model. We then propose a new method for aspect extraction and ranking based on language patterns and dependency grammar; it ranks the extracted aspects by their importance, i.e., relevancy and frequency. In addition, we find that some domains contain two special kinds of product aspects: nouns that imply opinions, and resource terms. We propose novel extraction algorithms to identify both kinds in opinion documents.

The entity extraction task is similar to the classic named entity recognition (NER) problem, with one major difference. In a typical opinion mining application, users often want to find opinions on a group of competing entities, e.g., competing or related products. The discovered entities must therefore be of the same type or class, which makes the task essentially a set expansion problem. To address it, we present two set expansion algorithms for entity extraction in opinion documents: one based on the positive and unlabeled (PU) learning model, and the other based on Bayesian Sets.

We also discuss extracting topic documents from a collection. An opinion mining system usually crawls and indexes opinion documents first and uses them for specific tasks later. The documents are typically not well categorized, because the future tasks are not known in advance. Keyword search is normally used to find the opinion documents relevant to an analysis, but the documents retrieved this way can suffer from both low recall and low precision. An alternative is to train a document classifier, but the training procedure is time-consuming and labor-intensive. We propose an unsupervised technique that solves this problem with a new PU learning algorithm.
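
To make the set expansion idea concrete, the following is a minimal sketch of the standard Bayesian Sets scoring rule for binary features (a Beta-Bernoulli model in which each candidate is scored by log p(x | seeds) / p(x)). It is not the thesis's implementation: the abstract does not describe the features or any modifications used there. The function name `bayesian_sets_scores`, the prior strength `c`, the clamping constant `eps`, and the toy entities and features are all assumptions made for illustration.

```python
import math


def bayesian_sets_scores(items, seeds, c=2.0, eps=1e-6):
    """Score each item by log p(x | seeds) / p(x) under a Beta-Bernoulli model.

    items: dict mapping item name -> list of 0/1 features (equal lengths)
    seeds: list of item names that form the seed set
    c:     prior strength (assumed value; a tunable hyperparameter)
    eps:   clamp for feature means so that log(0) never occurs (assumption)
    """
    names = list(items)
    n = len(names)
    d = len(items[names[0]])
    num_seeds = len(seeds)

    # Prior Beta(alpha_j, beta_j) is set from the empirical mean of feature j.
    means = [
        min(max(sum(items[k][j] for k in names) / n, eps), 1.0 - eps)
        for j in range(d)
    ]
    alpha = [c * m for m in means]
    beta = [c * (1.0 - m) for m in means]

    # Posterior parameters after observing the seed set.
    seed_counts = [sum(items[s][j] for s in seeds) for j in range(d)]
    alpha_post = [alpha[j] + seed_counts[j] for j in range(d)]
    beta_post = [beta[j] + num_seeds - seed_counts[j] for j in range(d)]

    # The log score is linear in the features: const + sum_j x_j * weight_j.
    const = sum(
        math.log(alpha[j] + beta[j])
        - math.log(alpha[j] + beta[j] + num_seeds)
        + math.log(beta_post[j])
        - math.log(beta[j])
        for j in range(d)
    )
    weights = [
        math.log(alpha_post[j]) - math.log(alpha[j])
        - math.log(beta_post[j]) + math.log(beta[j])
        for j in range(d)
    ]
    return {
        k: const + sum(x * w for x, w in zip(items[k], weights))
        for k in names
    }


# Toy data (hypothetical): each feature marks whether a candidate term
# occurs in some context pattern, e.g. "bought a ___" or "the ___ is good".
toy_items = {
    "PhoneA": [1, 1, 0, 0],
    "PhoneB": [1, 1, 0, 0],
    "PhoneC": [1, 1, 0, 1],
    "battery": [0, 0, 1, 1],
    "screen": [0, 0, 1, 0],
}
scores = bayesian_sets_scores(toy_items, seeds=["PhoneA", "PhoneB"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With `PhoneA` and `PhoneB` as seeds, `PhoneC` shares their context features and so scores above `battery` and `screen`, which is the behavior needed when the expanded entities must belong to the same class as the seeds.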

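Bibliographic record

  • Author

    Zhang, Lei

  • Author affiliation

    University of Illinois at Chicago

  • Degree-granting institution: University of Illinois at Chicago
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 121 p.
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Remote sensing technology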
