Venue: International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing

Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training

Abstract

User-generated reviews can be decomposed into fine-grained segments (e.g., sentences, clauses), each evaluating a different aspect of the principal entity (e.g., price, quality, appearance). Automatically detecting these aspects can be useful both for users and for downstream opinion mining applications. Current supervised approaches for learning aspect classifiers require many fine-grained aspect labels, which are labor-intensive to obtain, while unsupervised topic models often fail to capture the aspects of interest. In this work, we consider weakly supervised approaches for training aspect classifiers that require the user to provide only a small set of seed words (i.e., weakly positive indicators) for the aspects of interest. First, we show that current weakly supervised approaches do not effectively leverage the predictive power of seed words for aspect detection. Next, we propose a student-teacher approach that effectively leverages seed words in a bag-of-words classifier (the teacher); in turn, we use the teacher to train a second, potentially more powerful model (the student), such as a neural network that uses pre-trained word embeddings. Finally, we show that iterative co-training can be used to cope with noisy seed words, improving both the teacher and the student models. Our proposed approach consistently outperforms previous weakly supervised approaches (by 14.1 absolute F1 points on average) in six product-review domains and on six multilingual restaurant-review datasets.
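The abstract describes three parts: a bag-of-words teacher driven by seed words, a student trained on the teacher's predictions, and iterative co-training that refines both. The Python sketch below illustrates that loop under simplifying assumptions; it is not the authors' implementation. The teacher's scoring rule (normalized, weighted seed-word counts with a fallback to a default "general" aspect), the softmax-regression student (standing in for the neural network with pre-trained embeddings mentioned above), the seed-reweighting rule in the hypothetical `update_teacher`, and all names and hyperparameters (`lr`, `epochs`, `rounds`) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class SeedTeacher:
    """Bag-of-words teacher: scores each aspect by weighted seed-word counts."""

    def __init__(self, seeds, default_aspect):
        # seeds: {aspect: [seed words]}; every seed word starts with weight 1.0
        self.seeds = {a: {w: 1.0 for w in words} for a, words in seeds.items()}
        self.default_aspect = default_aspect
        self.aspects = sorted(set(self.seeds) | {default_aspect})

    def predict_proba(self, tokens):
        scores = {a: 0.0 for a in self.aspects}
        for tok in tokens:
            for a, words in self.seeds.items():
                scores[a] += words.get(tok, 0.0)
        total = sum(scores.values())
        if total <= 0.0:
            # No seed word fires: fall back to the default ("general") aspect.
            return {a: float(a == self.default_aspect) for a in self.aspects}
        return {a: s / total for a, s in scores.items()}


class BowStudent:
    """Softmax regression over *all* words, fit to the teacher's soft labels."""

    def __init__(self, aspects, lr=0.1, epochs=50):
        self.aspects, self.lr, self.epochs = aspects, lr, epochs
        self.w = {a: Counter() for a in aspects}  # per-aspect word weights
        self.b = {a: 0.0 for a in aspects}        # per-aspect bias

    def predict_proba(self, tokens):
        logits = {a: self.b[a] + sum(self.w[a][t] for t in tokens) for a in self.aspects}
        m = max(logits.values())  # subtract the max for numerical stability
        exps = {a: math.exp(v - m) for a, v in logits.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    def fit(self, docs, targets):
        for _ in range(self.epochs):
            for tokens, target in zip(docs, targets):
                probs = self.predict_proba(tokens)
                for a in self.aspects:
                    # Gradient of cross-entropy w.r.t. the logit of aspect a.
                    grad = probs[a] - target[a]
                    self.b[a] -= self.lr * grad
                    for t in tokens:
                        self.w[a][t] -= self.lr * grad


def update_teacher(teacher, student, docs):
    """Hypothetical update: reweight each seed word by the student's average
    confidence in that word's aspect over the segments containing it."""
    for a, words in teacher.seeds.items():
        for w in words:
            hits = [student.predict_proba(d)[a] for d in docs if w in d]
            if hits:  # seed words absent from the data keep their old weight
                words[w] = sum(hits) / len(hits)


def co_train(texts, seeds, default_aspect, rounds=3):
    docs = [tokenize(t) for t in texts]
    teacher = SeedTeacher(seeds, default_aspect)
    student = None
    for _ in range(rounds):
        targets = [teacher.predict_proba(d) for d in docs]  # teacher labels
        student = BowStudent(teacher.aspects)
        student.fit(docs, targets)                          # student learns
        update_teacher(teacher, student, docs)              # teacher refined
    return teacher, student
```

For example, `co_train(texts, {"price": ["price", "cheap"], "quality": ["quality", "solid"]}, "general")` returns a teacher and a student. Because the student is fit over every word rather than only the seed words, it can assign an aspect to a segment with no seed words, provided its other words co-occurred with seed words in the training segments; this is one way a student can go beyond its teacher. Seed words that the student rarely agrees with receive lower weights in the next round, which is the intuition behind using co-training against noisy seeds.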