Conference: Annual Meeting of the Association for Computational Linguistics

Data Programming for Learning Discourse Structure

Abstract

This paper investigates the advantages and limits of data programming for the task of learning discourse structure. The data programming paradigm, as implemented in the Snorkel framework, allows a user to label training data using expert-composed heuristics, which are then transformed via the "generative step" into probability distributions of the class labels given the training candidates. These results are later generalized using a discriminative model. Snorkel's attractive promise to create a large amount of annotated data from a smaller set of training data by unifying the output of a set of heuristics has yet to be tested on computationally difficult tasks, such as discourse attachment, in which one must decide where a given discourse unit attaches to other units in a text in order to form a coherent discourse structure. Although approaching this problem with Snorkel requires significant modifications to the structure of the heuristics, we show that weak supervision methods can be more than competitive with classical supervised learning approaches to the attachment problem.
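
To make the pipeline described in the abstract concrete, here is a minimal sketch of the data programming idea applied to attachment candidates. It does not use the Snorkel API and is not the authors' method: the `Candidate` type, the three heuristics, and their thresholds are all hypothetical, and the unweighted vote in `soft_label` is only a simplified stand-in for Snorkel's generative step, which instead estimates how accurate each heuristic is.

```python
from dataclasses import dataclass

# Label values; ABSTAIN means a heuristic has no opinion on a candidate.
ABSTAIN, NOT_ATTACHED, ATTACHED = -1, 0, 1


@dataclass(frozen=True)
class Candidate:
    """Hypothetical pair of discourse units: does `target` attach to `source`?"""
    source_index: int
    target_index: int
    source_text: str
    target_text: str


# Illustrative expert heuristics ("labeling functions"). The rules and
# thresholds below are assumptions made for this sketch only.
def lf_adjacent(c: Candidate) -> int:
    # Guess that directly adjacent units are attached.
    return ATTACHED if c.target_index - c.source_index == 1 else ABSTAIN


def lf_long_distance(c: Candidate) -> int:
    # Guess that units far apart in the text are not attached.
    return NOT_ATTACHED if c.target_index - c.source_index > 5 else ABSTAIN


def lf_question(c: Candidate) -> int:
    # Guess that a question attracts an attachment from a later unit.
    return ATTACHED if c.source_text.rstrip().endswith("?") else ABSTAIN


LABELING_FUNCTIONS = [lf_adjacent, lf_long_distance, lf_question]


def soft_label(c: Candidate) -> float:
    """Return an estimate of P(attached) as the share of non-abstaining votes.

    This unweighted vote is a simplification; it treats every heuristic
    as equally reliable.
    """
    votes = [lf(c) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return 0.5  # no heuristic fired, so stay uninformative
    return sum(votes) / len(votes)  # ATTACHED is 1 and NOT_ATTACHED is 0
```

In a full pipeline, soft labels like these would then serve as training targets for a discriminative model, which generalizes beyond the candidates the heuristics happen to cover.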