Conference proceedings: Workshop on Biomedical Natural Language Processing 2015

Making the most of limited training data using distant supervision


Abstract

Automatic recognition of relationships between key entities in text is an important problem which has many applications. Supervised machine learning techniques have proved to be the most effective approach to this problem. However, they require labelled training data which may not be available in sufficient quantity (or at all) and is expensive to produce. This paper proposes a technique that can be applied when only limited training data is available. The approach uses a form of distant supervision but does not require an external knowledge base. Instead, it uses information from the training set to acquire new labelled data and combines it with manually labelled data. The approach was tested on an adverse drug data set using a limited amount of manually labelled training data and shown to outperform a supervised approach.
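The abstract does not say which information from the training set is used to acquire new labelled data. As a minimal sketch only, the example below assumes that the entity pairs seen in manually labelled examples serve as the seed, and that unlabelled candidate sentences mentioning a known pair inherit that pair's label. The function names (`seed_pairs`, `distant_label`, `build_training_set`) and the tuple layout are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# Hypothetical layouts, assumed for illustration only.
Labelled = Tuple[str, str, str, int]   # (sentence, entity1, entity2, label)
Candidate = Tuple[str, str, str]       # (sentence, entity1, entity2)
Pair = Tuple[str, str]


def seed_pairs(labelled: List[Labelled]) -> Tuple[Set[Pair], Set[Pair]]:
    """Collect entity pairs from the manual data, split by label."""
    positive = {(e1.lower(), e2.lower()) for _, e1, e2, y in labelled if y == 1}
    negative = {(e1.lower(), e2.lower()) for _, e1, e2, y in labelled if y == 0}
    # Assumption: a pair seen with both labels is too ambiguous to use as a seed.
    ambiguous = positive & negative
    return positive - ambiguous, negative - ambiguous


def distant_label(candidates: List[Candidate],
                  positive: Set[Pair],
                  negative: Set[Pair]) -> List[Labelled]:
    """Label candidates whose entity pair matches a seed; skip the rest."""
    auto: List[Labelled] = []
    for sentence, e1, e2 in candidates:
        pair = (e1.lower(), e2.lower())
        if pair in positive:
            auto.append((sentence, e1, e2, 1))
        elif pair in negative:
            auto.append((sentence, e1, e2, 0))
        # Unknown pairs are left unlabelled rather than guessed.
    return auto


def build_training_set(labelled: List[Labelled],
                       candidates: List[Candidate]) -> List[Labelled]:
    """Combine manually labelled data with automatically labelled data."""
    positive, negative = seed_pairs(labelled)
    return list(labelled) + distant_label(candidates, positive, negative)
```

The combined list could then be passed to any supervised relation classifier. Skipping candidates with unseen pairs is a design choice of this sketch: it trades coverage for fewer noisy labels.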

Bibliographic details

  • Conference location: Beijing, China
  • Authors

    Roland Roller; Mark Stevenson;

  • Author affiliations

    Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, England;

    Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, England;

  • Original format: PDF
  • Language: English
