Journal: Knowledge-Based Systems

Active learning with confidence-based answers for crowdsourcing labeling tasks

Abstract

Collecting labels for data is important for many practical applications (e.g., data mining). However, this process can be expensive and time-consuming since it needs extensive efforts of domain experts. To decrease the cost, many recent works combine crowdsourcing, which outsources labeling tasks (usually in the form of questions) to a large group of non-expert workers, and active learning, which actively selects the best instances to be labeled, to acquire labeled datasets. However, for difficult tasks where workers are uncertain about their answers, asking for discrete labels might lead to poor performance due to the low-quality labels. In this paper, we design questions to get continuous worker responses which are more informative and contain workers' labels as well as their confidence. As crowd workers may make mistakes, multiple workers are hired to answer each question. Then, we propose a new aggregation method to integrate the responses. By considering workers' confidence information, the accuracy of integrated labels is improved. Furthermore, based on the new answers, we propose a novel active learning framework to iteratively select instances for "labeling". We define a score function for instance selection by combining the uncertainty derived from the classifier model and the uncertainty derived from the answer sets. The uncertainty derived from uncertain answers is more effective than that derived from labels. We also propose batch methods which select multiple instances at a time to further improve the efficiency of our approach. Experimental studies on both simulated and real data show that our methods are effective in increasing the labeling accuracy and achieve significantly better performance than existing methods.
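
The abstract does not give the aggregation rule or the score function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes binary labels, a continuous worker response in [0, 1] (values above 0.5 lean positive, and distance from 0.5 is read as confidence), a confidence-weighted vote for aggregation, and a linear mix of two binary entropies for the selection score. The names `aggregate`, `binary_entropy`, `selection_score`, and the weight `alpha` are all hypothetical.

```python
import math


def aggregate(responses):
    """Confidence-weighted vote over continuous responses in [0, 1].

    Assumption: a response r encodes the label (r > 0.5 is positive)
    and the confidence 2 * |r - 0.5|. Returns the weighted share of
    positive votes, or 0.5 when no worker expresses any confidence.
    """
    weights = [2.0 * abs(r - 0.5) for r in responses]
    total = sum(weights)
    if total == 0:
        return 0.5
    positive = sum(w for r, w in zip(responses, weights) if r > 0.5)
    return positive / total


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def selection_score(model_prob, responses, alpha=0.5):
    """Mix classifier uncertainty with answer-set uncertainty.

    model_prob: the classifier's predicted probability of the positive class.
    responses:  worker responses collected so far (may be empty).
    alpha:      hypothetical mixing weight between the two uncertainties.
    """
    answer_prob = aggregate(responses) if responses else 0.5
    return alpha * binary_entropy(model_prob) + (1.0 - alpha) * binary_entropy(answer_prob)


if __name__ == "__main__":
    # An instance with a hesitant classifier and disagreeing workers
    # should score higher than one where both sources are confident.
    contested = selection_score(0.55, [0.9, 0.2, 0.6])
    settled = selection_score(0.98, [0.95, 0.9, 0.85])
    print(contested > settled)
```

Under these assumptions, an active learner would query the instance with the highest score next; a batch variant would take the top several scores at once.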