
Novel Techniques for Improving Classification Systems by Incorporating Experts.



Abstract

This manuscript presents novel techniques for incorporating the domain knowledge and wisdom of human "oracles" into the data mining workflow. Tasked with building predictive models for a real-world, web-scale prediction task, we quickly realized that many data mining techniques, including state-of-the-art research, fail to perform as advertised. Assumptions that could be made in the lab might not hold in reality. To overcome these difficulties, we would need to employ human effort in clever ways, overcoming unexpected deficiencies when collecting data for model training, performing predictions, or evaluating the quality of predictions a model may make.

Leveraging human knowledge for data mining or machine learning tasks is by no means anything new. Typically, constructing and monitoring a predictive machine learning system requires labeled example data. While some situations may elicit labels naturally, in others human effort must be employed to "manually" examine each instance considered, applying an appropriate label based on observations. These labeled instances are most frequently used during or prior to the training phase of the data mining process, generating the data that is considered during model induction. Gathering labels for selected examples, however, is not the only way human effort can be employed to aid the efficacy of a data mining system. Humans can go out and seek examples they believe will be useful for a model's training. Additionally, labeled examples can be gathered for a model deployed in production, generating performance estimates, and building a better understanding of how a model behaves. Finally, examples can be labeled as substitutes for a model's imperfect label predictions, applying human expertise at inference time.

In the following research, we present several deficiencies in the existing techniques for gathering training data for data mining systems, offering alternative techniques that we demonstrate to be much more effective. We also show problems that exist in traditional model evaluation, problems that are particularly acute in web-scale predictive tasks. We provide an alternative approach that uses a gamified design to aid the task of evaluating a model. Finally, we present a novel situation for applying human resources to predictive inference, giving a utility-optimizing approach, and demonstrating that our approach is, in fact, also a good way of gathering additional training data for model improvement.

The techniques presented herein are proven not only through simulation in the laboratory setting, but in reality: these ideas were forged from the demands of production. Being employed in a production system validated these ideas far beyond what is typical for machine learning research. Still, to demonstrate that the ideas discussed here can generalize to a variety of tasks, we go on to support our claims with a variety of simulations.

Bibliographic record

  • Author: Attenberg, Joshua M.
  • Author affiliation: Polytechnic Institute of New York University
  • Degree-granting institution: Polytechnic Institute of New York University
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 150 p.
  • Total pages: 150
  • Original format: PDF
  • Language of text: English
