Venue: Conference on Empirical Methods in Natural Language Processing

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

Abstract

Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a byproduct of accumulated historical data, typically fraught with noise that is present in both the text-based documents and the targeted labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect the performance on the intended future machine learning model input. The results demonstrate the utility of dirty training datasets used to build prediction models for cleaner (and different) prediction inputs.
