Venue: Fourth Workshop on Noisy User-generated Text

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

Abstract

Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a byproduct of accumulated historical data that is typically fraught with noise, both in the text-based documents and in the target labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect performance on the intended future input to the machine learning model. The results demonstrate the utility of dirty training datasets for building prediction models for cleaner (and different) prediction inputs.
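
The gap between metrics measured on noisy labels and performance on clean data can be illustrated with a small simulation. This is a minimal sketch and not the paper's method: it assumes a binary task with symmetric label noise that is independent of the model's predictions, and the function names `expected_noisy_accuracy` and `simulate` are hypothetical.

```python
import random


def expected_noisy_accuracy(clean_acc: float, flip_rate: float) -> float:
    """Expected accuracy when scoring against noisy labels.

    Assumption (illustrative only): binary labels, each flipped with
    probability flip_rate, independently of the model's prediction.
    A prediction matches the noisy label if it was right and the label
    was kept, or it was wrong and the label was flipped.
    """
    return clean_acc * (1 - flip_rate) + (1 - clean_acc) * flip_rate


def simulate(clean_acc: float, flip_rate: float, n: int = 100_000, seed: int = 0):
    """Return (accuracy vs. clean labels, accuracy vs. noisy labels)."""
    rng = random.Random(seed)
    hits_clean = 0
    hits_noisy = 0
    for _ in range(n):
        true_label = rng.randint(0, 1)
        # Model is correct with probability clean_acc.
        pred = true_label if rng.random() < clean_acc else 1 - true_label
        # Historical label is corrupted with probability flip_rate.
        noisy_label = 1 - true_label if rng.random() < flip_rate else true_label
        hits_clean += pred == true_label
        hits_noisy += pred == noisy_label
    return hits_clean / n, hits_noisy / n


if __name__ == "__main__":
    clean, noisy = simulate(clean_acc=0.9, flip_rate=0.2)
    print(f"accuracy vs. clean labels: {clean:.3f}")
    print(f"accuracy vs. noisy labels: {noisy:.3f}")
    print(f"closed-form expectation:   {expected_noisy_accuracy(0.9, 0.2):.3f}")
```

Under these assumptions, a model that is 90% accurate on clean labels is expected to score 0.9 · 0.8 + 0.1 · 0.2 = 0.74 against labels with a 20% flip rate, so a metric computed on noisy historical data can understate performance on cleaner inputs. Real historical noise is rarely symmetric or independent of the text, so the size and direction of the gap may differ in practice.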