Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

Abstract

Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a byproduct of accumulated historical data, typically fraught with noise that is present both in the text-based documents and in the target labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect the performance on the intended future machine learning model input. The results demonstrate the utility of dirty training datasets used to build prediction models for cleaner (and different) prediction inputs.
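
The question raised in the abstract, namely whether a metric computed against noisy historical labels reflects performance on the intended input, can be illustrated with a toy calculation. The sketch below is not the authors' method or data: the label lists, the predictions, and the `accuracy` helper are all hypothetical, chosen only to show that the same predictions can score differently against noisy and clean labels.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: 10 documents with binary labels.
clean_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical historical labels: indices 0 and 5 are flipped to mimic label noise.
noisy_labels = [0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical model output: correct everywhere except index 9.
predictions = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

acc_on_noisy = accuracy(predictions, noisy_labels)
acc_on_clean = accuracy(predictions, clean_labels)

print(f"accuracy against noisy historical labels: {acc_on_noisy:.2f}")
print(f"accuracy against clean labels:            {acc_on_clean:.2f}")
```

In this made-up case the score against the noisy labels is lower than the score against the clean ones, because the two flipped labels are counted as model errors. Real label noise need not behave this way; the direction and size of the gap depend on how the noise relates to the model's mistakes.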

Bibliographic details

Source
Fourth workshop on noisy user-generated text | 2018 | pp. 104-109 | 6 pages
Conference location: Brussels (BE)
Authors
Emilia Apostolova; R. Andrew Kreek
Author affiliations

Language.ai, Chicago, IL, USA

Allstate Insurance Company, Seattle, WA, USA

Original format: PDF
Language: English
