Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs



Abstract

Text classification models have been heavily utilized for a wide range of natural language processing problems. Like any other machine learning model, these classifiers depend heavily on the size and quality of the training dataset; insufficient and unbalanced datasets lead to poor performance. An interesting solution to poor datasets is to take advantage of world knowledge, in the form of knowledge graphs, to improve the training data. In this paper, we use ConceptNet and Wikidata to improve sexist tweet classification by two methods: (1) text augmentation and (2) text generation. In our text generation approach, we generate new tweets by replacing words using data acquired from ConceptNet relations in order to increase the size of our training set. This method is very helpful with frustratingly small datasets, preserves the label, and increases diversity. In our text augmentation approach, the number of tweets in each class remains the same, but each tweet is augmented (by concatenation) with words extracted from the ConceptNet relations of its words and with their descriptions extracted from Wikidata, so the range of each tweet increases. Our experiments show that our approach improves sexist tweet classification significantly across all of our machine learning models. Our approach can be readily applied to any other small dataset, such as hate speech or abusive language, and to any text classification problem using any machine learning model.
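The two methods described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the dictionaries `RELATED_TERMS` and `DESCRIPTIONS` are hypothetical in-memory stand-ins for ConceptNet relations and Wikidata descriptions, the function names are invented for illustration, and whitespace tokenization is an assumption.

```python
# Hypothetical stand-in for ConceptNet lookups: word -> related terms.
# A real system would query ConceptNet; these entries are invented.
RELATED_TERMS = {
    "doctor": ["physician", "medic"],
    "home": ["house"],
}

# Hypothetical stand-in for Wikidata descriptions: word -> short description.
DESCRIPTIONS = {
    "doctor": "professional who practices medicine",
}


def generate_variants(text, related=RELATED_TERMS):
    """Text generation: build one new text per (word, related term) swap.

    Each variant keeps the label of the original text, so the training
    set grows while the class of every new example is known.
    """
    tokens = text.split()
    variants = []
    for i, token in enumerate(tokens):
        for alternative in related.get(token.lower(), []):
            variants.append(" ".join(tokens[:i] + [alternative] + tokens[i + 1:]))
    return variants


def augment(text, related=RELATED_TERMS, descriptions=DESCRIPTIONS):
    """Text augmentation: keep the text and concatenate extra words to it.

    The number of examples stays the same; each example gets longer.
    """
    tokens = text.split()
    extra = []
    for token in tokens:
        key = token.lower()
        extra.extend(related.get(key, []))
        if key in descriptions:
            extra.append(descriptions[key])
    return " ".join(tokens + extra)
```

Under these assumptions, `generate_variants("the doctor went home")` yields three new texts that share the original label, while `augment("the doctor went home")` returns a single, longer text. The first changes the size of the training set; the second changes only the content of each example.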


Bibliographic details

Source: Conference on empirical methods in natural language processing | 2018 | xiii, 170 p. | 8 pages
Authors: Sima Sharifirad; Borna Jafarpour; Stan Matwin
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar literature

1. Knowledge-driven graph similarity for text classification [J]. Shanavas Niloofer, Wang Hui, Lin Zhiwei. International journal of machine learning and cybernetics. 2021, no. 4
2. A knowledge graph-based content selection model for data-driven text generation [J]. Jun-Peng Gong, Juan Cao, Peng-Zhou Zhang. International journal of reasoning-based intelligent systems. 2017, no. 3a4
3. Effects of screen type, Chinese typography, text/background color combination, speed, and jump length for VDT leading display on user's reading performance [J]. An-Hsiang Wang, Cheng-Hsun Chen. International Journal of Industrial Ergonomics. 2003, no. 4
4. Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs [C]. Sima Sharifirad, Borna Jafarpour, Stan Matwin. Second workshop on abusive language online 2018. 2018
5. Analysing the effects of data augmentation and free parameters for text classification with recurrent convolutional neural networks [D]. Quijas, Jonathan K. 2017
6. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection [O]. Taxiarchis Botsis, Michael D Nguyen, Emily Jane Woo. 2011
7. Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs [O]. Sima Sharifirad, Borna Jafarpour, Stan Matwin. 2018
