Interpreting Neural Network Hate Speech Classifiers

Abstract

Deep neural networks have been applied to hate speech detection with apparent success, but they have limited practical applicability without transparency into the predictions they make. In this paper, we perform several experiments to visualize and understand a state-of-the-art neural network classifier for hate speech (Zhang et al., 2018). We adapt techniques from computer vision to visualize sensitive regions of the input stimuli and identify the features learned by individual neurons. We also introduce a method to discover the keywords that are most predictive of hate speech. Our analyses explain the aspects of neural networks that work well and point out areas for further improvement.

Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2018 | xiii, 170 p. | 7 pages in total
Author
Cindy Wang
Original format
PDF
Chinese Library Classification
Programming, software engineering

Similar literature

1. Effective hate-speech detection in Twitter data using recurrent neural networks [J]. Pitsilis Georgios K., Ramampiaro Heri, Langseth Helge. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2018, No. 12.
2. Hate Speech Classification in Indonesian Language Tweets by Using Convolutional Neural Network [J]. Dewa Ayu Nadia Taradhita, I Ketut Gede Darma Putra. ITB Journal of Information and Communication Technology. 2021, No. 3.
3. A deep neural network based multi-task learning approach to hate speech detection [J]. Kapil Prashant, Ekbal Asif. Knowledge-Based Systems. 2020, No. Deca27.
4. Interpreting Neural Network Hate Speech Classifiers [C]. Cindy Wang. Second Workshop on Abusive Language Online 2018. 2018.
5. Modeling and learning in speech recognition: The relationship between stochastic pattern classifiers and neural networks [D]. Niles, Leslie Thomas. 1991.
6. Opening up the blackbox: an interpretable deep neural network-based classifier for cell-type specific enhancer predictions [O]. Seong Gon Kim, Nawanol Theera-Ampornpunt, Chih-Hao Fang. 2016.
7. Using Convolutional Neural Networks to Classify Hate-Speech [O]. Björn Gambäck, Utpal Kumar Sikdar. 2017.
8. Neural Network Classifiers for Speech Recognition [R]. Lippman, R. S. 1988.
