The growing popularity of black-box machine learning methods for medical image analysis makes their interpretability a crucial task. To make a system, e.g., a trained neural network, trustworthy for a clinician, it needs to be able to explain its decisions and predictions. In this work, we tackle the problem of generating plausible explanations for the predictions of medical image classifiers that are trained to differentiate between different types of pathologies and healthy tissue. An intuitive way to determine which image regions influence the trained classifier is to find out whether the classification result changes when those regions are deleted. This idea can be formulated as a minimization problem and thus implemented efficiently. However, the meaning of the "deletion" of image regions, in our case pathologies in medical images, is not defined. We contribute by defining the deletion of pathologies as their replacement by a healthy-looking equivalent generated using variational autoencoders. Experiments with a classification neural network on OCT (Optical Coherence Tomography) images and brain lesion MRIs show that a meaningful replacement of "deleted" image regions has a significant impact on the reliability of the generated explanations. The proposed deletion method proves successful, as our approach delivers the best results compared to four other established methods.
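The following is a minimal sketch, not the authors' implementation, of the perturbation-based explanation scheme the abstract describes: a soft mask is optimized so that blending the input image with its VAE-generated healthy reconstruction minimizes the classifier's pathology score plus a sparsity penalty. The classifier `clf`, the VAE `vae`, and the hyperparameters `lam`, `steps`, and `lr` are assumptions; the paper's actual objective may include further regularizers (e.g., smoothness terms).

```python
import torch

def explain(image, clf, vae, target_class, lam=0.05, steps=300, lr=0.1):
    """Return a soft mask highlighting regions whose 'deletion'
    (replacement by the healthy VAE reconstruction) lowers the
    classifier's score for the target pathology class."""
    with torch.no_grad():
        healthy = vae(image)  # healthy-looking equivalent of the input
    mask_logits = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([mask_logits], lr=lr)
    for _ in range(steps):
        m = torch.sigmoid(mask_logits)          # mask values in [0, 1]
        # m = 1: pixel is replaced by its healthy equivalent ("deleted")
        perturbed = (1 - m) * image + m * healthy
        score = clf(perturbed)[:, target_class]
        # minimize pathology score while keeping the mask small
        loss = score.mean() + lam * m.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_logits).detach()
```

The design choice here mirrors the abstract's argument: because the perturbation replaces a pathology with plausible healthy tissue rather than with blur or noise, a region that the mask marks as "must be deleted to change the prediction" is more likely to be a genuine pathological feature than an out-of-distribution artifact.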