Fine-Grained Visual Categorization (FGVC) aims to distinguish subordinate-level categories with subtle inter-class differences. Although previous research has shown the effectiveness of recurrent multi-attention models and second-order feature encoding, these approaches often require enormous amounts of computation and memory, making them unsuitable for mobile applications. This paper proposes a Category Attention Transfer CNN (CAT-CNN) to address the efficiency issue in FGVC. We transfer part attention knowledge from a very large-scale FGVC network to a small but efficient network to significantly improve its representation ability. Using the proposed CAT-CNN, the accuracy of efficient networks such as ShuffleNet, MobileNet, and EfficientNet can be improved by up to 5.7% on the CUB-2011-200 dataset without increasing computational complexity or memory cost. Our experiments show that the proposed CAT-CNN can be applied to multiple network structures to enhance their performance. With a single efficient network structure and a single inference, the proposed CAT-MobileNetlarge-1.0 and CAT-EfficientNet-b0 achieve accuracies of 86.5% and 86.7%, respectively, on the CUB-2011-200 dataset, which is close to or better than the results of state-of-the-art methods that use large-scale networks and multiple inferences, making FGVC feasible on mobile devices. (c) 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).