Discrimination-aware network pruning for deep model compression

Jing Liu, Bohan Zhuang, Zhuangwei Zhuang, Yong Guo, Junzhou Huang, Jinhui Zhu, Mingkui Tan

Research output: Contribution to journal › Article › Research › peer-review

15 Citations (Scopus)

Abstract

We study network pruning, which aims to remove redundant channels/kernels and thereby accelerate the inference of deep networks. Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained model and the compressed one. Both strategies have limitations: the former is computationally expensive and difficult to converge, while the latter optimizes the reconstruction error but ignores the discriminative power of channels. In this paper, we propose a discrimination-aware channel pruning (DCP) method that selects the channels which actually contribute to the discriminative power. Based on DCP, we further propose several techniques to improve optimization efficiency. Note that the parameters of a channel (a 3D tensor) may still contain redundant kernels (each a 2D matrix). To address this, we propose a discrimination-aware kernel pruning (DKP) method that selects the kernels with the most promising discriminative power. Experiments on image classification and face recognition demonstrate the effectiveness of our methods. For example, on ILSVRC-12, the resultant ResNet-50 with a 30% reduction in channels even outperforms the baseline model by 0.36% in Top-1 accuracy. The pruned MobileNetV1 and MobileNetV2 achieve 1.93x and 1.42x inference acceleration on a mobile device, respectively, with negligible performance degradation.
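To illustrate the general idea of discrimination-aware channel selection, the sketch below keeps the top-scoring output channels of a convolutional layer. The scoring function and `keep_ratio` are illustrative assumptions, not the paper's actual criterion (DCP derives scores from auxiliary discrimination-aware losses); here a generic per-channel score simply stands in for that signal.

```python
import numpy as np

def prune_channels(weights, channel_scores, keep_ratio=0.7):
    """Keep the output channels with the highest discriminative scores.

    weights:        conv weights of shape (out_channels, in_channels, kH, kW)
    channel_scores: one score per output channel (stand-in for a
                    discrimination-aware criterion such as DCP's)
    keep_ratio:     fraction of channels to retain (illustrative choice)
    """
    n_keep = max(1, int(round(len(channel_scores) * keep_ratio)))
    # indices of the n_keep highest-scoring channels, in ascending order
    keep = np.argsort(channel_scores)[::-1][:n_keep]
    keep.sort()
    return weights[keep], keep

# toy example: a layer with 10 output channels
rng = np.random.default_rng(0)
w = rng.standard_normal((10, 3, 3, 3))
scores = np.abs(rng.standard_normal(10))  # hypothetical per-channel scores
pruned, kept = prune_channels(w, scores, keep_ratio=0.7)
print(pruned.shape)  # (7, 3, 3, 3)
```

The same selection idea extends to kernel pruning (DKP): instead of scoring whole output channels, one would score each 2D kernel slice `weights[o, i]` and zero out or remove the low-scoring ones.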

Original language: English
Pages (from-to): 4035-4051
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 44
Issue number: 8
DOIs
Publication status: Published - 1 Aug 2022

Keywords

  • Acceleration
  • Adaptation models
  • Channel Pruning
  • Computational modeling
  • Deep Neural Networks
  • Kernel
  • Kernel Pruning
  • Network Compression
  • Quantization (signal)
  • Redundancy
  • Training
