Model Compression: Pruning Techniques

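This post examines pruning as a model-compression technique, distinguishes structured from unstructured pruning, and surveys the pruning algorithms integrated in Microsoft's NNI, Intel AI Lab's Distiller, and Baidu's Paddle framework. It also walks through how pruning is implemented in TensorFlow and the principle behind it.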

Model compression covers several families of methods: pruning, quantization, low-rank factorization, and knowledge distillation. This post focuses on pruning; for quantization, see my other post on that topic.

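By granularity, network pruning falls into two classes: structured pruning and unstructured pruning. Early methods were mostly unstructured, pruning at the granularity of individual weights (single connections between neurons). Unstructured pruning of a kernel leaves it sparse, that is, a matrix in which many elements are 0. Unless the underlying hardware and compute libraries support sparse computation well, the pruned model rarely achieves a real speedup, because a sparse matrix stored in dense form gains nothing from mature dense BLAS libraries. For this reason, much of the research in recent years has concentrated on structured pruning. Structured pruning can be subdivided further: it can be channel-wise, filter-wise, or shape-wise.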

The difference between unstructured pruning and structured pruning:

Structured pruning: the dimensions of the weight tensors are reduced by removing entire rows/columns of the tensors. This translates into removing neurons with all their incoming and outgoing connections (in dense layers) or entire convolutional filters (in convolutional layers).
Unstructured pruning: individual weights can be "removed" (zeroed out) without constraints on the shape of the final tensor. This translates into removing individual connections between neurons (in dense layers) or removing individual weights of the convolutional filters (in convolutional layers). Notice that the resulting weight tensors can be sparse but maintain their original shape.

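Unstructured pruning sets individual weights to 0; the parameter count does not drop, only some of the weights become zero. Structured pruning (which can also be thought of as filter pruning) removes entire rows and columns of weights, as FPGM pruning does.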
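To make the contrast concrete, here is a minimal pure-Python sketch on a toy weight matrix. The matrix values, the function names, and the L1-norm ranking used to pick the row to drop are all assumptions chosen for illustration; real pruners (FPGM included) use their own selection criteria.

```python
# Toy 4x3 weight matrix: each row stands for the weights of one filter (or neuron).
weights = [
    [0.9, -0.05, 0.4],
    [0.02, 0.01, -0.03],
    [-0.7, 0.6, 0.08],
    [0.3, -0.04, 0.5],
]

def unstructured_prune(w, threshold):
    """Zero every weight whose magnitude is below threshold; the shape is unchanged."""
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in w]

def structured_prune(w, num_rows_to_drop):
    """Drop the rows with the smallest L1 norm; the matrix itself gets smaller."""
    order = sorted(range(len(w)), key=lambda i: sum(abs(x) for x in w[i]))
    dropped = set(order[:num_rows_to_drop])
    return [row for i, row in enumerate(w) if i not in dropped]

sparse = unstructured_prune(weights, threshold=0.1)
smaller = structured_prune(weights, num_rows_to_drop=1)

print(len(sparse), len(sparse[0]))    # still 4 rows x 3 columns, now with zeros
print(len(smaller), len(smaller[0]))  # 3 rows x 3 columns: one whole row removed
```

The unstructured result keeps all 12 entries (six of them zero), while the structured result stores only 9 values, which is why only the latter shrinks the dense tensor.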

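So far, Microsoft's NNI framework seems to have an edge in pruning: it integrates many pruning algorithms, both structured and unstructured. Documentation: https://nni.readthedocs.io/en/stable/Compression/Pruner.html

It includes the following pruning algorithms: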

Filter Pruning

Intel AI Lab has also open-sourced a number of pruning and quantization algorithms on GitHub, implemented in PyTorch: https://github.com/IntelLabs/distiller/tree/master/distiller

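Baidu's Paddle framework also integrates some pruning algorithms, FPGM among them. FPGM is used in the PaddleOCR framework as well, where the gains in model compression and performance are very noticeable: the detection, recognition, and direction models together come to 3.5M.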
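For readers unfamiliar with FPGM (Filter Pruning via Geometric Median), the idea as I understand it is to prune the filters that lie closest to the geometric median of all filters in a layer, since those are the most replaceable by the rest. A common approximation is to prune the filters whose summed distance to all other filters is smallest. The sketch below shows that selection step only; the function name and the toy filters are hypothetical, and this is not the Paddle or NNI implementation.

```python
import math

def fpgm_select(filters, num_to_prune):
    """Return indices of the filters with the smallest total distance to all others."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    total = [sum(dist(f, g) for g in filters) for f in filters]
    order = sorted(range(len(filters)), key=lambda i: total[i])
    return sorted(order[:num_to_prune])

# Hypothetical flattened filters of one layer.
filters = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
    [0.1, 0.1],  # sits near the centre of the other four
]

pruned = fpgm_select(filters, num_to_prune=1)
print(pruned)
```

Note that the selected filter is not the one with the smallest norm by design; it is the one that is most central among its peers.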

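At the time of writing, TensorFlow does not support structured pruning, and I could not find code implementing it. What the official tooling provides is unstructured pruning, namely magnitude-based weight pruning. PyTorch, by contrast, has good open-source implementations of many pruning algorithms, and the great majority of recent pruning papers are implemented in PyTorch.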

The rest of this post walks through the basic process of weight pruning in TensorFlow:

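Pruning is mainly used to shrink a model and improve its performance. The basic idea is to insert masks during training that mark which weights can be pruned away. In TensorFlow 1.x this is implemented by the contrib model_pruning module; the API is shown below.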

Reference: https://github.com/tensorflow/tensorflow/tree/r1.13/tensorflow/contrib/model_pruning

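The principle is to add, for each weight tensor, a mask with the same shape as the weights, together with a threshold. Mask entries are set to 1 where the weight's magnitude exceeds the threshold and to 0 otherwise. The mask determines which weights take part in the forward pass and which weights receive gradient updates in the backward pass; weights whose mask entry is 0 get no gradient update.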

conv = tf.nn.conv2d(images, pruning.apply_mask(weights), stride, padding)
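The mask mechanism can be sketched in plain Python without TensorFlow. This is only an illustration of the idea, assuming the threshold is chosen as the largest magnitude among the weights to be pruned; the function names and toy values are hypothetical and do not mirror the contrib implementation.

```python
def magnitude_threshold(weights, sparsity):
    """Largest magnitude among the `sparsity` fraction of smallest weights."""
    magnitudes = sorted(abs(w) for w in weights)
    k = int(round(sparsity * len(magnitudes)))  # number of weights to prune
    # With k == 0 nothing is pruned: every magnitude is greater than -1.
    return magnitudes[k - 1] if k > 0 else -1.0

def make_mask(weights, threshold):
    """1.0 keeps a weight (magnitude above threshold), 0.0 prunes it."""
    return [1.0 if abs(w) > threshold else 0.0 for w in weights]

weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.1]
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

threshold = magnitude_threshold(weights, sparsity=0.5)
mask = make_mask(weights, threshold)

# Forward pass uses the masked weights: y = sum(w_i * m_i * x_i).
masked_weights = [w * m for w, m in zip(weights, mask)]
y = sum(w * x for w, x in zip(masked_weights, inputs))

# dy/dw_i = m_i * x_i, so a pruned weight receives a zero gradient.
grads = [m * x for m, x in zip(mask, inputs)]

print(mask)
print(grads)
```

Because the mask multiplies the weights inside the forward pass, the zero gradient for pruned weights falls out of the chain rule; no separate gradient-masking step is needed in this sketch.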

1. Train the pruned model

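import tensorflow as tf  # TensorFlow 1.x (tf.contrib was removed in 2.x)
from tensorflow.contrib.model_pruning.python import pruning

tf.app.flags.DEFINE_string(
    'pruning_hparams',
    'begin_pruning_step=0,end_pruning_step=-1,threshold_decay=0.9,'
    'pruning_frequency=10,initial_sparsity=0.0,target_sparsity=0.5,'
    'sparsity_function_begin_step=0,sparsity_function_end_step=1000,'
    'sparsity_function_exponent=3',
    'Comma separated list of pruning-related hyperparameters')
FLAGS = tf.app.flags.FLAGS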

with tf.Graph().as_default():

  # Create global step variable
  global_step = tf.train.get_or_create_global_step()

  # Parse pruning hyperparameters
  pruning_hparams = pruning.get_pruning_hparams().parse(FLAGS.pruning_hparams)

  # Create a pruning object using the pruning specification
  p = pruning.Pruning(pruning_hparams, global_step=global_step)

  # Add conditional mask update op. Executing this op will update all
  # the masks in the graph if the current global step is in the range
  # [begin_pruning_step, end_pruning_step] as specified by the pruning spec
  mask_update_op = p.conditional_mask_update_op()

  # Add summaries to keep track of the sparsity in different layers during training
  p.add_pruning_summaries()

  with tf.train.MonitoredTrainingSession(...) as mon_sess:
    while not mon_sess.should_stop():
      # Run the usual training op in the tf session
      mon_sess.run(train_op)

      # Update the masks by running the mask_update_op
      mon_sess.run(mask_update_op)
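The hyperparameters initial_sparsity, target_sparsity, sparsity_function_begin_step, sparsity_function_end_step, and sparsity_function_exponent describe a gradual sparsity schedule. As I read the contrib module, sparsity ramps from the initial to the target value along a polynomial curve, pruning quickly at first and more slowly near the end. The sketch below reproduces that curve in plain Python with the values from the flag string above; the function name and shortened parameter names are my own.

```python
def scheduled_sparsity(step, initial_sparsity=0.0, target_sparsity=0.5,
                       begin_step=0, end_step=1000, exponent=3):
    """Polynomial ramp from initial_sparsity to target_sparsity."""
    progress = (step - begin_step) / float(end_step - begin_step)
    progress = min(1.0, max(0.0, progress))  # clamp outside [begin, end]
    return (target_sparsity
            + (initial_sparsity - target_sparsity) * (1.0 - progress) ** exponent)

for step in (0, 250, 500, 1000, 2000):
    print(step, scheduled_sparsity(step))
```

With exponent 3, more than half of the final sparsity is already reached a quarter of the way through the schedule, which gives the network the remaining steps to recover accuracy.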

2. Strip the pruning masks

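Training with pruning inserts mask ops into the graph, and these ops must be removed when exporting the pb model. Build the stripping tool from the TensorFlow source tree, then run it: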

$ bazel build -c opt contrib/model_pruning:strip_pruning_vars
$ bazel-bin/contrib/model_pruning/strip_pruning_vars --checkpoint_dir=/path/to/checkpoints/ --output_node_names=graph_node1,graph_node2 --output_dir=/tmp --filename=pruning_stripped.pb

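After pruning, the model file turns out to be the same size as before. Because this is unstructured pruning, the shapes of the convolution kernels do not change, so the parameter count is not reduced; some weights are merely pruned, that is, set to zero.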

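Does that mean pruning was useless? Not at all. If you compress the models with zip, the archives differ in size. The difference depends on the sparsity you pruned to: the sparser the model, the higher the compression ratio, though at some cost in accuracy.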
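This effect is easy to reproduce without a real model. The sketch below is an assumption-laden stand-in: it uses random float32 values in place of trained weights and zlib (the DEFLATE algorithm that zip also uses) in place of the zip tool, then compares raw and compressed sizes at a few sparsity levels.

```python
import random
import struct
import zlib

random.seed(0)
num_weights = 20000
dense = [random.uniform(-1.0, 1.0) for _ in range(num_weights)]

def prune_to_sparsity(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(sparsity * len(weights))
    if k == 0:
        return list(weights)
    cutoff = sorted(abs(w) for w in weights)[k - 1]
    return [w if abs(w) > cutoff else 0.0 for w in weights]

def serialized_sizes(weights):
    """Return (raw bytes, compressed bytes) for the weights stored as float32."""
    raw = struct.pack("<%df" % len(weights), *weights)
    return len(raw), len(zlib.compress(raw, 9))

sizes = []
for sparsity in (0.0, 0.5, 0.9):
    raw_size, zipped_size = serialized_sizes(prune_to_sparsity(dense, sparsity))
    sizes.append((raw_size, zipped_size))
    print(sparsity, raw_size, zipped_size)
```

The raw size is identical at every sparsity level because zeros still occupy four bytes each, while the compressed size drops as sparsity grows.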

Reference: https://www.voorp.com/a/TensorFlow%E6%A8%A1%E5%9E%8B%E5%89%AA%E6%9E%9DLaiCheng%E7%9A%84%E5%8D%9A%E5%AE%A2CSDN%E5%8D%9A%E5%AE%A2