Batch Normalization with tf.keras.layers.BatchNormalization: Parameter Analysis and Usage

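This article takes a close look at how TensorFlow's BatchNormalization layer works. It covers the layer's parameters, the kinds of variables it creates and how they are updated, and how it behaves differently in the training and test phases. The focus is on configuring and using the BN layer correctly and on avoiding common pitfalls.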

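Table of Contents

Calling the function

Possible problems with training=None: how tf.keras.backend.learning_phase() behaves

Are the variables created by batch normalization trainable?

When using batch normalization, save all variables, not just the trainable ones

Application analysis

The moving-average formula:

The role of the moving-average nodes in batch normalization

When the moving average is computed, and a caveat: adding control dependencies

What happens without the control dependencies?

The batch normalization process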

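Notes:

1. I am using TensorFlow 1.14.0. Some TensorFlow parameters mean different things in 1.x and 2.x, so keep this in mind. Take the argument "trainable=True/False" in the code below as an example. In 1.x, setting trainable=False on a batch normalization layer only freezes the layer. In 2.x, it also switches the layer to inference mode. The same setting therefore behaves very differently in the two versions. (Honestly, I'd advise everyone to get away from TensorFlow sooner rather than later.)

2. trainable is an argument of the batch normalization layer's class constructor.

training is an argument of the layer's call() method.

3. The trainable parameter:

As mentioned above, the same trainable setting on a batch normalization layer means different things in TensorFlow 2.0 and TensorFlow 1.x. This is explained in the second code block below.

To find out which meaning applies to you, I recommend reading the docstring of the batch normalization layer directly in the source. I am using TF 1.14.0.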
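The version difference can be summarized with a small sketch. It mirrors the _get_training_value method from the TF 1.14 source quoted in this article. The function name effective_training is hypothetical. The v2_behavior flag stands in for self._USE_V2_BEHAVIOR, and learning_phase stands in for K.learning_phase(). The real method works on tensors, while this simplified version works only on plain booleans.

```python
def effective_training(training, trainable, learning_phase, v2_behavior):
    """Boolean-only simplification of BatchNormalizationBase._get_training_value (hypothetical helper)."""
    if training is None:
        # training=None falls back to the global Keras learning phase.
        training = learning_phase
    training = bool(training)
    if v2_behavior:
        # 2.x: training is ANDed with trainable, so a frozen layer
        # (trainable=False) always runs in inference mode.
        training = training and trainable
    # 1.x: trainable plays no role here; only `training` decides.
    return training


# A frozen layer called with training=True:
print(effective_training(True, False, 0, v2_behavior=False))  # 1.x rule
print(effective_training(True, False, 0, v2_behavior=True))   # 2.x rule
```

In the 2.x branch, trainable=False overrides training=True. This is the fine-tuning behavior that the Keras docstring quoted in this article describes.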

Parameter overview

The base class is defined as follows:

class BatchNormalizationBase(Layer):
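  def __init__(self,
               axis=-1,  # the channel axis of [NHWC] data; for [NCHW] data, set axis=1
               momentum=0.99,  # decay used for the moving averages of mean and variance (the "beta" of the moving-average formula; not to be confused with BN's beta below)
               epsilon=1e-3,  # small constant added to the variance to avoid division by zero
               center=True,  # bool: whether to use BN's beta parameter (i.e. whether to shift)
               scale=True,  # bool: whether to use BN's gamma parameter (i.e. whether to scale)
               beta_initializer='zeros',  # calls init_ops.zeros_initializer(); beta, the shift parameter, starts at 0
               gamma_initializer='ones',  # calls init_ops.ones_initializer(); gamma, the scale parameter, starts at 1
               moving_mean_initializer='zeros',  # initializer for the moving mean; the initial mean is 0
               moving_variance_initializer='ones',  # initializer for the moving variance; the initial variance is 1
                                                    # so the initial statistics are those of a standard normal distribution
               beta_regularizer=None,  # regularizer for beta, rarely used
               gamma_regularizer=None,  # regularizer for gamma, rarely used
               beta_constraint=None,  # constraint on beta, rarely used
               gamma_constraint=None,  # constraint on gamma, rarely used
               renorm=False,
               renorm_clipping=None,
               renorm_momentum=0.99,
               fused=None,
               trainable=True,  # Defaults to True. I'd leave it alone rather than make trouble for yourself.
                                # It adds the parameters of the normalization formula to the
                                # GraphKeys.TRAINABLE_VARIABLES collection. By default the optimizer
                                # only updates variables in that collection, and gamma and beta are
                                # parameters that have to be learned.
                                # In my setup, however, tf.keras.layers.BatchNormalization did not
                                # do this, so I had to do it by hand.
               virtual_batch_size=None,
               adjustment=None,
               name=None,
               **kwargs):
    # Only the parameters are shown here; the implementation is omitted.
    ...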

  def _get_training_value(self, training=None):
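    # This method shows how each value of `training` is handled: it converts
    # the `training` argument into a boolean (or a boolean tensor).
    # The part of interest here is how training=None is handled.
    if training is None:
      training = K.learning_phase()  # K is keras.backend; learning_phase() returns the current phase flag (train or test) used by Keras
    if self._USE_V2_BEHAVIOR:
      if isinstance(training, int):
        training = bool(training)
      # With V2 behavior, `training` is ANDed with `trainable`, so a layer with
      # trainable=False always runs in inference mode (see the Keras docstring below).
      if base_layer_utils.is_in_keras_graph():
        training = math_ops.logical_and(training, self._get_trainable_var())
      else:
        training = math_ops.logical_and(training, self.trainable)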
    return training

  def call(self, inputs,  # the input data; default shape is [NHWC]. For other layouts, change `axis` above
           training=None  # one of True, False or None; tells the layer whether the network is training or testing.
                          # `training=True`: training phase. The layer normalizes its inputs
                          #   using the mean and variance of the current batch of inputs.
                          # `training=False`: test/inference phase. The layer normalizes its inputs using
                          #   the mean and variance of its moving statistics, learned during training.
                          # `training=None`: resolved by _get_training_value() (falls back to K.learning_phase()).
           ):
    training = self._get_training_value(training)

    # Only the parameters are shown here; the implementation is omitted.
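Below is a single-channel sketch in plain Python of what training=True and training=False change inside call(). It is a simplified model, not TensorFlow's implementation. The function names are hypothetical. The sketch uses the population (biased) variance of the batch and ignores renorm, virtual batches and the fused kernel. The moving-average update uses the form given in the Keras documentation: moving = moving * momentum + batch_stat * (1 - momentum).

```python
import math


def batch_norm(x, gamma, beta, mean, var, epsilon=1e-3):
    # y = gamma * (x - mean) / sqrt(var + epsilon) + beta
    return [gamma * (v - mean) / math.sqrt(var + epsilon) + beta for v in x]


def bn_forward(x, state, training, momentum=0.99, epsilon=1e-3):
    """One channel of batch norm; `state` holds gamma, beta, moving_mean, moving_var."""
    if training:
        # training=True: normalize with the current batch's statistics ...
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        # ... and fold them into the moving averages for later inference.
        state["moving_mean"] = state["moving_mean"] * momentum + mean * (1 - momentum)
        state["moving_var"] = state["moving_var"] * momentum + var * (1 - momentum)
    else:
        # training=False: normalize with the moving statistics; nothing is updated.
        mean, var = state["moving_mean"], state["moving_var"]
    return batch_norm(x, state["gamma"], state["beta"], mean, var, epsilon)


# Initial values match the defaults listed above:
# beta=0, gamma=1, moving_mean=0, moving_variance=1.
state = {"gamma": 1.0, "beta": 0.0, "moving_mean": 0.0, "moving_var": 1.0}
y_train = bn_forward([2.0, 4.0, 6.0], state, training=True)
y_test = bn_forward([2.0, 4.0, 6.0], state, training=False)
print(y_train, y_test, state)
```

With the default momentum of 0.99, each training step moves the moving statistics only 1% of the way toward the batch statistics. As a result, the inference-mode output of a freshly initialized layer can differ considerably from its training-mode output.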

Here is what the Keras docstring says about setting trainable, with my own notes interleaved:

"""
class BatchNormalization(normalization.BatchNormalizationBase):

  __doc__ = normalization.replace_in_base_docstring([
      ('{{TRAINABLE_ATTRIBUTE_NOTE}}',
       '''
  **About setting `layer.trainable = False` on a `BatchNormalization layer:**

  The meaning of setting `layer.trainable = False` is to freeze the layer,
  i.e. its internal state will not change during training:
  its trainable weights will not be updated
  during `fit()` or `train_on_batch()`, and its state updates will not be run.

  Usually, this does not necessarily mean that the layer is run in inference
  mode (which is normally controlled by the `training` argument that can
  be passed when calling a layer). "Frozen state" and "inference mode"
  are two separate concepts.

  However, in the case of the `BatchNormalization` layer, **setting
  `trainable = False` on the layer means that the layer will be
  subsequently run in inference mode** (meaning that it will use
  the moving mean and the moving variance to normalize the current batch,
  rather than using the mean and variance of the current batch).
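---> My understanding: if a batch normalization layer is set to trainable = False during training, it normalizes the current batch with the moving mean and moving variance instead of the batch's own mean and variance. What we normally want from batch normalization has two parts. During training, each mini-batch should be normalized with its own mean and variance. At the same time, the layer should maintain a moving mean and moving variance for use at test time. Setting trainable = False during training gives up the first part.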


  This behavior has been introduced in TensorFlow 2.0, in order
  to enable `layer.trainable = False` to produce the most commonly
  expected behavior in the convnet fine-tuning use case.
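---> My understanding: when fine-tuning a network, we want to freeze the parameters of some layers and train only a few of them. For a frozen batch normalization layer, we want it to keep using the already-learned moving mean and moving variance during training, rather than the mean and variance of the current batch.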

  Note that:
    - This behavior only occurs as of TensorFlow 2.0. In 1.*,
      setting `layer.trainable = False` would freeze the layer but would
      not switch it to inference mode.
---> My understanding: in 1.x, setting `layer.trainable = False` on a batch normalization layer results in the following:
    1) the layer's gamma and beta are not trained;
    2) normalization still uses the mean and variance of the current batch, not the moving mean and moving variance;
    3) are the moving mean and moving variance still updated? This remains to be confirmed.
    - Setting `trainable` on an model containing other layers will
      recursively set the `trainable` value of all inner layers.
    - If the value of the `trainable`
      attribute is changed after calling `compile()` on a model,
      the new value doesn't take effect for this model
      until `compile()` is called again.
      ''')])
"""
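The practical consequence of this docstring shows up on a freshly built layer whose moving statistics still have their initial values (mean 0, variance 1). The sketch below is a hypothetical pure-Python model for one channel, not TensorFlow code. The helper frozen_bn_output applies the 2.x rule (training is ANDed with trainable) or the 1.x rule (trainable does not affect which statistics are used). It does not model whether the moving statistics get updated.

```python
import math


def frozen_bn_output(x, trainable, training, v2_behavior,
                     gamma=1.0, beta=0.0, moving_mean=0.0, moving_var=1.0,
                     epsilon=1e-3):
    """Output of one BN channel (hypothetical helper); defaults mimic a fresh layer."""
    if v2_behavior:
        # 2.x: trainable=False switches the layer to inference mode.
        training = training and trainable
    if training:
        # Use the statistics of the current batch.
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
    else:
        # Use the stored moving statistics.
        mean, var = moving_mean, moving_var
    return [gamma * (v - mean) / math.sqrt(var + epsilon) + beta for v in x]


batch = [10.0, 20.0, 30.0]
# 1.x: a frozen BN layer still normalizes with the batch statistics.
print(frozen_bn_output(batch, trainable=False, training=True, v2_behavior=False))
# 2.x: a frozen BN layer uses moving_mean=0, moving_var=1, so the batch passes
# through almost unchanged (it is only divided by sqrt(1 + epsilon)).
print(frozen_bn_output(batch, trainable=False, training=True, v2_behavior=True))
```

The two printed lists differ sharply. In 1.x the frozen layer still centers the batch on its own mean. In 2.x it applies the stored statistics, which is what you want when those statistics come from a pretrained model.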