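1. Purpose
In training mode, a dropout layer sets the output of each neuron in a fully connected layer to zero with a given probability p. Because a different random subset of neurons is dropped for every batch, training can loosely be viewed as training many different sub-models that share weights. This yields more robust weights, acts like an ensemble of models, improves generalization, and reduces overfitting.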
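
As a quick illustration of the training-mode behavior described above, the sketch below uses PyTorch's built-in `torch.nn.Dropout` (not the hand-written function in this post) to show that elements are dropped only in training mode, and that the layer acts as the identity in evaluation mode. The tensor shape, the seed, and `p = 0.5` are arbitrary choices made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

layer = nn.Dropout(p=0.5)  # p is the probability of zeroing an element
x = torch.ones(2, 8)

layer.train()              # training mode: elements are dropped at random
y_train_a = layer(x)
y_train_b = layer(x)       # a fresh mask is drawn on every forward pass

layer.eval()               # evaluation mode: dropout is a no-op
y_eval = layer(x)

print(y_train_a)
print(y_train_b)
print(y_eval)
```

With an all-ones input and `p = 0.5`, every entry of the training-mode outputs is either 0 (dropped) or 2 (kept and rescaled by 1/(1-p)), while the evaluation-mode output equals the input.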

1.2 Formula
$$
h' =
\begin{cases}
0 & \text{with probability } p \\
\dfrac{h}{1-p} & \text{otherwise}
\end{cases}
$$
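Note: as the formula shows, dropout is carried out in two steps:
(1) Set each activation in the current layer to 0 with probability $p$.
(2) Multiply the remaining activations by $1/(1-p)$.
Why do the remaining values need to be multiplied by $1/(1-p)$? So that the expected value of each activation stays unchanged: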
$$
E(h') = 0 \times p + \frac{h}{1-p} \times (1-p) = h
$$
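
The expectation argument can also be checked numerically. The following is a minimal sketch in plain Python, assuming illustrative values `h = 3.0` and `p = 0.3` (neither comes from the post) and a hypothetical helper `dropped_value`: it samples $h'$ many times and compares the empirical mean with $h$.

```python
import random


def dropped_value(h, p, rng):
    """Return 0 with probability p, otherwise h / (1 - p)."""
    return 0.0 if rng.random() < p else h / (1.0 - p)


rng = random.Random(0)       # fixed seed, chosen arbitrarily
h, p, n = 3.0, 0.3, 200_000  # illustrative values, not from the post

samples = [dropped_value(h, p, rng) for _ in range(n)]
empirical_mean = sum(samples) / n
print(f"h = {h}, empirical E(h') = {empirical_mean:.3f}")
```

The empirical mean should land close to `h`. Without the $1/(1-p)$ rescaling it would instead approach $(1-p)\,h$.
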
import torch


# Define the dropout function
def dropout_test(x, dropout):
    """
    :param x: input tensor
    :param dropout: probability of dropping (zeroing) each element
    :return: the input tensor masked by dropout and rescaled
    """
    # dropout must be between 0 and 1
    assert 0 <= dropout <= 1
    # If dropout is 0, nothing is dropped: return x unchanged
    if dropout == 0:
        return x
    # If dropout is 1, every element is dropped: return all zeros
    if dropout == 1:
        return torch.zeros_like(x)
    # torch.rand returns values drawn uniformly from [0, 1).
    # An element is kept (mask = 1) when its random value exceeds dropout, else dropped (mask = 0)
    mask = (torch.rand(x.shape) > dropout).float()
    # Apply the mask and rescale the kept elements by 1 / (1 - dropout), as in the formula
    return mask * x / (1.0 - dropout)
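
To try the function out, the sketch below restates it compactly under the hypothetical name `dropout_sketch` (so the snippet runs on its own) and exercises the two edge cases plus an intermediate probability. The input tensor and the choice `p = 0.5` are illustrative assumptions.

```python
import torch


def dropout_sketch(x, p):
    """Compact restatement of the inverted-dropout function (hypothetical name)."""
    assert 0 <= p <= 1
    if p == 0:
        return x
    if p == 1:
        return torch.zeros_like(x)
    mask = (torch.rand(x.shape) > p).float()  # 1 keeps an element, 0 drops it
    return mask * x / (1.0 - p)


x = torch.arange(16, dtype=torch.float32).reshape(2, 8)

print(dropout_sketch(x, 0))    # input returned unchanged
print(dropout_sketch(x, 0.5))  # each entry is either zeroed or doubled
print(dropout_sketch(x, 1))    # all zeros
```

Because the mask is random, the middle output differs from run to run, but every surviving entry is the original value multiplied by 1/(1-p).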





