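Deriving the Backpropagation Formulas for Fully Connected and Convolutional Neural Networks
Backpropagation Formulas for Fully Connected Networks
The four fundamental equations of BP: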
$$
\begin{aligned}
\delta_i^{(L)} &= \nabla_{y_i} Cost \cdot \sigma'(logit_i^{(L)}) \\
\delta_i^{(l)} &= \sum_j \delta_j^{(l+1)} w_{ji}^{(l+1)} \sigma'(logit_i^{(l)}) \\
\frac{\partial Cost}{\partial bias_i^{(l)}} &= \delta_i^{(l)} \\
\frac{\partial Cost}{\partial w_{ij}^{(l)}} &= \delta_i^{(l)} h_j^{(l-1)}
\end{aligned}
$$
Here the superscript $(l)$ denotes layer $l$, the network has $L$ layers in total, and $i, j$ index the neurons within a layer.
The main purpose of the backpropagation formulas is to obtain $\frac{\partial Cost}{\partial bias_i^{(l)}}$ and $\frac{\partial Cost}{\partial w_{ij}^{(l)}}$.
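The four equations can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: it assumes a sigmoid activation for $\sigma$, the cost $Cost = \frac{1}{2}\lVert y - target \rVert^2$ (so that $\nabla_{y} Cost = y - target$), and hypothetical helper names (`forward`, `backward`, `max_gradient_error`) that do not come from the original text. The last function compares the analytic gradients against central finite differences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(weights, biases, x):
    """Return activations hs[0..L] and logits zs[1..L] (zs[0] is unused)."""
    hs, zs = [x], [None]
    for W, b in zip(weights, biases):
        zs.append(W @ hs[-1] + b)      # logit^(l) = W^(l) h^(l-1) + bias^(l)
        hs.append(sigmoid(zs[-1]))     # h^(l) = sigma(logit^(l))
    return hs, zs

def backward(weights, hs, zs, target):
    """Apply the four equations. Assumes Cost = 0.5 * ||y - target||^2."""
    L = len(weights)
    grads_W, grads_b = [None] * L, [None] * L
    # Equation 1: delta^(L) = grad_y Cost * sigma'(logit^(L))
    delta = (hs[-1] - target) * sigmoid_prime(zs[-1])
    for l in range(L, 0, -1):
        grads_b[l - 1] = delta                        # Equation 3
        grads_W[l - 1] = np.outer(delta, hs[l - 1])   # Equation 4
        if l > 1:
            # Equation 2: delta^(l-1) = (W^(l))^T delta^(l) * sigma'(logit^(l-1))
            delta = (weights[l - 1].T @ delta) * sigmoid_prime(zs[l - 1])
    return grads_W, grads_b

def cost(weights, biases, x, target):
    hs, _ = forward(weights, biases, x)
    return 0.5 * np.sum((hs[-1] - target) ** 2)

def max_gradient_error(seed=0, eps=1e-6):
    """Largest gap between analytic and finite-difference gradients."""
    rng = np.random.default_rng(seed)
    sizes = [3, 4, 2]  # arbitrary layer widths chosen for the check
    weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
    biases = [rng.normal(size=m) for m in sizes[1:]]
    x, target = rng.normal(size=sizes[0]), rng.normal(size=sizes[-1])
    hs, zs = forward(weights, biases, x)
    grads_W, grads_b = backward(weights, hs, zs, target)
    worst = 0.0
    for params, grads in zip(weights + biases, grads_W + grads_b):
        for idx in np.ndindex(*params.shape):
            old = params[idx]
            params[idx] = old + eps
            up = cost(weights, biases, x, target)
            params[idx] = old - eps
            down = cost(weights, biases, x, target)
            params[idx] = old  # restore the parameter
            numeric = (up - down) / (2 * eps)
            worst = max(worst, abs(numeric - grads[idx]))
    return worst
```

Calling `max_gradient_error()` should return a value close to zero if the four equations are implemented consistently; a large value usually points to a transposed weight matrix or an off-by-one layer index.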
In the course of the derivation, applying the chain rule gives
$$
\begin{aligned}
\frac{\partial Cost}{\partial bias_i^{(l)}} &= \frac{\partial Cost}{\partial logit_i^{(l)}} \cdot \frac{\partial logit_i^{(l)}}{\partial bias_i^{(l)}} \\
\frac{\partial Cost}{\partial w_{ij}^{(l)}} &= \frac{\partial Cost}{\partial logit_i^{(l)}} \cdot \frac{\partial logit_i^{(l)}}{\partial w_{ij}^{(l)}}
\end{aligned}
$$
so both gradients depend on $\frac{\partial Cost}{\partial logit_i^{(l)}}$.
Since the logit of layer $l$ is computed from the activations of layer $l-1$,
$$
logit_i^{(l)} = w_{ij}^{(l)} h_j^{(l-1)} + \sum_{k \ne j} w_{ik}^{(l)} h_k^{(l-1)} + bias_i^{(l)}
$$
it follows that
$$
\begin{aligned}
\frac{\partial logit_i^{(l)}}{\partial bias_i^{(l)}} &= 1 \\
\frac{\partial logit_i^{(l)}}{\partial w_{ij}^{(l)}} &= h_j^{(l-1)}
\end{aligned}
$$
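These two partial derivatives can be checked numerically. The snippet below is a small sketch using central finite differences; the function names and the argument `h_prev` (standing for the activations that feed into layer $l$) are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def logit(W, b, h_prev):
    """Logits of one layer: W @ h_prev + b."""
    return W @ h_prev + b

def logit_partials(W, b, h_prev, i, j, eps=1e-6):
    """Central finite differences of logit_i with respect to w_ij and bias_i."""
    W_up, W_dn = W.copy(), W.copy()
    W_up[i, j] += eps
    W_dn[i, j] -= eps
    d_w = (logit(W_up, b, h_prev)[i] - logit(W_dn, b, h_prev)[i]) / (2 * eps)

    b_up, b_dn = b.copy(), b.copy()
    b_up[i] += eps
    b_dn[i] -= eps
    d_b = (logit(W, b_up, h_prev)[i] - logit(W, b_dn, h_prev)[i]) / (2 * eps)
    return d_w, d_b
```

For any choice of `i` and `j`, `d_w` should match `h_prev[j]` and `d_b` should match 1 up to floating-point rounding, because the logit is linear in both the weight and the bias.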
The only remaining problem is to compute $\frac{\partial Cost}{\partial logit_i^{(l)}}$, which can be done recursively.
To keep the formulas concise, write $\delta_i^{(l)}$ for $\frac{\partial Cost}{\partial logit_i^{(l)}}$. Then
$$
\delta_i^{(l)} = \frac{\partial Cost}{\partial logit_i^{(l)}} = \sum_j \frac{\partial Cost}{\partial logit_j^{(l+1)}} \cdot \frac{\partial logit_j^{(l+1)}}{\partial logit_i^{(l)}} = \sum_j \delta_j^{(l+1)} \cdot \frac{\partial logit_j^{(l+1)}}{\partial logit_i^{(l)}}
$$
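The remaining factor can be worked out as sketched below, assuming the activation of layer $l$ is $h_k^{(l)} = \sigma(logit_k^{(l)})$ (an assumption, though it matches the $\sigma'$ terms in the four equations). Only the $k = i$ term of the sum depends on $logit_i^{(l)}$, and substituting the result into the recursion recovers the second of the four equations.

```latex
\begin{aligned}
\frac{\partial logit_j^{(l+1)}}{\partial logit_i^{(l)}}
  &= \frac{\partial}{\partial logit_i^{(l)}}
     \left( \sum_k w_{jk}^{(l+1)} \sigma(logit_k^{(l)}) + bias_j^{(l+1)} \right)
   = w_{ji}^{(l+1)} \sigma'(logit_i^{(l)}) \\
\delta_i^{(l)}
  &= \sum_j \delta_j^{(l+1)} w_{ji}^{(l+1)} \sigma'(logit_i^{(l)})
\end{aligned}
```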

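This article presents the backpropagation formulas for fully connected and convolutional neural networks in detail. It derives the key gradient computations of the BP algorithm, including the $\delta$ recursion and its matrix form for fully connected networks, and briefly covers the theory of backpropagation in convolutional networks.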




