Chain rule applied to a single neuron:
Green arrows (used to calculate z): forward pass
Red arrows (used to calculate the gradients of the weight matrices): backward pass

network architecture:
z = W1 x
h = sigmoid(z)
y^ = W2 h
E (loss) = 1/2 ||y^ - y||^2
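
To make the forward pass concrete, here is a minimal NumPy sketch of the four equations above. The layer sizes (3 inputs, 4 hidden units, 2 outputs), the random initial values, and the helper names `sigmoid` and `forward` are assumptions chosen for illustration; they do not come from the original notes.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, W2, x, y):
    """Forward pass: returns z, h, y_hat and the scalar loss E."""
    z = W1 @ x                               # z = W1 x
    h = sigmoid(z)                           # h = sigmoid(z)
    y_hat = W2 @ h                           # y^ = W2 h
    loss = 0.5 * np.sum((y_hat - y) ** 2)    # E = 1/2 ||y^ - y||^2
    return z, h, y_hat, loss

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)
y = rng.normal(size=2)

z, h, y_hat, loss = forward(W1, W2, x, y)
print(z.shape, h.shape, y_hat.shape, loss)
```

The intermediates z, h and y^ are returned because the backward pass reuses them.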

step1:
Derivative of the loss function with respect to the hidden-layer-to-output-layer weight matrix (the same size as the original weight matrix W2):
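
The equation for this step is not reproduced here, so the following is a reconstruction from the network definition, not necessarily the author's exact notation. It assumes x, z, h, y^ and y are column vectors and that the gradient is laid out with the same shape as the weight matrix.

```latex
\[
\frac{\partial E}{\partial \hat{y}} = \hat{y} - y,
\qquad
\frac{\partial E}{\partial W_2}
  = \frac{\partial E}{\partial \hat{y}}\, h^{\top}
  = (\hat{y} - y)\, h^{\top}
\]
% Element-wise: dE/d(W_2)_{ij} = (\hat{y}_i - y_i) h_j,
% so the result has the same shape as W_2.
```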

step2:
Derivatives (in matrix form) of the loss with respect to h and to z:
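
A reconstruction of this step from the network definition, under the same column-vector assumption; the symbols σ for the sigmoid and ⊙ for the element-wise product are notation introduced here:

```latex
\[
\frac{\partial E}{\partial h} = W_2^{\top} (\hat{y} - y),
\qquad
\frac{\partial E}{\partial z}
  = \frac{\partial E}{\partial h} \odot \sigma'(z)
  = \left[ W_2^{\top} (\hat{y} - y) \right] \odot h \odot (1 - h)
\]
% Uses \sigma'(z) = \sigma(z)(1 - \sigma(z)) = h \odot (1 - h), applied element-wise.
```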

step3:
Derivative of the loss function with respect to the input-layer-to-hidden-layer weight matrix (the same size as the original weight matrix W1):
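
A reconstruction of this step, again assuming column vectors, with σ denoting the sigmoid and ⊙ the element-wise product (notation introduced here). Since z = W1 x, the gradient with respect to W1 is the outer product of the gradient with respect to z and the input x:

```latex
\[
\frac{\partial E}{\partial W_1}
  = \frac{\partial E}{\partial z}\, x^{\top}
  = \left\{ \left[ W_2^{\top} (\hat{y} - y) \right] \odot h \odot (1 - h) \right\} x^{\top}
\]
% Element-wise: dE/d(W_1)_{ij} = (dE/dz)_i x_j,
% so the result has the same shape as W_1.
```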

Properties we use in the derivation:
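
The original list of properties is not reproduced here. A derivation of this kind typically relies on two identities, stated here as assumptions: the sigmoid derivative sigmoid'(z) = sigmoid(z)(1 - sigmoid(z)), and the fact that for z = W x the gradient of a scalar E with respect to W is the outer product of dE/dz and x. The sketch below checks both by comparing the analytic gradients against central finite differences. The layer sizes and the helper names `loss_fn`, `backward` and `numerical_grad` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(W1, W2, x, y):
    """E = 1/2 ||W2 sigmoid(W1 x) - y||^2."""
    h = sigmoid(W1 @ x)
    return 0.5 * np.sum((W2 @ h - y) ** 2)

def backward(W1, W2, x, y):
    """Analytic gradients of E with respect to W1 and W2."""
    z = W1 @ x
    h = sigmoid(z)
    y_hat = W2 @ h
    d_yhat = y_hat - y             # dE/dy^
    dW2 = np.outer(d_yhat, h)      # outer-product rule for y^ = W2 h
    dh = W2.T @ d_yhat             # dE/dh
    dz = dh * h * (1.0 - h)        # sigmoid'(z) = h (1 - h)
    dW1 = np.outer(dz, x)          # outer-product rule for z = W1 x
    return dW1, dW2

def numerical_grad(f, W, eps=1e-6):
    """Central finite differences of the scalar f() with respect to each entry of W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old               # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)
y = rng.normal(size=2)

dW1, dW2 = backward(W1, W2, x, y)
num_dW1 = numerical_grad(lambda: loss_fn(W1, W2, x, y), W1)
num_dW2 = numerical_grad(lambda: loss_fn(W1, W2, x, y), W2)

max_err_W1 = np.max(np.abs(dW1 - num_dW1))
max_err_W2 = np.max(np.abs(dW2 - num_dW2))
print(max_err_W1, max_err_W2)
```

Both maximum errors should be very small. A large discrepancy would point to a mistake in one of the chain-rule steps.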




reference:
[1] https://www.bilibili.com/video/BV1h4411A7v4/?spm_id_from=333.788.videocard.3
[2] http://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes03
