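1. Introduction
The torch.autograd module provides the classes and functions that implement automatic differentiation of arbitrary scalar-valued functions. For a tensor, you only need to set requires_grad=True; after the relevant computation, calling backward() yields the gradient (derivative) information propagated back to that tensor.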
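As a quick illustration of the requires_grad flag described above, the sketch below compares a tensor that tracks gradients with one that does not. The names a, b and c and the toy function 3b are chosen only for this example.

```python
import torch

# Tensors do not track gradients unless requires_grad=True is set.
a = torch.tensor([1.0, 2.0])
b = torch.tensor([1.0, 2.0], requires_grad=True)

# Any differentiable computation on b is recorded in a graph.
c = (b * 3).sum()

print(a.requires_grad)        # False
print(b.requires_grad)        # True
print(c.grad_fn is not None)  # True: c remembers how it was computed

# Backpropagate from the scalar c; d(3*b_i)/d(b_i) = 3 for every element.
c.backward()
print(b.grad)
```

Only b receives a .grad attribute here; a stays outside the graph because it was created without requires_grad=True.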
2. Case Studies
Case 1
Create a matrix tensor x in PyTorch, let $y = \mathrm{sum}(x^2+2x+1)$, and compute the derivative of y with respect to x. The program is as follows:
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = torch.sum(x ** 2 + 2 * x + 1)
y.backward()
print(x.grad)
>>>tensor([[ 4., 6.],
[ 8., 10.]])
First, create a $2\times 2$ tensor $x$ and compute $y = \mathrm{sum}(x^2+2x+1)$. The resulting $y$ is a scalar:
$$
y=(x_{11}^2+2x_{11}+1)+(x_{12}^2+2x_{12}+1)+(x_{21}^2+2x_{21}+1)+(x_{22}^2+2x_{22}+1)
$$
Calling y.backward() then automatically computes the derivative of $y$ with respect to each element of $x$, namely:
$$
\left[
\begin{matrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}}\\
\frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}}\\
\end{matrix}
\right]=
\left[
\begin{matrix}
2x_{11} + 2 & 2x_{12} + 2\\
2x_{21} + 2 & 2x_{22} + 2\\
\end{matrix}
\right]
$$
Substituting the values of x gives the following result:
[[ 4., 6.],
[ 8., 10.]]
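The derivation above can be checked numerically. This is a minimal sketch that compares x.grad with the closed-form expression $2x+2$; the variable names expected and matches are introduced only for this check.

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = torch.sum(x ** 2 + 2 * x + 1)
y.backward()

# Closed-form gradient from the derivation: dy/dx_ij = 2 * x_ij + 2.
# detach() keeps this comparison out of the autograd graph.
expected = 2 * x.detach() + 2

matches = torch.allclose(x.grad, expected)
print(matches)
```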
Case 2
Why use sum() at all? Can it be removed, like this?
$$
y = x^2+2x+1
$$
Let's try:
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward()
print(x.grad)
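This raises a RuntimeError. The message says, in essence, that a gradient can be created implicitly only for scalar outputs, so .backward() without arguments works only on a scalar.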
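To see the failure without stopping the script, the call can be wrapped in try/except. This is only a sketch for inspecting the error; error_message is a name introduced here, and the exact wording of the message may differ between PyTorch versions.

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1  # shape (2, 2), not a scalar

error_message = None
try:
    y.backward()  # no gradient argument on a non-scalar tensor
except RuntimeError as err:
    error_message = str(err)

print(error_message)
print(x.grad)  # None: the failed call computed no gradient
```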

After consulting the documentation, the code can be fixed as follows:
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward(gradient=torch.tensor([[1.0, 1], [1, 1]]))
print(x.grad)
>>>tensor([[ 4., 6.],
[ 8., 10.]])
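Passing a $2\times 2$ tensor of ones as the gradient argument of y.backward() makes the code run. The argument weights each element of y before the contributions are summed, so with all ones it reproduces the sum() result from Case 1. One way to read it: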
$$
\left[
\begin{matrix}
y_{11} & y_{12}\\
y_{21} & y_{22}\\
\end{matrix}
\right]=
\left[
\begin{matrix}
(x_{11}^2+2x_{11}+1) & (x_{12}^2+2x_{12}+1)\\
(x_{21}^2+2x_{21}+1) & (x_{22}^2+2x_{22}+1)\\
\end{matrix}
\right]
$$
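y is no longer a scalar but a matrix, and each element $y_{ij}$ depends only on $x_{ij}$, so the derivative is taken element by element: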
$$
\left[
\begin{matrix}
\frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial y_{12}}{\partial x_{12}}\\
\frac{\partial y_{21}}{\partial x_{21}} & \frac{\partial y_{22}}{\partial x_{22}}\\
\end{matrix}
\right]=
\left[
\begin{matrix}
2x_{11} + 2 & 2x_{12} + 2\\
2x_{21} + 2 & 2x_{22} + 2\\
\end{matrix}
\right]
$$
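One way to confirm this reading is to compare two routes to the same gradient: backward() with an explicit gradient tensor w, and differentiating the scalar sum(w * y) with torch.autograd.grad. The sketch below assumes w is all ones, as in the code above; retain_graph=True is needed only because the same graph is traversed twice.

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
w = torch.ones_like(y)  # the tensor passed as gradient=

# Route 1: non-scalar backward with an explicit gradient argument.
# retain_graph=True keeps the graph alive for the second route.
y.backward(gradient=w, retain_graph=True)
grad_via_backward = x.grad.clone()

# Route 2: reduce to a scalar first, then differentiate it.
# torch.autograd.grad returns the gradient without touching x.grad.
(grad_via_sum,) = torch.autograd.grad((w * y).sum(), x)

same = torch.allclose(grad_via_backward, grad_via_sum)
print(same)
```

Both routes compute the same weighted sum of element-wise derivatives, which is why the all-ones case matches the sum() version from Case 1.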
If the all-ones matrix is multiplied by 2, i.e. y.backward(gradient=torch.tensor([[2.0, 2], [2, 2]])):
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward(gradient=torch.tensor([[2.0, 2], [2, 2]]))
print(x.grad)
>>>tensor([[ 8., 12.],
[16., 20.]])
This is equivalent to backpropagating with all-ones weights from a y whose elements are doubled:
$$
\left[
\begin{matrix}
y_{11} & y_{12}\\
y_{21} & y_{22}\\
\end{matrix}
\right]=
\left[
\begin{matrix}
2(x_{11}^2+2x_{11}+1) & 2(x_{12}^2+2x_{12}+1)\\
2(x_{21}^2+2x_{21}+1) & 2(x_{22}^2+2x_{22}+1)\\
\end{matrix}
\right]
$$
So the derivative is also twice the original value:
$$
\left[
\begin{matrix}
\frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial y_{12}}{\partial x_{12}}\\
\frac{\partial y_{21}}{\partial x_{21}} & \frac{\partial y_{22}}{\partial x_{22}}\\
\end{matrix}
\right]=
\left[
\begin{matrix}
4x_{11} + 4 & 4x_{12} + 4\\
4x_{21} + 4 & 4x_{22} + 4\\
\end{matrix}
\right]
$$
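The equivalence between doubling the gradient argument and doubling y can be checked directly. In this sketch, grad_of is a hypothetical helper written for the comparison, not part of PyTorch; it rebuilds x each time so that the two runs do not accumulate into the same .grad.

```python
import torch

def grad_of(scale_y, weight):
    """Return x.grad for y = scale_y * (x^2 + 2x + 1), using a
    gradient argument filled with the constant `weight`."""
    x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
    y = scale_y * (x ** 2 + 2 * x + 1)
    y.backward(gradient=torch.full_like(y, weight))
    return x.grad

g_weighted = grad_of(scale_y=1.0, weight=2.0)  # y with a gradient of twos
g_scaled = grad_of(scale_y=2.0, weight=1.0)    # 2*y with a gradient of ones

print(torch.allclose(g_weighted, g_scaled))
```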
Case 3
With a little more imagination, the two previous cases can be combined.
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
z = torch.sum(x ** 2 + 2 * x + 1)
y.backward(gradient=torch.tensor([[1.0, 1], [1, 1]]))
z.backward()
print(x.grad)
>>>tensor([[ 8., 12.],
[16., 20.]])
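Backpropagating from y and then from z in turn, the printed result is the sum of the two gradients, because x.grad accumulates across backward() calls instead of being overwritten. The formula is: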
$$
\left[
\begin{matrix}
\frac{\partial z}{\partial x_{11}} + \frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial z}{\partial x_{12}} + \frac{\partial y_{12}}{\partial x_{12}}\\
\frac{\partial z}{\partial x_{21}} + \frac{\partial y_{21}}{\partial x_{21}} & \frac{\partial z}{\partial x_{22}} + \frac{\partial y_{22}}{\partial x_{22}}\\
\end{matrix}
\right]=
\left[
\begin{matrix}
2x_{11} + 2 & 2x_{12} + 2\\
2x_{21} + 2 & 2x_{22} + 2\\
\end{matrix}
\right]+
\left[
\begin{matrix}
2x_{11} + 2 & 2x_{12} + 2\\
2x_{21} + 2 & 2x_{22} + 2\\
\end{matrix}
\right]
$$
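The accumulation can be made visible by taking a snapshot of x.grad after each backward() call. The sketch below also shows one common way to start over, zeroing the gradient in place with x.grad.zero_(); the snapshot names and the extra tensor w are introduced only for this example.

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
z = torch.sum(x ** 2 + 2 * x + 1)

y.backward(gradient=torch.ones_like(y))
after_y = x.grad.clone()      # first contribution: 2x + 2

z.backward()
after_z = x.grad.clone()      # accumulated: (2x + 2) + (2x + 2)

# Reset the stored gradient in place before an unrelated backward pass.
x.grad.zero_()
w = torch.sum(x ** 2 + 2 * x + 1)
w.backward()
after_reset = x.grad.clone()  # back to a single contribution

print(after_y)
print(after_z)
print(after_reset)
```

Without the zero_() call, the third backward pass would add a further 2x + 2 on top of the accumulated value.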
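This article walked through the use of PyTorch's autograd module with concrete examples: how to compute a tensor's gradient during backpropagation, and how different operations (summing to a scalar, passing a gradient argument, calling backward() more than once) affect the result.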
