2.4.3. Gradients
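We can concatenate the partial derivatives of a multivariate function with respect to all of its variables to obtain the function's gradient vector. Specifically, let the input of the function $f:\mathbb{R}^{n}\to\mathbb{R}$ be an $n$-dimensional vector $\vec x=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}$, and let its output be a scalar. The gradient of $f(\vec x)$ with respect to $\vec x$ is a vector of $n$ partial derivatives: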
$$\nabla_{\vec x} f(\vec x) = \begin{bmatrix}\frac{\partial f(\vec x)}{\partial x_1}\\ \frac{\partial f(\vec x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(\vec x)}{\partial x_n}\end{bmatrix}$$
where $\nabla_{\vec x} f(\vec x)$ is usually written as $\nabla f(\vec x)$ when there is no ambiguity.
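To make the definition concrete, here is a minimal sketch that approximates each partial derivative with a central difference and stacks the results into a gradient vector. The helper `numerical_gradient`, the step size `h`, and the example function `f` are illustrative assumptions, not part of the original text.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R at x by central differences."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # nudge only the i-th coordinate up
        x_minus[i] -= h  # and down
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example: f(x) = x1^2 + 3*x1*x2,
    # whose exact gradient is [2*x1 + 3*x2, 3*x1].
    return x[0] ** 2 + 3 * x[0] * x[1]


point = [1.0, 2.0]
approx = numerical_gradient(f, point)
exact = [2 * point[0] + 3 * point[1], 3 * point[0]]
print("approximate:", approx)
print("exact:      ", exact)
```

The two vectors should agree up to floating-point error, one entry per input variable, which is exactly the "vector of $n$ partial derivatives" described above.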
Assuming $\vec x$ is an $n$-dimensional vector, the following rules are often used when differentiating multivariate functions:
1. For all $A \in \mathbb{R}^{m\times n}$, we have $\nabla_{\vec x} A\vec x = A^\top$;
Proof: Let (subscripts in parentheses denote the shape)

$$A_{(m,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} &\cdots&a_{m,n} \end{bmatrix},$$

then

$$(A\vec x)_{(m,1)} = \begin{bmatrix} a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n \\ a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n \\ \vdots \\ a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n \end{bmatrix},$$
so that

$$\nabla_{\vec x}A\vec x = \begin{bmatrix}\frac{\partial A\vec x}{\partial x_1}\\ \frac{\partial A\vec x}{\partial x_2}\\ \vdots\\ \frac{\partial A\vec x}{\partial x_n}\end{bmatrix} = \begin{bmatrix}\frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_1}& \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_1}&\cdots&\frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_1}\\ \frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_2}& \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_2}&\cdots&\frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_n}& \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_n}&\cdots&\frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_n}\end{bmatrix}$$

The entry in row $i$ and column $j$ is $\frac{\partial (a_{j,1}x_1+a_{j,2}x_2+\cdots+a_{j,n}x_n)}{\partial x_i} = a_{j,i}$, so this $n\times m$ matrix is exactly $A^\top$.
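As a numerical sanity check of the rule $\nabla_{\vec x} A\vec x = A^\top$, the sketch below differentiates every component of $A\vec x$ with respect to every $x_i$ by central differences and compares the resulting $n\times m$ matrix with the transpose of $A$. The helper names (`matvec`, `gradient_of_vector_function`) and the sample values of `A` and `x` are hypothetical choices for illustration only.

```python
def matvec(A, x):
    """Multiply an m x n matrix (list of rows) by a length-n vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def gradient_of_vector_function(g, x, h=1e-6):
    """Return the n x m matrix whose (i, j) entry approximates d g_j / d x_i."""
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        g_plus, g_minus = g(x_plus), g(x_minus)
        # Row i holds the derivatives of every output component w.r.t. x_i.
        rows.append([(p - q) / (2 * h) for p, q in zip(g_plus, g_minus)])
    return rows


# Sample data (assumed for illustration): m = 2, n = 3.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.5, -1.0, 2.0]

grad = gradient_of_vector_function(lambda v: matvec(A, v), x)
A_T = [list(col) for col in zip(*A)]  # transpose of A, shape n x m

max_error = max(
    abs(g - a)
    for g_row, a_row in zip(grad, A_T)
    for g, a in zip(g_row, a_row)
)
print("largest deviation from A^T:", max_error)
```

Because $A\vec x$ is linear in $\vec x$, the result does not depend on the point `x` chosen, and the deviation should be at the level of floating-point rounding.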

