KL Divergence Between Two Multivariate Gaussian Distributions
The Gaussian distribution is a continuous probability distribution defined on $R^n$, with probability density function
$$p(x)=\frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}}\exp\left\{ -\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$$
Here $x,\mu\in R^n$, and $\Sigma\in R^{n\times n}$ is the covariance matrix, which must be symmetric positive definite. When $\mu=0$ and $\Sigma=I$, this is the standard normal distribution.
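To sanity-check the density formula numerically, here is a minimal sketch. It assumes NumPy is available, and the function name `gaussian_log_pdf` is hypothetical. The sketch evaluates $\log p(x)$ using a Cholesky factor of $\Sigma$ rather than forming $\Sigma^{-1}$ and $\det(\Sigma)$ directly. That is a common choice for numerical stability, not something the formula itself requires.

```python
import numpy as np

def gaussian_log_pdf(x, mu, sigma):
    """Return log p(x) for p = N(mu, sigma), following the density formula above.

    A Cholesky factor L (sigma = L @ L.T) gives
      log det(sigma) = 2 * sum(log(diag(L)))
      (x - mu)^T sigma^{-1} (x - mu) = ||L^{-1} (x - mu)||^2
    so sigma is never inverted explicitly.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = mu.shape[0]
    L = np.linalg.cholesky(sigma)       # raises LinAlgError if sigma is not positive definite
    z = np.linalg.solve(L, x - mu)      # z = L^{-1} (x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + z @ z)

# Standard normal in 2D, evaluated at the mean: log p(0) = -log(2*pi), about -1.8379
print(gaussian_log_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```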
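Symmetric positive definite:
If $\Sigma$ is a symmetric positive definite matrix, then:
(1) Symmetry: $\Sigma=\Sigma^T$
(2) Positive definiteness: for every nonzero $\xi\in R^n$, $\xi^T\Sigma\xi >0$
The inverse of a positive definite matrix is also positive definite, and the sum of two positive definite matrices is also positive definite.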
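Both conditions and both closure facts can be checked numerically. This sketch assumes NumPy, and `is_symmetric_positive_definite` is a hypothetical helper. It tests symmetry directly and tests positive definiteness by attempting a Cholesky factorization. That is one standard test among several; checking that all eigenvalues are positive would also work.

```python
import numpy as np

def is_symmetric_positive_definite(a, tol=1e-10):
    """Check (1) symmetry and (2) positive definiteness of a square matrix.

    For a symmetric matrix, np.linalg.cholesky succeeds exactly when the
    matrix is positive definite (up to floating-point rounding), so a
    LinAlgError is read as 'not positive definite'.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return False
    if not np.allclose(a, a.T, atol=tol):   # (1) symmetry
        return False
    try:
        np.linalg.cholesky(a)               # (2) positive definiteness
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[3.0, 0.0], [0.0, 1.0]])
print(is_symmetric_positive_definite(np.linalg.inv(A)))          # True: inverse of a PD matrix
print(is_symmetric_positive_definite(A + B))                     # True: sum of PD matrices
print(is_symmetric_positive_definite([[1.0, 2.0], [2.0, 1.0]]))  # False: eigenvalues 3 and -1
```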
Some properties of the normal distribution:
- $E_x[x]=\mu$
- $E_x[(x-\mu)(x-\mu)^T]=\Sigma$
- $E_x[xx^T]=\mu\mu^T+E_x[(x-\mu)(x-\mu)^T]=\mu\mu^T+\Sigma$
- Entropy:
  $$\mathcal{H}=E_x[-\log p(x)]=\frac{n}{2}(1+\log 2\pi)+\frac{1}{2}\log \det (\Sigma)$$
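The moment identities and the entropy formula can be checked by Monte Carlo. This is a minimal sketch that assumes NumPy. The helper names, the sample size and the example $\mu,\Sigma$ are arbitrary choices for illustration. Agreement holds only up to sampling error, which shrinks roughly like $1/\sqrt{N}$ for $N$ samples.

```python
import numpy as np

def gaussian_entropy(sigma):
    """Closed form: n/2 * (1 + log 2*pi) + 1/2 * log det(sigma)."""
    n = sigma.shape[0]
    _, log_det = np.linalg.slogdet(sigma)
    return 0.5 * n * (1.0 + np.log(2.0 * np.pi)) + 0.5 * log_det

def sample_moments(mu, sigma, num_samples=200_000, seed=0):
    """Monte Carlo estimates of E[x], E[x x^T] and E[-log p(x)] for x ~ N(mu, sigma)."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu, sigma, size=num_samples)  # shape (num_samples, n)
    mean = x.mean(axis=0)
    second_moment = x.T @ x / num_samples                      # estimates E[x x^T]
    d = x - mu
    quad = np.einsum("ij,jk,ik->i", d, np.linalg.inv(sigma), d)  # (x-mu)^T sigma^{-1} (x-mu)
    n = mu.shape[0]
    _, log_det = np.linalg.slogdet(sigma)
    neg_log_p = 0.5 * (n * np.log(2.0 * np.pi) + log_det + quad)
    return mean, second_moment, neg_log_p.mean()

mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
mean, second_moment, entropy_mc = sample_moments(mu, sigma)
print(np.round(mean, 2), mu)
print(np.round(second_moment, 2), np.outer(mu, mu) + sigma)
print(round(entropy_mc, 3), round(gaussian_entropy(sigma), 3))
```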
KL divergence
For $p(x)=\mathcal{N}(\mu_p,\Sigma_p)$ and $q(x)=\mathcal{N}(\mu_q,\Sigma_q)$, the result is:
$$KL(p(x)||q(x))=\frac{1}{2}\left[(\mu_p-\mu_q)^T\Sigma_q^{-1}(\mu_p-\mu_q)-\log \det(\Sigma_q^{-1}\Sigma_p)+Tr(\Sigma_q^{-1}\Sigma_p)-n \right]$$
In particular, when $q$ is the standard normal distribution ($\mu_q=0$, $\Sigma_q=I$), the result simplifies to:
$$KL(p(x)||q(x))=\frac{1}{2}\left[||\mu_p||^2+Tr(\Sigma_p)-\log \det (\Sigma_p)-n \right]$$
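The closed form translates almost line for line into code. This sketch assumes NumPy, and the function names are hypothetical. It solves linear systems with $\Sigma_q$ instead of explicitly inverting it, and uses `slogdet` for the log-determinant. The usage lines check that the general formula with $\mu_q=0,\Sigma_q=I$ reproduces the simplified one.

```python
import numpy as np

def kl_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for p = N(mu_p, sigma_p) and q = N(mu_q, sigma_q)."""
    n = mu_p.shape[0]
    diff = mu_p - mu_q
    sq_inv_sp = np.linalg.solve(sigma_q, sigma_p)   # Sigma_q^{-1} Sigma_p without forming the inverse
    quad = diff @ np.linalg.solve(sigma_q, diff)    # (mu_p - mu_q)^T Sigma_q^{-1} (mu_p - mu_q)
    _, log_det = np.linalg.slogdet(sq_inv_sp)       # log det(Sigma_q^{-1} Sigma_p); det is positive here
    return 0.5 * (quad - log_det + np.trace(sq_inv_sp) - n)

def kl_to_standard_normal(mu_p, sigma_p):
    """Special case q = N(0, I): 1/2 * (||mu_p||^2 + Tr(sigma_p) - log det(sigma_p) - n)."""
    n = mu_p.shape[0]
    _, log_det = np.linalg.slogdet(sigma_p)
    return 0.5 * (mu_p @ mu_p + np.trace(sigma_p) - log_det - n)

mu_p = np.array([0.5, -1.0])
sigma_p = np.array([[1.5, 0.3], [0.3, 0.8]])
general = kl_gaussians(mu_p, sigma_p, np.zeros(2), np.eye(2))
special = kl_to_standard_normal(mu_p, sigma_p)
print(np.isclose(general, special))  # True
```

Two more consistency checks that follow directly from the formula are $KL(p||p)=0$ and, in one dimension, $KL(\mathcal{N}(0,1)||\mathcal{N}(1,1))=\frac{1}{2}$.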
Derivation:
$$KL(p(x)||q(x))=E_{x\sim p(x)}\left[\log\frac{p(x)}{q(x)}\right]=E_{x\sim p(x)}[\log p(x)]+E_{x\sim p(x)}[-\log q(x)]$$
First compute $E_{x\sim p(x)}[-\log q(x)]$:
$$\begin{align*}
E_{x\sim p(x)}[-\log q(x)]&=E_{x\sim p(x)}\left[\frac{n}{2}\log (2\pi)+\frac{1}{2}\log \det(\Sigma_q)+\frac{1}{2}(x-\mu_q)^T\Sigma_q^{-1}(x-\mu_q) \right]\\
&=\frac{n}{2}\log (2\pi)+\frac{1}{2}\log \det(\Sigma_q)+\frac{1}{2}E_{x\sim p(x)}\left[(x-\mu_q)^T\Sigma_q^{-1}(x-\mu_q) \right]
\end{align*}$$
Frobenius inner product:
For two $m\times n$ matrices $A,B$, the Frobenius inner product is defined as
$$\langle A,B\rangle_F=\sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}$$
It satisfies
$$\langle A,B\rangle_F=Tr(A^TB)=Tr(BA^T)=Tr(AB^T)=Tr(B^TA)$$
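Here is a quick numerical check of these identities. It assumes NumPy, and the random test matrices are arbitrary. The second part shows the special case the derivation relies on. When $A$ is a column vector $a$ and $B=Ma$, we get $\langle a, Ma\rangle_F=a^TMa=Tr(Maa^T)$.

```python
import numpy as np

def frobenius_inner(a, b):
    """<A, B>_F = sum_ij A_ij * B_ij for two matrices of the same shape."""
    return float(np.sum(a * b))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # arbitrary 3x4 test matrices
B = rng.standard_normal((3, 4))
values = [frobenius_inner(A, B),
          np.trace(A.T @ B), np.trace(B @ A.T), np.trace(A @ B.T), np.trace(B.T @ A)]
print(np.allclose(values, values[0]))  # True: all four trace forms agree with the definition

# Vector case: for a column vector a, a^T M a is a 1x1 matrix
# whose single entry equals Tr(M a a^T).
a = rng.standard_normal((3, 1))
M = rng.standard_normal((3, 3))
print(np.isclose((a.T @ M @ a).item(), np.trace(M @ a @ a.T)))  # True
```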
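Apply this property with $A=x-\mu_q$ (an $n\times 1$ column vector) and $B=\Sigma_q^{-1}(x-\mu_q)$. Since a scalar equals its own trace, this gives $(x-\mu_q)^T\Sigma_q^{-1}(x-\mu_q)=Tr(A^TB)=Tr(BA^T)=Tr(\Sigma_q^{-1}(x-\mu_q)(x-\mu_q)^T)$. Trace and expectation are both linear, so they can be exchanged. The last steps use $E_{x\sim p(x)}[x]=\mu_p$ and $E_{x\sim p(x)}[xx^T]=\mu_p\mu_p^T+\Sigma_p$ from the properties above: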
$$\begin{align*}
E_{x\sim p(x)}\left[(x-\mu_q)^T\Sigma_q^{-1}(x-\mu_q) \right]&=E_{x\sim p(x)}\left[Tr((x-\mu_q)^T\Sigma_q^{-1}(x-\mu_q)) \right]\\
&=E_{x\sim p(x)}\left[Tr(\Sigma_q^{-1}(x-\mu_q)(x-\mu_q)^T) \right]\\
&=Tr\left(\Sigma_q^{-1} E_{x\sim p(x)}[(x-\mu_q)(x-\mu_q)^T] \right)\\
&=Tr\left(\Sigma_q^{-1} E_{x\sim p(x)}[xx^T-x\mu_q^T-\mu_qx^T+\mu_q\mu_q^T] \right)\\
&=Tr\left(\Sigma_q^{-1}(\Sigma_p+\mu_p\mu_p^T-\mu_p\mu_q^T-\mu_q\mu_p^T+\mu_q\mu_q^T)\right)\\
&=Tr\left(\Sigma_q^{-1}\left(\Sigma_p+(\mu_p-\mu_q)(\mu_p-\mu_q)^T\right)\right)\\
&=Tr(\Sigma_q^{-1}\Sigma_p)+(\mu_p-\mu_q)^T\Sigma_q^{-1}(\mu_p-\mu_q)
\end{align*}$$
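The result can be checked against a Monte Carlo average of the quadratic form. This sketch assumes NumPy. The function names and example parameters are hypothetical, chosen so that the closed form is easy to evaluate by hand: $Tr(\Sigma_q^{-1}\Sigma_p)=1$ and $(\mu_p-\mu_q)^T\Sigma_q^{-1}(\mu_p-\mu_q)=0.75$.

```python
import numpy as np

def expected_quadratic_closed_form(mu_p, sigma_p, mu_q, sigma_q):
    """Tr(Sigma_q^{-1} Sigma_p) + (mu_p - mu_q)^T Sigma_q^{-1} (mu_p - mu_q)."""
    diff = mu_p - mu_q
    return np.trace(np.linalg.solve(sigma_q, sigma_p)) + diff @ np.linalg.solve(sigma_q, diff)

def expected_quadratic_monte_carlo(mu_p, sigma_p, mu_q, sigma_q, num_samples=200_000, seed=0):
    """Sample mean of (x - mu_q)^T Sigma_q^{-1} (x - mu_q) over x ~ N(mu_p, Sigma_p)."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu_p, sigma_p, size=num_samples)
    d = x - mu_q                                  # shape (num_samples, n)
    y = np.linalg.solve(sigma_q, d.T).T           # each row is Sigma_q^{-1} (x - mu_q)
    return np.mean(np.sum(d * y, axis=1))

mu_p, sigma_p = np.array([1.0, 0.0]), np.array([[1.0, 0.2], [0.2, 0.5]])
mu_q, sigma_q = np.array([0.0, 0.5]), np.array([[2.0, 0.0], [0.0, 1.0]])
print(expected_quadratic_closed_form(mu_p, sigma_p, mu_q, sigma_q))  # 1.75 = 1.0 (trace) + 0.75 (quadratic)
print(expected_quadratic_monte_carlo(mu_p, sigma_p, mu_q, sigma_q))  # close to 1.75
```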
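As for $E_{x\sim p(x)}[\log p(x)]$, it is just the negative of the entropy given above. When the two parts are added, the $\frac{n}{2}\log(2\pi)$ terms cancel, leaving $-\frac{n}{2}$. In addition, $\frac{1}{2}\log\det(\Sigma_q)-\frac{1}{2}\log\det(\Sigma_p)=-\frac{1}{2}\log\det(\Sigma_q^{-1}\Sigma_p)$. The final result is therefore: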
$$\begin{align*}
KL(p(x)||q(x))&=E_{x\sim p(x)}[\log p(x)]+E_{x\sim p(x)}[-\log q(x)] \\
&=\left[-\frac{n}{2}(1+\log 2\pi)-\frac{1}{2}\log \det (\Sigma_p)\right]\\
&\quad+\frac{n}{2}\log (2\pi)+\frac{1}{2}\log \det(\Sigma_q)+\frac{1}{2}\left[Tr(\Sigma_q^{-1}\Sigma_p)+(\mu_p-\mu_q)^T\Sigma_q^{-1}(\mu_p-\mu_q)\right]\\
&=\frac{1}{2}\left[Tr(\Sigma_q^{-1}\Sigma_p)+(\mu_p-\mu_q)^T\Sigma_q^{-1}(\mu_p-\mu_q)-n-\log\det (\Sigma_q^{-1}\Sigma_p) \right]
\end{align*}$$
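Finally, the closed form can be compared with a direct Monte Carlo estimate of the definition $E_{x\sim p(x)}[\log p(x)-\log q(x)]$. This is again a sketch that assumes NumPy, with hypothetical helper names and arbitrary example parameters. The two numbers should match up to sampling error.

```python
import numpy as np

def gaussian_log_pdf_rows(x, mu, sigma):
    """log N(x_i; mu, sigma) for every row x_i of x (shape (N, n))."""
    n = mu.shape[0]
    d = x - mu
    y = np.linalg.solve(sigma, d.T).T                 # rows are sigma^{-1} (x_i - mu)
    _, log_det = np.linalg.slogdet(sigma)
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + np.sum(d * y, axis=1))

def kl_closed_form(mu_p, sigma_p, mu_q, sigma_q):
    """1/2 [Tr(Sigma_q^{-1} Sigma_p) + quad - n - log det(Sigma_q^{-1} Sigma_p)]."""
    n = mu_p.shape[0]
    diff = mu_p - mu_q
    m = np.linalg.solve(sigma_q, sigma_p)             # Sigma_q^{-1} Sigma_p
    _, log_det = np.linalg.slogdet(m)
    return 0.5 * (np.trace(m) + diff @ np.linalg.solve(sigma_q, diff) - n - log_det)

def kl_monte_carlo(mu_p, sigma_p, mu_q, sigma_q, num_samples=200_000, seed=0):
    """Estimate E_{x~p}[log p(x) - log q(x)] by averaging over samples from p."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu_p, sigma_p, size=num_samples)
    return np.mean(gaussian_log_pdf_rows(x, mu_p, sigma_p) - gaussian_log_pdf_rows(x, mu_q, sigma_q))

mu_p, sigma_p = np.array([1.0, 0.0]), np.array([[1.0, 0.2], [0.2, 0.5]])
mu_q, sigma_q = np.array([0.0, 0.5]), np.array([[2.0, 0.0], [0.0, 1.0]])
print(kl_closed_form(mu_p, sigma_p, mu_q, sigma_q))   # 0.5 * (1.75 - 2 - log 0.23), about 0.61
print(kl_monte_carlo(mu_p, sigma_p, mu_q, sigma_q))   # should be close to the closed form
```

The Monte Carlo estimator only needs samples from $p$ and the two log-densities, so it makes a convenient generic cross-check. When both distributions are Gaussian, the closed form is exact and much cheaper.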
