Proofs of Some Formulas in Diffusion Models
The forward diffusion process of a diffusion model:
$$q(x_{1:T}\mid x_0) = \prod_{t=1}^{T} q(x_t\mid x_{t-1}),\qquad q(x_t\mid x_{t-1}) := \mathcal{N}\big(\sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big) \quad (1)$$
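As a minimal sketch of (1), the forward process can be simulated one step at a time by drawing Gaussian noise and applying $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon$. The linear $\beta_t$ schedule, the scalar value of $x_0$, and the function name `forward_diffusion` are assumptions made for illustration only; none of them come from the text.

```python
import numpy as np

def forward_diffusion(x0, betas, rng):
    """Sample x_1, ..., x_T from q(x_{1:T} | x_0), one step at a time as in (1)."""
    xs = []
    x = x0
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        xs.append(x)
    return xs

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule, not from the text
x0 = np.full(2000, 3.0)                # 2000 independent copies of the same scalar x_0
xs = forward_diffusion(x0, betas, rng)

# After many steps the samples are close to a standard normal for this schedule
print(xs[-1].mean(), xs[-1].var())
```

With this schedule almost all information about $x_0$ is gone by the last step, so the printed mean and variance should be near 0 and 1.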
The reverse denoising process:
$$p_\theta(x_{0:T}) = p(x_T)\prod_{t=1}^{T} p_\theta(x_{t-1}\mid x_t),\qquad p_\theta(x_{t-1}\mid x_t) := \mathcal{N}\big(\mu_\theta(x_t,t),\ \Sigma_\theta(x_t,t)\big) \quad (2)$$
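The factorization in (2) corresponds to ancestral sampling: draw $x_T$ from $p(x_T)$, then repeatedly draw $x_{t-1}$ from $p_\theta(x_{t-1}\mid x_t)$. The sketch below only illustrates that loop. It assumes $p(x_T)=\mathcal{N}(0, I)$ and a diagonal $\Sigma_\theta$ passed as a standard deviation, and `toy_mu` and `toy_sigma` are hypothetical stand-ins for a learned network.

```python
import numpy as np

def reverse_sample(mu_theta, sigma_theta, T, shape, rng):
    """Draw x_0 by sampling x_T ~ p(x_T), then x_{t-1} ~ p_theta(x_{t-1} | x_t)."""
    x = rng.standard_normal(shape)  # assumption: p(x_T) = N(0, I)
    for t in range(T, 0, -1):
        mean = mu_theta(x, t)
        std = sigma_theta(x, t)     # assumption: diagonal Sigma_theta, given as a std
        x = mean + std * rng.standard_normal(shape)
    return x

# Hypothetical stand-ins for the learned mean and standard deviation
def toy_mu(x, t):
    return 0.9 * x

def toy_sigma(x, t):
    return 0.1

rng = np.random.default_rng(0)
x0_sample = reverse_sample(toy_mu, toy_sigma, T=50, shape=(4, 2), rng=rng)
print(x0_sample.shape)
```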
The learning objective of the model is:
$$E[-\log p_\theta(x_0)] \le E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right] = E_q\left[-\log p(x_t) - \sum_{t\ge 1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_t\mid x_{t-1})}\right] := L \quad (3)$$
Formula (3) in the original paper seems to me to have a small problem, so I have modified it slightly according to my own understanding:
$$-\log p_\theta(x_0) \le E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right] = E_q\left[-\log p(x_T) - \sum_{t\ge 1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_t\mid x_{t-1})}\right] := L \quad (3')$$
The proof is as follows:
$$\begin{aligned}
-\log p_\theta(x_0) &= E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[-\log p_\theta(x_0)\right] = E_q\left[-\log p_\theta(x_0)\right] \\
&= E_q\left[-\log\frac{p_\theta(x_{0:T})}{p_\theta(x_{1:T}\mid x_0)}\right] = E_q\left[-\log\left(\frac{p_\theta(x_{0:T})}{p_\theta(x_{1:T}\mid x_0)}\cdot\frac{q(x_{1:T}\mid x_0)}{q(x_{1:T}\mid x_0)}\right)\right] \\
&= E_q\left[-\log\left(\frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\cdot\frac{q(x_{1:T}\mid x_0)}{p_\theta(x_{1:T}\mid x_0)}\right)\right] = E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right] - E_q\left[\log\frac{q(x_{1:T}\mid x_0)}{p_\theta(x_{1:T}\mid x_0)}\right] \\
&= E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right] - D_{KL}\big(q(x_{1:T}\mid x_0)\,\|\,p_\theta(x_{1:T}\mid x_0)\big) \le E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right]
\end{aligned}$$
The final inequality holds because the KL divergence is non-negative.
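The bound can be checked numerically on a toy model where every sum can be enumerated exactly. The sketch below replaces the Gaussians with a small discrete Markov chain (3 states, 3 steps, random transition tables), which is an assumption made purely so that the marginal $p_\theta(x_0)$ is computable; the variable names are hypothetical. It verifies that the bound minus the KL term equals $-\log p_\theta(x_0)$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 3  # number of states and diffusion steps (toy values)

def random_stochastic(shape):
    """Random array whose last axis sums to 1 (a conditional distribution)."""
    m = rng.random(shape) + 0.1
    return m / m.sum(axis=-1, keepdims=True)

Q = random_stochastic((T, K, K))  # Q[t-1][i, j] = q(x_t = j | x_{t-1} = i)
P = random_stochastic((T, K, K))  # P[t-1][i, j] = p_theta(x_{t-1} = j | x_t = i)
prior = random_stochastic((K,))   # prior[i] = p(x_T = i)

def q_path(x):
    """q(x_{1:T} | x_0) for a full path x = (x_0, ..., x_T)."""
    return np.prod([Q[t - 1][x[t - 1], x[t]] for t in range(1, T + 1)])

def p_joint(x):
    """p_theta(x_{0:T}) for a full path x = (x_0, ..., x_T)."""
    return prior[x[T]] * np.prod([P[t - 1][x[t], x[t - 1]] for t in range(1, T + 1)])

x0 = 0
paths = [(x0,) + rest for rest in itertools.product(range(K), repeat=T)]
p_x0 = sum(p_joint(x) for x in paths)  # marginal p_theta(x_0)

# E_q[-log p_theta(x_{0:T}) / q(x_{1:T} | x_0)]
bound = sum(q_path(x) * -np.log(p_joint(x) / q_path(x)) for x in paths)
# KL(q(x_{1:T} | x_0) || p_theta(x_{1:T} | x_0))
kl = sum(q_path(x) * np.log(q_path(x) / (p_joint(x) / p_x0)) for x in paths)

nll = -np.log(p_x0)
print(nll, bound, kl)
```

The gap between `bound` and `nll` is exactly `kl`, so the bound is tight only when $q(x_{1:T}\mid x_0)$ matches the model posterior $p_\theta(x_{1:T}\mid x_0)$.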
The original paper then gives the result of deriving $L$ further:
$$L = E_q\left[D_{KL}\big(q(x_T\mid x_0)\,\|\,p(x_T)\big) + \sum_{t>1} D_{KL}\big(q(x_{t-1}\mid x_t,x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\big) - \log p_\theta(x_0\mid x_1)\right] \quad (5)$$
(5) also seems to me to have a problem, so I have again modified it according to my own understanding:
$$L = D_{KL}\big(q(x_T\mid x_0)\,\|\,p(x_T)\big) + \sum_{t>1} D_{KL}\big(q(x_{t-1}\mid x_t,x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\big) - \log p_\theta(x_0\mid x_1) \quad (5')$$
The proof is as follows:
$$\begin{aligned}
L &= E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right] = E_q\left[-\log\frac{p(x_T)\prod_{t\ge 1} p_\theta(x_{t-1}\mid x_t)}{\prod_{t\ge 1} q(x_t\mid x_{t-1})}\right] \\
&= E_q\left[-\log p(x_T) - \log\prod_{t\ge 1}\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_t\mid x_{t-1})}\right] = E_q\left[-\log p(x_T) - \sum_{t\ge 1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_t\mid x_{t-1})}\right] \\
&= E_q\left[-\log p(x_T) - \sum_{t>1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_t\mid x_{t-1})} - \log\frac{p_\theta(x_0\mid x_1)}{q(x_1\mid x_0)}\right] \\
&= E_q\left[-\log p(x_T) - \sum_{t>1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_t\mid x_{t-1},x_0)} - \log\frac{p_\theta(x_0\mid x_1)}{q(x_1\mid x_0)}\right]
\end{aligned}$$
The last step uses the Markov property of the forward process, $q(x_t\mid x_{t-1}) = q(x_t\mid x_{t-1},x_0)$.
$$\begin{aligned}
&= E_q\left[-\log p(x_T) - \sum_{t>1}\log\left(\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_t,x_{t-1},x_0)}\cdot q(x_{t-1},x_0)\cdot\frac{q(x_0)}{q(x_0)}\cdot\frac{q(x_t,x_0)}{q(x_t,x_0)}\right) - \log\frac{p_\theta(x_0\mid x_1)}{q(x_1\mid x_0)}\right] \\
&= E_q\left[-\log p(x_T) - \sum_{t>1}\log\left(\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_{t-1}\mid x_t,x_0)}\cdot\frac{q(x_{t-1},x_0)}{q(x_0)}\cdot\frac{q(x_0)}{q(x_t,x_0)}\right) - \log\frac{p_\theta(x_0\mid x_1)}{q(x_1\mid x_0)}\right] \\
&= E_q\left[-\log p(x_T) - \sum_{t>1}\log\left(\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_{t-1}\mid x_t,x_0)}\cdot\frac{q(x_{t-1}\mid x_0)}{q(x_t\mid x_0)}\right) - \log\frac{p_\theta(x_0\mid x_1)}{q(x_1\mid x_0)}\right]
\end{aligned}$$
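These steps amount to Bayes' rule applied to the forward process: $q(x_t\mid x_{t-1},x_0) = q(x_{t-1}\mid x_t,x_0)\,q(x_t\mid x_0)/q(x_{t-1}\mid x_0)$. The sketch below checks that identity numerically on a small discrete Markov chain. The discrete state space, the random transition tables, and all names are assumptions chosen so that every quantity can be computed exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 4, 5  # number of states and steps (toy values)

def random_stochastic(shape):
    """Random array whose last axis sums to 1 (a conditional distribution)."""
    m = rng.random(shape) + 0.1
    return m / m.sum(axis=-1, keepdims=True)

Q = random_stochastic((T, K, K))  # Q[t-1][i, j] = q(x_t = j | x_{t-1} = i)

# marg[t][a, j] = q(x_t = j | x_0 = a), built by chaining the transitions
marg = [np.eye(K)]
for t in range(1, T + 1):
    marg.append(marg[-1] @ Q[t - 1])

x0, t = 2, 3
# joint[i, j] = q(x_{t-1} = i, x_t = j | x_0)
joint = marg[t - 1][x0][:, None] * Q[t - 1]
# posterior[i, j] = q(x_{t-1} = i | x_t = j, x_0)
posterior = joint / joint.sum(axis=0, keepdims=True)
# Right-hand side of the identity: posterior * q(x_t | x_0) / q(x_{t-1} | x_0)
rhs = posterior * marg[t][x0][None, :] / marg[t - 1][x0][:, None]

# By the Markov property the left-hand side is just the transition table
identity_holds = np.allclose(rhs, Q[t - 1])
print(identity_holds)
```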
$$\begin{aligned}
&= E_q\left[-\log p(x_T) - \sum_{t>1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_{t-1}\mid x_t,x_0)} - \sum_{t>1}\log\frac{q(x_{t-1}\mid x_0)}{q(x_t\mid x_0)} - \log\frac{p_\theta(x_0\mid x_1)}{q(x_1\mid x_0)}\right] \\
&= E_q\left[-\log p(x_T) - \sum_{t>1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_{t-1}\mid x_t,x_0)} - \log\frac{q(x_1\mid x_0)}{q(x_T\mid x_0)} - \log\frac{p_\theta(x_0\mid x_1)}{q(x_1\mid x_0)}\right]
\end{aligned}$$
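The second sum collapses because it telescopes. As a small worked case, taking $T=3$ (a value chosen only for illustration):

$$\sum_{t=2}^{3}\log\frac{q(x_{t-1}\mid x_0)}{q(x_t\mid x_0)} = \log\frac{q(x_1\mid x_0)}{q(x_2\mid x_0)} + \log\frac{q(x_2\mid x_0)}{q(x_3\mid x_0)} = \log\frac{q(x_1\mid x_0)}{q(x_3\mid x_0)}$$

The intermediate factor $q(x_2\mid x_0)$ cancels, and for general $T$ only $q(x_1\mid x_0)$ and $q(x_T\mid x_0)$ survive.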
$$\begin{aligned}
&= E_q\left[-\log\frac{p(x_T)}{q(x_T\mid x_0)} - \sum_{t>1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_{t-1}\mid x_t,x_0)} - \log p_\theta(x_0\mid x_1)\right] \\
&= E_q\left[-\log\frac{p(x_T)}{q(x_T\mid x_0)}\right] + E_q\left[-\sum_{t>1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_{t-1}\mid x_t,x_0)}\right] + E_q\left[-\log p_\theta(x_0\mid x_1)\right] \\
&= E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[-\log\frac{p(x_T)}{q(x_T\mid x_0)}\right] + E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[-\sum_{t>1}\log\frac{p_\theta(x_{t-1}\mid x_t)}{q(x_{t-1}\mid x_t,x_0)}\right] + E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[-\log p_\theta(x_0\mid x_1)\right] \\
&= E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[\log\frac{q(x_T\mid x_0)}{p(x_T)}\right] + E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[\sum_{t>1}\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\right] - \log p_\theta(x_0\mid x_1) \\
&= E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[\log\frac{q(x_T\mid x_0)}{p(x_T)}\right] + \sum_{t>1}E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\right] - \log p_\theta(x_0\mid x_1)
\end{aligned}$$
$$\begin{aligned}
E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[\log\frac{q(x_T\mid x_0)}{p(x_T)}\right] &= \int q(x_{1:T}\mid x_0)\log\frac{q(x_T\mid x_0)}{p(x_T)}\,dx_{1:T} \\
&= \int\left(\int\frac{q(x_{1:T}\mid x_0)}{q(x_T\mid x_0)}\prod_{k\ge 1,\,k\ne T}dx_k\right) q(x_T\mid x_0)\log\frac{q(x_T\mid x_0)}{p(x_T)}\,dx_T \\
&= \int\left(\int q(x_{1:T-1}\mid x_T,x_0)\prod_{T>k\ge 1}dx_k\right) q(x_T\mid x_0)\log\frac{q(x_T\mid x_0)}{p(x_T)}\,dx_T \\
&= \int q(x_T\mid x_0)\log\frac{q(x_T\mid x_0)}{p(x_T)}\,dx_T = E_{x_T\sim q(x_T\mid x_0)}\left[\log\frac{q(x_T\mid x_0)}{p(x_T)}\right]
\end{aligned}$$
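The marginalization above says that an expectation over the whole path $x_{1:T}$ of a quantity depending only on $x_T$ reduces to an expectation under $q(x_T\mid x_0)$. The sketch below checks this by brute force on a small discrete chain. The discrete states, random tables, and the stand-in `prior` for $p(x_T)$ are assumptions for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
K, T = 3, 4  # number of states and steps (toy values)

def random_stochastic(shape):
    """Random array whose last axis sums to 1 (a conditional distribution)."""
    m = rng.random(shape) + 0.1
    return m / m.sum(axis=-1, keepdims=True)

Q = random_stochastic((T, K, K))  # Q[t-1][i, j] = q(x_t = j | x_{t-1} = i)
prior = random_stochastic((K,))   # stand-in for p(x_T)
x0 = 1

# q(x_T | x_0): multiply the transition matrices and read off row x_0
q_T = np.linalg.multi_dot(list(Q))[x0]

def f(xT):
    """The integrand depends on the path only through x_T."""
    return np.log(q_T[xT] / prior[xT])

# Left-hand side: expectation over every path (x_1, ..., x_T)
lhs = 0.0
for path in itertools.product(range(K), repeat=T):
    states = (x0,) + path
    prob = np.prod([Q[t - 1][states[t - 1], states[t]] for t in range(1, T + 1)])
    lhs += prob * f(path[-1])

# Right-hand side: expectation under the marginal, i.e. KL(q(x_T | x_0) || p(x_T))
rhs = sum(q_T[j] * f(j) for j in range(K))
print(lhs, rhs)
```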
$$\begin{aligned}
E_{x_{1:T}\sim q(x_{1:T}\mid x_0)}\left[\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\right] &= \int q(x_{1:T}\mid x_0)\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\,dx_{1:T} \\
&= \int\left(\int\frac{q(x_{1:T}\mid x_0)}{q(x_{t-1}\mid x_t,x_0)}\prod_{k\ge 1,\,k\ne t-1}dx_k\right) q(x_{t-1}\mid x_t,x_0)\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\,dx_{t-1} \\
&= \int\left(\int\frac{q(x_{0:T})}{q(x_0)}\cdot\frac{q(x_t,x_0)}{q(x_t,x_{t-1},x_0)}\prod_{k\ge 1,\,k\ne t-1}dx_k\right) q(x_{t-1}\mid x_t,x_0)\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\,dx_{t-1} \\
&= \int\left(\int\frac{q(x_{0:T})}{q(x_0)}\cdot\frac{q(x_t,x_0)}{q(x_t\mid x_{t-1},x_0)\,q(x_{t-1},x_0)}\prod_{k\ge 1,\,k\ne t-1}dx_k\right) q(x_{t-1}\mid x_t,x_0)\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\,dx_{t-1} \\
&= \int\left(\int\frac{q(x_{0:T})}{q(x_{t-1},x_0)}\cdot\frac{q(x_t,x_0)}{q(x_0)\,q(x_t\mid x_{t-1},x_0)}\prod_{k\ge 1,\,k\ne t-1}dx_k\right) q(x_{t-1}\mid x_t,x_0)\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\,dx_{t-1} \\
&= \int\left(\int q(x_{k:\,k\ge 1,\,k\ne t-1}\mid x_{t-1},x_0)\cdot\frac{q(x_t\mid x_0)}{q(x_t\mid x_{t-1},x_0)}\prod_{k\ge 1,\,k\ne t-1}dx_k\right) q(x_{t-1}\mid x_t,x_0)\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\,dx_{t-1} \\
&= \int\left(\int q(x_{k:\,k\ge 1,\,k\ne t-1}\mid x_{t-1},x_0)\cdot 1\prod_{k\ge 1,\,k\ne t-1}dx_k\right) q(x_{t-1}\mid x_t,x_0)\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\,dx_{t-1} \\
&= E_{x_{t-1}\sim q(x_{t-1}\mid x_t,x_0)}\left[\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\right]
\end{aligned}$$
$$\begin{aligned}
L &= E_{x_T\sim q(x_T\mid x_0)}\left[\log\frac{q(x_T\mid x_0)}{p(x_T)}\right] + \sum_{t>1}E_{x_{t-1}\sim q(x_{t-1}\mid x_t,x_0)}\left[\log\frac{q(x_{t-1}\mid x_t,x_0)}{p_\theta(x_{t-1}\mid x_t)}\right] - \log p_\theta(x_0\mid x_1) \\
&= D_{KL}\big(q(x_T\mid x_0)\,\|\,p(x_T)\big) + \sum_{t>1}D_{KL}\big(q(x_{t-1}\mid x_t,x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\big) - \log p_\theta(x_0\mid x_1)
\end{aligned}$$
Note that $x_t$ and $x_1$ are still random given $x_0$: the step above that sets $q(x_t\mid x_0)/q(x_t\mid x_{t-1},x_0)$ to 1 in effect holds $x_t$ fixed. Read strictly, each KL term is therefore averaged over $x_t\sim q(x_t\mid x_0)$ and the last term over $x_1\sim q(x_1\mid x_0)$, which is what the outer $E_q$ in (5) expresses.
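The decomposition can be checked end to end on a toy discrete chain by computing $L$ once from its definition and once from the three groups of terms. In this sketch each KL term is weighted by $q(x_t\mid x_0)$ and the last term is averaged over $q(x_1\mid x_0)$, since those variables remain random given $x_0$. The discrete state space, the random tables, and all names are assumptions for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
K, T = 3, 3  # number of states and steps (toy values)

def random_stochastic(shape):
    """Random array whose last axis sums to 1 (a conditional distribution)."""
    m = rng.random(shape) + 0.1
    return m / m.sum(axis=-1, keepdims=True)

Q = random_stochastic((T, K, K))  # Q[t-1][i, j] = q(x_t = j | x_{t-1} = i)
P = random_stochastic((T, K, K))  # P[t-1][i, j] = p_theta(x_{t-1} = j | x_t = i)
prior = random_stochastic((K,))   # prior[i] = p(x_T = i)
x0 = 0

# marg[t][j] = q(x_t = j | x_0)
marg = [np.eye(K)[x0]]
for t in range(1, T + 1):
    marg.append(marg[-1] @ Q[t - 1])

def kl(a, b):
    """KL divergence between two discrete distributions with positive entries."""
    return float(np.sum(a * np.log(a / b)))

# L computed directly from its definition by enumerating all paths
L_direct = 0.0
for rest in itertools.product(range(K), repeat=T):
    x = (x0,) + rest
    q = np.prod([Q[t - 1][x[t - 1], x[t]] for t in range(1, T + 1)])
    p = prior[x[T]] * np.prod([P[t - 1][x[t], x[t - 1]] for t in range(1, T + 1)])
    L_direct += q * -np.log(p / q)

# L computed from the three groups of terms
L_T = kl(marg[T], prior)
L_mid = 0.0
for t in range(2, T + 1):
    joint = marg[t - 1][:, None] * Q[t - 1]          # q(x_{t-1} = i, x_t = j | x_0)
    post = joint / joint.sum(axis=0, keepdims=True)  # q(x_{t-1} = i | x_t = j, x_0)
    for j in range(K):
        # KL for a fixed x_t = j, weighted by q(x_t = j | x_0)
        L_mid += marg[t][j] * kl(post[:, j], P[t - 1][j])
L_0 = -np.sum(marg[1] * np.log(P[0][:, x0]))  # E_{x_1}[-log p_theta(x_0 | x_1)]

L_sum = L_T + L_mid + L_0
print(L_direct, L_sum)
```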