BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation
Published at: NeurIPS 2021
Rating: #paper/⭐⭐
Setup: throughout this note, h denotes the spectral filter.
Bernstein polynomial approximation (the proof feels a bit muddled to me; there may be a small gap, or I may just be missing something)
$$p_K(t) := \sum_{k=0}^K \theta_k \cdot b_k^K(t) = \sum_{k=0}^K f\!\left(\frac{k}{K}\right)\cdot\binom{K}{k}(1-t)^{K-k}t^k.$$
(This is essentially the binomial distribution: the factor $\binom{K}{k}(1-t)^{K-k}t^k$ above is exactly the pmf of a $\mathrm{Binomial}(K,t)$ variable evaluated at $k$.)
Corollary 2.1: for any continuous function $f(t)$ with $t\in[0,1]$, we have $p_K(t)\to f(t)$ as $K\to\infty$.
Intuition: if $X\sim\mathrm{Binomial}(K,t)$, then $p_K(t)=\mathbb{E}\!\left[f(X/K)\right]$. As $K\to\infty$, $X/K$ concentrates around $t$ (its variance $t(1-t)/K$ vanishes), so the sum is dominated by terms with $k/K\approx t$, and by continuity of $f$ we get $\mathbb{E}[f(X/K)]\to f(t)$.
For a filter function $h:[0,2]\to[0,1]$, substitute $t=\frac{\lambda}{2}$, so $f(t)=h(2t)$ and $\theta_k=f(k/K)=h(2k/K)$. Then $b_k^K(t)=b_k^K(\frac{\lambda}{2})=\binom{K}{k}(1-\frac{\lambda}{2})^{K-k}(\frac{\lambda}{2})^k$, and the approximation becomes
$$p_K(\lambda/2)=\sum_{k=0}^K\theta_k\binom{K}{k}\left(1-\frac{\lambda}{2}\right)^{K-k}\left(\frac{\lambda}{2}\right)^k=\sum_{k=0}^K\theta_k\frac{1}{2^K}\binom{K}{k}(2-\lambda)^{K-k}\lambda^k.$$
Applying this filter to a graph signal $\mathbf{x}$ gives the BernNet propagation rule:
$$\mathbf{z}=\underbrace{\mathbf{U}\,\mathrm{diag}\!\left[p_K(\lambda_1/2),\dots,p_K(\lambda_n/2)\right]\mathbf{U}^T}_{\mathrm{BernNet}}\,\mathbf{x}=\sum_{k=0}^K\theta_k\frac{1}{2^K}\binom{K}{k}(2\mathbf{I}-\mathbf{L})^{K-k}\mathbf{L}^k\,\mathbf{x}$$
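A minimal NumPy sketch of this propagation rule (the toy path graph, coefficient values, and function names are my own for illustration, not the paper's code); it evaluates the right-hand side directly, so no eigendecomposition of $\mathbf{L}$ is needed:

```python
import numpy as np
from math import comb

def bernstein_propagate(L, x, theta):
    """Apply the Bernstein filter z = sum_k theta_k / 2^K * C(K,k) * (2I - L)^{K-k} L^k x.
    L: normalized Laplacian (n x n), x: graph signal (n,), theta: (K+1,) coefficients."""
    n = L.shape[0]
    K = len(theta) - 1
    I = np.eye(n)
    z = np.zeros_like(x, dtype=float)
    for k in range(K + 1):
        term = x.astype(float)
        for _ in range(k):              # L^k x
            term = L @ term
        for _ in range(K - k):          # (2I - L)^{K-k} (L^k x)
            term = (2 * I - L) @ term
        z += theta[k] * comb(K, k) / 2**K * term
    return z

# toy example: path graph on 4 nodes, symmetric normalized Laplacian
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
d = A.sum(1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))
x = np.array([1.0, 0.0, 0.0, 0.0])
print(bernstein_propagate(L, x, theta=np.full(11, 0.5)))   # K = 10, arbitrary theta
```

(Since $(2\mathbf{I}-\mathbf{L})$ and $\mathbf{L}$ commute, setting all $\theta_k$ to the same constant $c$ collapses the sum to $c\,\mathbf{x}$ by the binomial theorem, which is a handy sanity check.)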
Realizing common filters with BernNet
The proofs here are rather tedious, so I skip them.
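As a quick numerical check of the realization idea (an illustrative band-pass target of my own choosing, not a rerun of the paper's experiments): pick a target response $h$ on $[0,2]$, set $\theta_k=h(2k/K)$, and compare $p_K(\lambda/2)$ against $h(\lambda)$; the maximum gap shrinks as $K$ grows, consistent with Corollary 2.1.

```python
import numpy as np
from math import comb

def bern_filter(lam, theta):
    """Evaluate p_K(lambda/2) = sum_k theta_k / 2^K * C(K,k) * (2-lambda)^{K-k} * lambda^k."""
    K = len(theta) - 1
    return sum(theta[k] * comb(K, k) / 2**K * (2 - lam)**(K - k) * lam**k
               for k in range(K + 1))

h = lambda lam: np.exp(-10 * (lam - 1)**2)    # band-pass-like target response
lams = np.linspace(0, 2, 201)
for K in (10, 50, 200):
    theta = np.array([h(2 * k / K) for k in range(K + 1)])
    err = np.max(np.abs(bern_filter(lams, theta) - h(lams)))
    print(f"K = {K:3d}   max |p_K - h| = {err:.4f}")
```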
Appendix: binomial coefficient identities
$$\begin{aligned}
&\bullet\ C_n^k = C_n^{n-k}\\
&\bullet\ C_n^{k+1} = C_n^k \times \frac{n-k}{k+1}\\
&\bullet\ C_n^k = C_{n-1}^{k-1} \times \frac{n}{k}\\
&\bullet\ C_n^k = C_{n-1}^{k-1} + C_{n-1}^k
\end{aligned}$$
$$C_n^k=\frac{A_n^k}{A_k^k}=\frac{n^{\underline{k}}}{k!}=\frac{n!}{k!\,(n-k)!}$$
Graph filtering
$$\min_{\mathbf{z}} f(\mathbf{z})=(1-\alpha)\,\mathbf{z}^T\gamma(\mathbf{L})\,\mathbf{z}+\alpha\,\|\mathbf{z}-\mathbf{x}\|_2^2$$
Setting its derivative to zero, with $\alpha=0.5$ and $\gamma(\mathbf{L})=e^{t\mathbf{L}}-\mathbf{I}$:
$$\frac{\partial f(\mathbf{z})}{\partial\mathbf{z}}=\left(e^{t\mathbf{L}}-\mathbf{I}\right)\mathbf{z}+\mathbf{z}-\mathbf{x}=\mathbf{0},$$
which gives the closed-form solution (with $\mathbf{P}=\mathbf{I}-\mathbf{L}$):
$$\mathbf{z}^*=e^{-t\mathbf{L}}\mathbf{x}=e^{-t(\mathbf{I}-\mathbf{P})}\mathbf{x}=\sum_{k=0}^{\infty}e^{-t}\frac{t^k}{k!}\mathbf{P}^k\mathbf{x}.$$
This is exactly the graph heat kernel used by heat-kernel-based GNNs such as GDC and GraphHeat.
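A small sketch of this closed form (the truncation length and the toy graph are my own choices; here $\mathbf{P}$ is the symmetrically normalized adjacency, so $\mathbf{P}=\mathbf{I}-\mathbf{L}$), approximating $\mathbf{z}^*$ by the leading terms of the Poisson-weighted series:

```python
import numpy as np

def heat_diffusion(P, x, t=3.0, n_terms=30):
    """Approximate z* = e^{-t(I - P)} x = sum_k e^{-t} t^k / k! * P^k x by truncating the series."""
    z = np.zeros_like(x, dtype=float)
    term = x.astype(float)           # P^0 x
    coef = np.exp(-t)                # e^{-t} * t^0 / 0!
    for k in range(n_terms):
        z += coef * term
        term = P @ term              # P^{k+1} x
        coef *= t / (k + 1)          # e^{-t} * t^{k+1} / (k+1)!
    return z

# toy example: path graph on 4 nodes, P = D^{-1/2} A D^{-1/2} = I - L
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
d = A.sum(1)
P = A / np.sqrt(np.outer(d, d))
x = np.array([1.0, 0.0, 0.0, 0.0])
print(heat_diffusion(P, x))
```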
Non-negativity of the filter (this is what guarantees the optimization above is convex)
The constraint is
$$0\leq g(\lambda)=\sum_{k=0}^K w_k\lambda^k\leq 1,\quad\forall\lambda\in[0,2].$$
Proof sketch: the closed-form solution of the objective above is
$$\alpha\left(\alpha\mathbf{I}+(1-\alpha)\gamma(\mathbf{L})\right)^{-1}\mathbf{x}=\mathbf{U}\,\mathrm{diag}\!\left[\frac{\alpha}{\alpha+(1-\alpha)\gamma(\lambda_1)},\dots,\frac{\alpha}{\alpha+(1-\alpha)\gamma(\lambda_n)}\right]\mathbf{U}^T\mathbf{x},$$
i.e. its spectral filter is $h(\lambda)=\frac{\alpha}{\alpha+(1-\alpha)\gamma(\lambda)}$. Since $\gamma(\lambda)\geq 0$ for $\lambda\in[0,2]$ and $0<\alpha\leq 1$, we have
$$0\leq h(\lambda)\leq\frac{\alpha}{\alpha+(1-\alpha)\cdot 0}=1\quad\text{for }\lambda\in[0,2].$$
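A quick numeric sanity check of this bound, using the heat-kernel choice $\gamma(\lambda)=e^{t\lambda}-1$ from above (the grid and parameter values are arbitrary):

```python
import numpy as np

alpha, t = 0.5, 3.0
lam = np.linspace(0, 2, 201)
gamma = np.exp(t * lam) - 1                 # gamma(lambda) >= 0 on [0, 2]
h = alpha / (alpha + (1 - alpha) * gamma)   # spectral filter of the closed-form solution
print(h.min(), h.max())                     # stays inside [0, 1]; the maximum 1 is attained at lambda = 0
```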
Results: the numbers reported in the paper look quite strong, but reproductions by others do not reach the same level.
Architecture:
$$\mathbf{Z}=\sum_{k=0}^K\theta_k\frac{1}{2^K}\binom{K}{k}(2\mathbf{I}-\mathbf{L})^{K-k}\mathbf{L}^k f\left(\mathbf{X}\right),$$
where $f(\mathbf{X})$ is a two-layer MLP applied to the node features.
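Putting the pieces together, a minimal PyTorch sketch of this architecture (the hidden size, ReLU activations, and all-ones initialization of $\theta$ are my assumptions, not necessarily the paper's official hyperparameters): a two-layer MLP on the node features followed by the learnable Bernstein propagation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from math import comb

class BernNet(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim, K=10):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, out_dim)
        self.theta = nn.Parameter(torch.ones(K + 1))   # learnable Bernstein coefficients
        self.K = K

    def forward(self, X, L):
        # f(X): two-layer MLP on node features
        H = self.lin2(F.relu(self.lin1(X)))
        # Z = sum_k theta_k / 2^K * C(K,k) * (2I - L)^{K-k} L^k f(X)
        n, K = L.size(0), self.K
        I = torch.eye(n, device=L.device)
        Z = torch.zeros_like(H)
        for k in range(K + 1):
            T = H
            for _ in range(k):
                T = L @ T
            for _ in range(K - k):
                T = (2 * I - L) @ T
            Z = Z + self.theta[k] * comb(K, k) / 2**K * T
        return Z

# usage on random toy data: a 5-node cycle graph, 3 input features, 2 output classes
A = torch.tensor([[0, 1, 0, 0, 1],
                  [1, 0, 1, 0, 0],
                  [0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0]], dtype=torch.float)
d = A.sum(1)
L = torch.eye(5) - A / torch.sqrt(torch.outer(d, d))
X = torch.randn(5, 3)
model = BernNet(in_dim=3, hid_dim=16, out_dim=2)
print(model(X, L).shape)   # torch.Size([5, 2])
```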