Paper: https://aclanthology.org/2021.acl-long.62.pdf
Code: https://github.com/ytc272098215/FakeNewsDetection
methods
Directed Heterogeneous Document Graph
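For a document $d$, build a heterogeneous graph with three kinds of nodes: sentences, topics, and entities.
Topics are extracted with LDA, and sentences and topics are connected by bidirectional edges (LDA can also infer the topics of a new, unseen document).
Entities are linked with the TAGME tool. When a sentence contains an entity, a one-way edge sentence → entity is added, so messages pass only from sentences to entities. This prevents entity information from interfering with the sentence-based classification.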
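The edge rules above can be sketched as a directed adjacency matrix. This is a minimal NumPy illustration, not the authors' code: the helper `build_adjacency`, the node ordering (sentences, then topics, then entities), and the input format (lists of `(sentence, topic)` and `(sentence, entity)` index pairs, e.g. from LDA and TAGME) are all assumptions made for the example.

```python
import numpy as np


def build_adjacency(n_sent, n_topic, n_ent, sent_topic_pairs, sent_ent_pairs):
    """Directed adjacency for one document graph (hypothetical helper).

    Node order: sentences, then topics, then entities.
    Convention: A[i, j] = 1 means node j sends a message to node i,
    so a GCN update of the form A @ H aggregates along each row.
    """
    n = n_sent + n_topic + n_ent
    A = np.zeros((n, n))
    t0, e0 = n_sent, n_sent + n_topic  # offsets of the topic and entity blocks
    for s, t in sent_topic_pairs:
        # sentence <-> topic: messages flow in both directions
        A[s, t0 + t] = 1.0
        A[t0 + t, s] = 1.0
    for s, e in sent_ent_pairs:
        # sentence -> entity only: the entity row receives from the sentence,
        # while the sentence row never receives from the entity
        A[e0 + e, s] = 1.0
    return A


# Two sentences, one LDA topic, one linked entity mentioned in sentence 1.
A = build_adjacency(2, 1, 1, sent_topic_pairs=[(0, 0), (1, 0)], sent_ent_pairs=[(1, 0)])
print(A)
```

The asymmetry is visible in the result: `A[3, 1]` (entity receives from sentence 1) is 1, while `A[1, 3]` stays 0.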
Heterogeneous Graph Convolution
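Three node types $\mathcal{T}=\{\tau_1,\tau_2,\tau_3\}$: S (sentence), T (topic), E (entity), with the following input features:
S: the sentence encoded by an LSTM
T: a one-hot topic vector
E: the KB entity embedding
Each node type has its own transformation matrix $W_\tau$, which projects the different feature spaces into a common one.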
$H^{(l+1)}=\sigma\left(\sum_{\tau\in\mathcal{T}}\mathcal{B}_\tau\cdot H_\tau^{(l)}\cdot W_\tau^{(l)}\right)$
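$\mathcal{B}_\tau\in\mathbb{R}^{|\mathcal{V}|\times|\mathcal{V}_\tau|}$ is the attention matrix; its entry in row $v$ and column $v'$ is the node-level attention weight $\beta_{vv'}=\mathrm{Softmax}_{v'}(\sigma(\nu^T\cdot\alpha_{\tau'}[h_v,h_{v'}]))$, where $\nu$ is the attention vector and $\alpha_{\tau'}$ is the type-level attention weight of the type $\tau'$ of neighbor $v'$.
The type-level weights are computed from type vectors: the neighbors of $v$ that have type $\tau$ are aggregated with the normalized adjacency $\hat{A}=D^{-1/2}(A+I)D^{-1/2}$ into $h_\tau=\sum_{v'\in\mathcal{N}_v^\tau}\hat{A}_{vv'}h_{v'}$.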
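$\alpha_\tau=\mathrm{Softmax}_\tau(\sigma(\mu_\tau^T\cdot[h_v,h_\tau]))$, where $\mu_\tau$ is the attention vector for type $\tau$ and the softmax normalizes over all node types.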
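To make the two attention levels concrete, here is a minimal NumPy sketch of one layer that follows the formulas above. Several choices are assumptions, not taken from the paper or its code: all node types share one feature dimension `d` (in practice S, T and E features would first need projecting), the inner $\sigma$ is LeakyReLU, the outer $\sigma$ is ReLU, the degree matrix uses row sums of $A+I$, and the function name `hetero_gcn_layer` is hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stability; -inf entries become exp(-inf) = 0
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def hetero_gcn_layer(A, H, node_type, mu, nu, W):
    """One heterogeneous graph convolution layer with type- and node-level attention.

    A: (n, n) directed adjacency, A[v, v'] = 1 if v' sends to v.
    H: (n, d) node features, assumed to share one dimension d across types.
    node_type: (n,) ints in range(T); mu: (T, 2d) type-level attention vectors;
    nu: (2d,) node-level attention vector; W: list of T (d, d_out) matrices.
    """
    n, d = H.shape
    T = len(W)
    A_tilde = A + np.eye(n)
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]  # D^{-1/2}(A+I)D^{-1/2}

    # Type-level attention: h_tau[v] = sum over type-tau v' of A_hat[v, v'] * h_{v'}
    scores = np.zeros((n, T))
    for t in range(T):
        cols = node_type == t
        h_tau = A_hat[:, cols] @ H[cols]
        scores[:, t] = leaky_relu(np.concatenate([H, h_tau], axis=1) @ mu[t])
    alpha = softmax(scores, axis=1)  # alpha[v, t]: weight of type t for node v

    # Node-level attention: b[v, v'] = sigma(alpha_{tau(v')} * nu^T [h_v, h_{v'}])
    pair = (H @ nu[:d])[:, None] + (H @ nu[d:])[None, :]
    b = leaky_relu(alpha[:, node_type] * pair)
    b = np.where(A_tilde > 0, b, -np.inf)  # attend only to neighbours (and self)
    beta = softmax(b, axis=1)  # beta[:, type-tau columns] is B_tau

    # H^{(l+1)} = sigma(sum_tau B_tau H_tau W_tau), with ReLU as the outer sigma
    out = sum(beta[:, node_type == t] @ H[node_type == t] @ W[t] for t in range(T))
    return np.maximum(out, 0.0)


# Toy graph: nodes = [s0, s1, topic, entity], entity mentioned in s1.
A = np.array([[0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 0], [0, 1, 0, 0]], dtype=float)
node_type = np.array([0, 0, 1, 2])  # 0 = sentence, 1 = topic, 2 = entity
rng = np.random.default_rng(0)
d, d_out = 4, 3
H = rng.normal(size=(4, d))
mu, nu = rng.normal(size=(3, 2 * d)), rng.normal(size=2 * d)
W = [rng.normal(size=(d, d_out)) for _ in range(3)]
H1 = hetero_gcn_layer(A, H, node_type, mu, nu, W)
print(H1.shape)  # (4, 3)
```

Because the sentence rows of `A` have no entity columns, perturbing the entity's features leaves the sentence outputs unchanged, which is exactly the one-way message passing the graph construction aims for.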
After several layers of message passing, max pooling over the sentence node embeddings gives the document embedding $H_d$, and the entity nodes yield context-aware entity embeddings $e_c$.
entity comparison network
The entity comparison network compares the contextual entity embedding $e_c\in\mathbb{R}^N$ with the original KB entity embedding $e_{KB}\in\mathbb{R}^M$.
entity embedding
Structural embedding $e_s$ (TransE) + textual embedding $e_d$ (the entity's Wikipedia description paragraph encoded by an LSTM)
A gating network is trained to fuse the two embeddings:
$e_{KB}=g_e \odot e_s + (1-g_e)\odot e_d$
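The gated fusion can be sketched in a few lines. The note does not say how $g_e$ is produced, so this sketch assumes a common form, $g_e=\mathrm{sigmoid}(W_g[e_s,e_d]+b_g)$; the parameters `W_g`, `b_g` and the helper `fuse_kb_embedding` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_kb_embedding(e_s, e_d, W_g, b_g):
    """e_KB = g_e * e_s + (1 - g_e) * e_d (element-wise).

    Hypothetical gate: g_e = sigmoid(W_g [e_s, e_d] + b_g), with
    e_s, e_d: (M,), W_g: (M, 2M), b_g: (M,). Each entry of g_e lies in (0, 1).
    """
    g_e = sigmoid(W_g @ np.concatenate([e_s, e_d]) + b_g)
    return g_e * e_s + (1.0 - g_e) * e_d


M = 4
e_s, e_d = np.ones(M), np.zeros(M)            # toy TransE / description embeddings
W_g, b_g = np.zeros((M, 2 * M)), np.zeros(M)  # untrained gate -> g_e = 0.5
print(fuse_kb_embedding(e_s, e_d, W_g, b_g))  # [0.5 0.5 0.5 0.5]
```

Since each gate entry lies in $(0,1)$, every coordinate of $e_{KB}$ is a convex combination of the structural and textual coordinates; a gate near 1 keeps mostly $e_s$, near 0 mostly $e_d$.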
entity comparison
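$a_i=f_{\mathrm{cmp}}(e_c,W_e\cdot e_{KB})$ for the $i$-th entity in the document
$f_{\mathrm{cmp}}(x,y)=W_a[x-y,\,x\odot y]$
$W_e\in\mathbb{R}^{N\times M}$ (maps $e_{KB}$ into the same $N$-dimensional space as $e_c$), $W_a\in\mathbb{R}^{N\times 2N}$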
The document's entity comparison vector $C$ is obtained by max pooling over the comparison vectors $a_i$ of all entities it contains.
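A vectorized NumPy sketch of the comparison and pooling steps, with entities stacked row-wise. The function name `entity_comparison` and the hand-picked toy weights are illustrative assumptions; in the model, $W_e$ and $W_a$ are learned.

```python
import numpy as np


def entity_comparison(E_c, E_kb, W_e, W_a):
    """Document entity comparison vector C (hypothetical helper).

    E_c: (k, N) contextual embeddings of the k entities in a document (from the graph);
    E_kb: (k, M) fused KB embeddings; W_e: (N, M); W_a: (N, 2N).
    """
    Y = E_kb @ W_e.T                                    # W_e e_KB for every entity, (k, N)
    feats = np.concatenate([E_c - Y, E_c * Y], axis=1)  # [x - y, x * y], (k, 2N)
    a = feats @ W_a.T                                   # a_i = W_a [x - y, x * y], (k, N)
    return a.max(axis=0)                                # C: element-wise max over entities


# Toy example with N = 2, M = 3 and hand-picked weights:
W_e = np.array([[1.0, 0, 0], [0, 1.0, 0]])        # keep the first two KB dims
W_a = np.array([[1.0, 0, 0, 0], [0, 0, 0, 1.0]])  # a_i = [(x - y)_0, (x * y)_1]
E_kb = np.array([[1.0, 2, 9], [3, 1, 9]])
E_c = np.array([[2.0, 2], [1, 4]])
print(entity_comparison(E_c, E_kb, W_e, W_a))     # [1. 4.]
```

Here the two entities give $a_1=[1,4]$ and $a_2=[-2,4]$, so max pooling yields $C=[1,4]$.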
model training
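The document embedding and the document's entity comparison vector are concatenated and fed into a linear layer followed by softmax:
$Z=\mathrm{Softmax}(W_o[H_d,C]+b_o)$
The model is trained with cross-entropy loss.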
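A minimal NumPy sketch of the readout and output layer: max pooling over the final sentence-node embeddings gives $H_d$, which is concatenated with $C$, classified with a linear layer plus softmax, and scored with cross-entropy for a single document. The function names and toy shapes are assumptions for illustration.

```python
import numpy as np


def classify(H_sent, C, W_o, b_o):
    """Z = Softmax(W_o [H_d, C] + b_o), with H_d = max-pool over sentence nodes.

    H_sent: (n_sent, D) final sentence-node embeddings; C: (N,);
    W_o: (n_classes, D + N); b_o: (n_classes,).
    """
    H_d = H_sent.max(axis=0)                      # document embedding
    logits = W_o @ np.concatenate([H_d, C]) + b_o
    logits = logits - logits.max()                # numerical stability
    return np.exp(logits) / np.exp(logits).sum()


def cross_entropy(Z, label):
    """Cross-entropy loss for one document with gold class index `label`."""
    return -np.log(Z[label])


H_sent = np.array([[1.0, 5.0], [3.0, 2.0]])       # H_d = [3, 5]
C = np.array([0.5])
W_o, b_o = np.zeros((2, 3)), np.zeros(2)           # untrained: uniform prediction
Z = classify(H_sent, C, W_o, b_o)
print(Z, cross_entropy(Z, label=1))               # uniform Z, loss = ln 2
```

With all-zero weights the prediction is uniform over the two classes, so the loss equals $\ln 2$ whichever label is gold.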
experiment
future
Beyond the KB entities, additionally fuse multimodal data.