Differences Between Bahdanau Attention and Luong Attention

Question

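After being introduced for neural machine translation (NMT), the attention mechanism quickly became a core component of NLP. Please compare Bahdanau Attention (additive attention) and Luong Attention (multiplicative attention) in detail, covering score computation, alignment, and how the context vector is used.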

编译有声 · Accepted Answer

The core differences are compared in the table below.

Score computation in detail, where s is a decoder state and h_i is the i-th encoder hidden state:
Bahdanau: score(s_{t-1}, h_i) = v_aᵀ · tanh(W_a·[s_{t-1}; h_i])
Luong dot: score(s_t, h_i) = s_tᵀ · h_i
Luong general: score(s_t, h_i) = s_tᵀ · W_a · h_i
Luong concat: score(s_t, h_i) = v_aᵀ · tanh(W_a·[s_t; h_i])

In practice: Luong's dot attention is the fastest to compute (no parameters...
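
The score functions in this answer can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the helper names (dot, matvec, attend, and so on) are made up for this example, vectors are plain lists, and the weights W_a and v_a would be learned in a real model. The Bahdanau score and the Luong concat score share one function here, since they differ only in which decoder state is passed in (s_{t-1} versus s_t).

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [dot(row, x) for row in W]

def score_dot(s_t, h_i):
    # Luong dot: s_t^T h_i, no trainable parameters
    return dot(s_t, h_i)

def score_general(s_t, h_i, W_a):
    # Luong general: s_t^T W_a h_i
    return dot(s_t, matvec(W_a, h_i))

def score_additive(s, h_i, W_a, v_a):
    # Bahdanau / Luong concat: v_a^T tanh(W_a [s; h_i])
    # (list "+" concatenates s and h_i; W_a has 2d columns)
    return dot(v_a, [math.tanh(z) for z in matvec(W_a, s + h_i)])

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(s, H, score_fn):
    """Global attention: score every encoder state, normalize, average."""
    weights = softmax([score_fn(s, h_i) for h_i in H])
    context = [sum(w * h_i[j] for w, h_i in zip(weights, H))
               for j in range(len(H[0]))]
    return weights, context

# Toy usage with two encoder states of dimension 2 (values are arbitrary)
H = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.0]
weights, context = attend(s, H, score_dot)
```

To try the parameterized variants, wrap them so they take two arguments, for example attend(s, H, lambda q, h: score_general(q, h, W_a)).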

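Dimension	Bahdanau Attention	Luong Attention
Published	2014 (arXiv; ICLR 2015)	2015 (EMNLP)
Score function	Additive: v_aᵀ · tanh(W_a·[s_{t-1}; h_i])	Three variants: dot / general / concat
Alignment	Global (all encoder hidden states)	Global or local (within a window)
Decoder state used	Previous state s_{t-1} enters the score	Current state s_t enters the score
Use of context vector	Fed with s_{t-1} into the RNN to compute s_t	Concatenated with s_t to predict y_t
Computational cost	O(T·d²) (requires multiplying by the weight matrices)	O(T·d) (for the dot variant)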
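
The "decoder state used" and "use of context vector" rows come down to the order of operations inside one decoder step. The sketch below contrasts the two orderings under loud simplifying assumptions: toy_cell is a made-up stand-in for a real GRU/LSTM cell, both paths use a plain dot-product score so that only the ordering differs (the original Bahdanau model uses the additive score), and W_c is a hypothetical projection matrix for Luong's attentional hidden state.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, H):
    """Global attention with a dot-product score (simplification)."""
    weights = softmax([dot(query, h_i) for h_i in H])
    return [sum(w * h_i[j] for w, h_i in zip(weights, H))
            for j in range(len(H[0]))]

def toy_cell(s_prev, x):
    """Hypothetical stand-in for an RNN cell: new state from old state and input."""
    return [math.tanh(s_j + sum(x) / len(x)) for s_j in s_prev]

def bahdanau_step(s_prev, y_prev, H, cell):
    # 1) attend with the PREVIOUS state s_{t-1}
    c_t = attend(s_prev, H)
    # 2) the context is an input to the RNN that produces s_t
    s_t = cell(s_prev, y_prev + c_t)
    return s_t, c_t

def luong_step(s_prev, y_prev, H, cell, W_c):
    # 1) run the RNN first to get the CURRENT state s_t
    s_t = cell(s_prev, y_prev)
    # 2) attend with s_t
    c_t = attend(s_t, H)
    # 3) combine [c_t; s_t] into the attentional state used to predict y_t
    h_tilde = [math.tanh(dot(row, c_t + s_t)) for row in W_c]
    return s_t, h_tilde

# Toy usage (all values arbitrary)
H = [[1.0, 0.0], [0.0, 1.0]]
s_prev, y_prev = [0.5, -0.5], [0.1, 0.2]
W_c = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
s_b, c_b = bahdanau_step(s_prev, y_prev, H, toy_cell)
s_l, h_tilde = luong_step(s_prev, y_prev, H, toy_cell, W_c)
```

In the Bahdanau ordering the new state s_t depends on the encoder states through the context vector, while in the Luong ordering s_t is computed before attention and the context only affects the output layer.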
