Initialize the moment estimates:
$m_0 = 0,\quad n_0 = 0$
Compute the gradient at step $t$:
$g_t = \nabla_\theta f_t(\theta_{t-1})$
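Update the first and second moment estimates:
$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$ (momentum term)
$n_t = \beta_2 n_{t-1} + (1 - \beta_2)\, g_t^2$ (RMSprop term; $g_t^2$ is the element-wise square)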
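Compute the bias-corrected estimates (here $\beta_1^t$ and $\beta_2^t$ are $\beta_1$ and $\beta_2$ raised to the power $t$, not indexed by $t$):
$\hat{m}_t = m_t / (1 - \beta_1^t)$
$\hat{n}_t = n_t / (1 - \beta_2^t)$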
Update the weights:
$\theta_t = \theta_{t-1} - \alpha\, \hat{m}_t / (\sqrt{\hat{n}_t} + \epsilon)$
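
The steps above can be sketched in plain Python as a single update function. This is a minimal illustration, not a reference implementation. The function names `adam_step` and `minimize_quadratic`, the toy objective, and the default values for `alpha`, `beta1`, `beta2`, and `eps` are assumptions chosen for the example (the defaults are commonly used values, not taken from the text). The denominator uses the square root of the bias-corrected second moment plus `eps`, as in the standard Adam update.

```python
import math


def adam_step(theta, grad, m, n, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to lists of floats.

    t is the 1-based step count; m and n are the running first and
    second moment estimates (both start as zeros).
    Hyperparameter defaults are assumed, commonly used values.
    """
    new_theta, new_m, new_n = [], [], []
    for th, g, m_i, n_i in zip(theta, grad, m, n):
        # Moment updates: momentum term and RMSprop term.
        m_i = beta1 * m_i + (1 - beta1) * g
        n_i = beta2 * n_i + (1 - beta2) * g * g
        # Bias correction: beta ** t is a power, not an index.
        m_hat = m_i / (1 - beta1 ** t)
        n_hat = n_i / (1 - beta2 ** t)
        # Weight update with the square root in the denominator.
        new_theta.append(th - alpha * m_hat / (math.sqrt(n_hat) + eps))
        new_m.append(m_i)
        new_n.append(n_i)
    return new_theta, new_m, new_n


def minimize_quadratic(steps=2000, alpha=0.01):
    """Toy usage: minimize f(theta) = sum(theta_i ** 2) from a fixed start."""
    theta = [1.0, -2.0]
    m = [0.0, 0.0]
    n = [0.0, 0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * x for x in theta]  # gradient of the toy objective
        theta, m, n = adam_step(theta, grad, m, n, t, alpha=alpha)
    return theta
```

On the first step the bias correction gives $\hat{m}_1 = g_1$ and $\hat{n}_1 = g_1^2$, so each weight moves by roughly $\alpha$ in the direction opposite its gradient, regardless of the gradient's magnitude.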