Initialize the moment estimates:

$$m_0 = 0, \quad n_0 = 0$$

Compute the gradient at time step t:

$$g_t = \nabla_\theta f_t(\theta_{t-1})$$

Update the moment estimates:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \quad \text{(momentum term)}$$

$$n_t = \beta_2 n_{t-1} + (1 - \beta_2)\, g_t^2 \quad \text{(RMSprop term)}$$

Compute the bias-corrected estimates:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$$

$$\hat{n}_t = \frac{n_t}{1 - \beta_2^t}$$

Update the weights:

$$\theta_{t+1} = \theta_t - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{n}_t} + \varepsilon}$$
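
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names `adam_step` and `minimize_quadratic`, the test objective f(θ) = Σ θ², and the default values for α, β1, β2 and ε are assumptions chosen for the example (they are commonly used values, not ones stated in this text). The step counter `t` is assumed to start at 1 so that the bias-correction denominators are nonzero.

```python
import math


def adam_step(theta, grad, m, n, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of parameters.

    theta, grad, m, n are lists of equal length; t is the 1-based step count.
    The default hyperparameters are illustrative assumptions.
    """
    new_theta, new_m, new_n = [], [], []
    for th, g, m_prev, n_prev in zip(theta, grad, m, n):
        # Moment estimates: momentum term and RMSprop term.
        m_t = beta1 * m_prev + (1 - beta1) * g
        n_t = beta2 * n_prev + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialization.
        m_hat = m_t / (1 - beta1 ** t)
        n_hat = n_t / (1 - beta2 ** t)
        # Weight update.
        new_theta.append(th - alpha * m_hat / (math.sqrt(n_hat) + eps))
        new_m.append(m_t)
        new_n.append(n_t)
    return new_theta, new_m, new_n


def minimize_quadratic(steps=3000, alpha=0.01):
    """Hypothetical usage: minimize f(theta) = sum(theta_i ** 2)."""
    theta = [5.0, -3.0]
    m = [0.0, 0.0]  # first-moment estimate, initialized to zero
    n = [0.0, 0.0]  # second-moment estimate, initialized to zero
    for t in range(1, steps + 1):
        grad = [2.0 * x for x in theta]  # gradient of sum(x^2)
        theta, m, n = adam_step(theta, grad, m, n, t, alpha=alpha)
    return theta
```

On the very first step the bias correction gives m̂ = g and n̂ = g², so the update has magnitude close to α regardless of the gradient's scale. That is one reason the correction matters when both moment estimates start at zero.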