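Algorithm 1 AMSGrad

Input: $x_1 \in \mathcal{F}$, $\{\beta_{1t}\}_{t=1}^{T}$, $\beta_2$, $\{\alpha_t\}_{t=1}^{T}$, $\epsilon$

Initialize: $m_0 = 0$, $v_0 = 0$, $\hat{v}_0 = 0$

1: for $t = 1$ to $T$ do

2: $g_t = \nabla_x f_t(x_t)$

3: $m_t = \beta_{1t} m_{t-1} + (1 - \beta_{1t}) g_t$

4: $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$

5: $\hat{v}_t = \max(\hat{v}_{t-1}, v_t)$ and $V_t = \operatorname{diag}(\hat{v}_t + \epsilon)$

6: $x_{t+1} = \Pi_{\mathcal{F}, V_t}\left(x_t - \alpha_t V_t^{-1/2} m_t\right)$

7: end for

Algorithm 2 MAXGrad

Input: $x_1 \in \mathcal{F}$, $\{\beta_{1t}\}_{t=1}^{T}$, $\beta_2$, $\{\alpha_t\}_{t=1}^{T}$, $\epsilon$

Initialize: $m_0 = 0$, $v_0 = 0$

1: for $t = 1$ to $T$ do

2: $g_t = \nabla_x f_t(x_t)$

3: $m_t = \beta_{1t} m_{t-1} + (1 - \beta_{1t}) g_t$

4: $v_t = \max\left(v_{t-1}, \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\right)$ and $V_t = \operatorname{diag}(v_t + \epsilon)$

5: $x_{t+1} = \Pi_{\mathcal{F}, V_t}\left(x_t - \alpha_t V_t^{-1/2} m_t\right)$

6: end for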
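To make the difference between the two update rules concrete, the following is a minimal NumPy sketch of a single step of each algorithm. It rests on several assumptions that are not part of the algorithm statements: the feasible set is taken to be all of $\mathbb{R}^d$, so the projection $\Pi_{\mathcal{F}, V_t}$ reduces to the identity; $\beta_{1t}$ is held constant; and the step size follows $\alpha_t = \alpha / \sqrt{t}$. The function names `amsgrad_step`, `maxgrad_step`, and `run_demo`, and the quadratic test objective, are hypothetical and chosen only for illustration.

```python
import numpy as np


def amsgrad_step(x, m, v, v_hat, g, alpha, beta1, beta2, eps):
    """One AMSGrad step (Algorithm 1), assuming no projection is needed."""
    m = beta1 * m + (1.0 - beta1) * g          # first-moment average
    v = beta2 * v + (1.0 - beta2) * g**2       # second-moment average
    v_hat = np.maximum(v_hat, v)               # running elementwise maximum
    x = x - alpha * m / np.sqrt(v_hat + eps)   # V_t^{-1/2} m_t with V_t = diag(v_hat + eps)
    return x, m, v, v_hat


def maxgrad_step(x, m, v, g, alpha, beta1, beta2, eps):
    """One MAXGrad step (Algorithm 2), assuming no projection is needed."""
    m = beta1 * m + (1.0 - beta1) * g
    # The maximum is folded into the recursion itself, so no separate v_hat is kept.
    v = np.maximum(v, beta2 * v + (1.0 - beta2) * g**2)
    x = x - alpha * m / np.sqrt(v + eps)       # V_t^{-1/2} m_t with V_t = diag(v + eps)
    return x, m, v


def run_demo(method, x0, T=200, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run either method on the illustrative objective f(x) = 0.5 * ||x||^2.

    Returns the final iterate and the history of the scaling vector
    (v_hat for AMSGrad, v for MAXGrad).
    """
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    v_hat = np.zeros_like(x)
    history = []
    for t in range(1, T + 1):
        g = x                                  # gradient of 0.5 * ||x||^2
        alpha_t = alpha / np.sqrt(t)           # assumed step-size schedule
        if method == "amsgrad":
            x, m, v, v_hat = amsgrad_step(x, m, v, v_hat, g, alpha_t, beta1, beta2, eps)
            history.append(v_hat.copy())
        elif method == "maxgrad":
            x, m, v = maxgrad_step(x, m, v, g, alpha_t, beta1, beta2, eps)
            history.append(v.copy())
        else:
            raise ValueError(f"unknown method: {method}")
    return x, np.array(history)


if __name__ == "__main__":
    x0 = np.array([1.0, -2.0])
    for name in ("amsgrad", "maxgrad"):
        x_final, hist = run_demo(name, x0)
        print(name, "final iterate:", x_final, "final scaling vector:", hist[-1])
```

In this sketch the only structural difference is where the maximum is taken: AMSGrad keeps both $v_t$ and $\hat{v}_t$, while MAXGrad stores a single vector $v_t$ that is non-decreasing by construction. A constrained feasible set would require replacing the plain update of `x` with a projection in the $V_t$-weighted norm, which is omitted here.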