Algorithm 1: Q-learning algorithm for stochastic linear-quadratic optimal control

1) Initialize the matrices so that $Q_2^{(0)} = I_d$ and $Q_1^{(0)} = [1, \dots, 1]_{1 \times d}$

2) for $t = 0, \dots, T$

3) $Q_2^{(t+1)} = Q_2^{(t)} + \alpha \left[ \mathbb{E}\left( N + \gamma\, \Lambda^{\top} Q_2^{(t)} \Lambda \right) - Q_2^{(t)} \right]$

4) $Q_1^{(t+1)} = Q_1^{(t)} + \alpha \left[ \mathbb{E}\left( \gamma\, \Gamma_1\big(Q_2^{(t)}, Q_1^{(t)}\big) \Lambda \right) - Q_1^{(t)} \right]$

5) $K_0^{(t)} = \frac{1}{1 - \gamma}\, \Gamma_0\big(Q_2^{(t)}, Q_1^{(t)}\big)$

6) t = t + 1

end for
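The loop in Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the source's implementation. It assumes that $N$ and $\Lambda$ are $d \times d$ random matrices and that each expectation $\mathbb{E}$ is estimated by a sample average over draws of $(N, \Lambda)$. The maps $\Gamma_1$ and $\Gamma_0$ are defined elsewhere in the source and are not reproduced here, so the sketch takes them as caller-supplied callables. The names `q_learning_lq`, `sample_N_Lambda`, `Gamma1`, `Gamma0`, `n_samples`, `toy_sampler`, and the toy placeholder functions are all hypothetical.

```python
import numpy as np


def q_learning_lq(sample_N_Lambda, Gamma1, Gamma0, d, T, alpha, gamma,
                  n_samples=100, seed=0):
    """Sketch of Algorithm 1 (hypothetical interface).

    sample_N_Lambda(rng) -> (N, Lam): one draw of the d x d random matrices.
    Gamma1(Q2, Q1) -> 1 x d array; Gamma0(Q2, Q1) -> scalar.
    Gamma1 and Gamma0 stand in for the source's definitions (not given here).
    """
    rng = np.random.default_rng(seed)
    Q2 = np.eye(d)          # step 1: Q2^(0) = I_d
    Q1 = np.ones((1, d))    # step 1: Q1^(0) = [1, ..., 1], a 1 x d row
    K0 = None
    for t in range(T + 1):  # step 2: t = 0, ..., T (the for loop also does step 6)
        draws = [sample_N_Lambda(rng) for _ in range(n_samples)]
        # Assumption: sample-average estimate of E(N + gamma * Lam^T Q2 Lam)
        target2 = np.mean([N + gamma * Lam.T @ Q2 @ Lam for N, Lam in draws], axis=0)
        # Assumption: sample-average estimate of E(gamma * Gamma1(Q2, Q1) Lam)
        g1 = Gamma1(Q2, Q1)
        target1 = np.mean([gamma * g1 @ Lam for _, Lam in draws], axis=0)
        # Step 5: K0^(t) is computed from the current iterates Q2^(t), Q1^(t)
        K0 = Gamma0(Q2, Q1) / (1.0 - gamma)
        # Steps 3-4: both updates read Q2^(t), Q1^(t), so assign them together
        Q2, Q1 = Q2 + alpha * (target2 - Q2), Q1 + alpha * (target1 - Q1)
    return Q2, Q1, K0


# Toy usage: a deterministic pair N = I, Lam = 0.5 I, with hypothetical
# placeholder Gamma functions (NOT the source's Gamma_1 and Gamma_0).
def toy_sampler(rng):
    return np.eye(2), 0.5 * np.eye(2)


Q2, Q1, K0 = q_learning_lq(
    toy_sampler,
    Gamma1=lambda Q2, Q1: Q1,
    Gamma0=lambda Q2, Q1: float(np.trace(Q2)),
    d=2, T=200, alpha=0.5, gamma=0.9, n_samples=10,
)
# For this toy pair, the fixed point of step 3 is Q2 = I / (1 - 0.9 * 0.25) = I / 0.775.
```

Two design choices follow the listing literally. $K_0^{(t)}$ is computed before the iterates are overwritten, so it uses $Q_2^{(t)}$ and $Q_1^{(t)}$ as in step 5. The tuple assignment makes steps 3 and 4 both read the old iterates. If the distribution of $(N, \Lambda)$ is known, the sample averages could be replaced by closed-form expectations.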