Algorithm 1: Q-learning algorithm for stochastic linear-quadratic optimal control
1) Initialize the matrices: $Q_2(0) = I_d$, $Q_1(0) = [1, \cdots, 1]_{1 \times d}$
2) for $t = 0, \dots, T$:
3) $Q_2(t+1) = Q_2(t) + \alpha\left[\mathbb{E}\left(N + \gamma \Lambda^\top Q_2(t) \Lambda\right) - Q_2(t)\right]$
4) $Q_1(t+1) = Q_1(t) + \alpha\left[\mathbb{E}\left(\gamma\, \Gamma_1\big(Q_2(t), Q_1(t)\big)\, \Lambda\right) - Q_1(t)\right]$
5) $K_0(t) = \dfrac{1}{1-\gamma}\, \Gamma_0\big(Q_2(t), Q_1(t)\big)$
6) t = t + 1
end for
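The loop above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes the random pair $(N, \Lambda)$ takes finitely many values with known probabilities, so each expectation $\mathbb{E}(\cdot)$ is computed exactly as a probability-weighted sum. Because $\Gamma_1$ and $\Gamma_0$ are not defined in this section, they are passed in as user-supplied functions and treated as deterministic functions of $(Q_2(t), Q_1(t))$. The names `q_learning_slq`, `scenarios`, `gamma1`, `gamma0`, `matmul`, `transpose` and `axpy` are hypothetical. Steps 3 to 5 all read the time-$t$ values, so the sketch builds every target before overwriting $Q_2$ and $Q_1$.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Matrix transpose a^T."""
    return [list(row) for row in zip(*a)]


def axpy(alpha: float, x: Matrix, y: Matrix) -> Matrix:
    """Elementwise alpha * x + y for equally shaped matrices."""
    return [[alpha * xv + yv for xv, yv in zip(xr, yr)]
            for xr, yr in zip(x, y)]


def q_learning_slq(
    scenarios: Sequence[Tuple[float, Matrix, Matrix]],  # assumed (prob, N_k, Lambda_k)
    gamma1: Callable[[Matrix, Matrix], Matrix],  # hypothetical stand-in for Gamma_1
    gamma0: Callable[[Matrix, Matrix], float],   # hypothetical stand-in for Gamma_0
    d: int,
    alpha: float,
    gamma: float,
    T: int,
) -> Tuple[Matrix, Matrix, List[float]]:
    # Step 1: Q_2(0) = I_d and Q_1(0) = [1, ..., 1] (a 1 x d row vector).
    q2 = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    q1 = [[1.0] * d]
    k0_history: List[float] = []
    for _ in range(T + 1):  # t = 0, ..., T
        # E(N + gamma * Lambda^T Q_2(t) Lambda) as a probability-weighted sum.
        target2 = [[0.0] * d for _ in range(d)]
        for p, n_k, lam in scenarios:
            quad = matmul(matmul(transpose(lam), q2), lam)
            target2 = axpy(p, axpy(gamma, quad, n_k), target2)
        # E(gamma * Gamma_1(Q_2(t), Q_1(t)) Lambda), with Gamma_1 assumed deterministic.
        g1 = gamma1(q2, q1)
        target1 = [[0.0] * d]
        for p, _, lam in scenarios:
            target1 = axpy(p * gamma, matmul(g1, lam), target1)
        # Step 5 uses Q_2(t), Q_1(t), so record K_0(t) before the update.
        k0_history.append(gamma0(q2, q1) / (1.0 - gamma))
        # Steps 3-4: Q <- Q + alpha * (target - Q) = (1 - alpha) * Q + alpha * target.
        q2 = axpy(alpha, target2, axpy(-alpha, q2, q2))
        q1 = axpy(alpha, target1, axpy(-alpha, q1, q1))
    return q2, q1, k0_history


# Scalar example (d = 1): a single scenario with N = 1 and Lambda = 0.5.
q2, q1, k0 = q_learning_slq(
    scenarios=[(1.0, [[1.0]], [[0.5]])],
    gamma1=lambda a, b: b,        # toy choice, not the paper's Gamma_1
    gamma0=lambda a, b: a[0][0],  # toy choice, not the paper's Gamma_0
    d=1, alpha=1.0, gamma=0.9, T=200,
)
print(q2[0][0])  # approaches 1 / (1 - 0.9 * 0.25) = 1 / 0.775
```

In this scalar toy case the $Q_2$ update reduces to $q \leftarrow 1 + 0.225\,q$, a contraction whose fixed point is $1/0.775$, which makes it easy to check the iteration. The real $\Gamma_1$, $\Gamma_0$ and the distribution of $(N, \Lambda)$ would have to come from the paper's model; swapping the exact weighted sums for Monte Carlo averages over sampled $(N, \Lambda)$ would be a different (stochastic-approximation) variant.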