
Research on Algorithms Based on Approximate Dynamic Programming

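$$J^*[x(t),t] = \min_{u(t)} \big( U[x(t),u(t),t] + \gamma J^*[x(t+1),t+1] \big)$$
$$u^*(t) = \arg\min_{u(t)} \big( U[x(t),u(t),t] + \gamma J^*[x(t+1),t+1] \big)$$
with discount factor $\gamma$.
Drawback of dynamic programming:
the curse of dimensionality
Remedy: use a structure such as an artificial neural network to approximate the cost function and thereby obtain an approximate solution of the dynamic programming problem. This is approximate dynamic programming, also called adaptive critic design (ACD).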
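The Bellman recursion can be made concrete on a toy problem. The following is a minimal sketch, not taken from the slides: the integer grid, the dynamics `step`, the cost `utility`, and the value `GAMMA = 0.9` are all hypothetical choices made for illustration. It also hints at the curse of dimensionality: the table `J` needs one entry per grid point, 11 here, but 11^n for an n-dimensional state on the same grid.

```python
# Tabular value iteration on a tiny 1-D problem (all modelling choices are assumptions).
GAMMA = 0.9                 # discount factor (assumed value)
STATES = range(-5, 6)       # integer positions -5..5
ACTIONS = (-1, 0, 1)        # move left, stay, move right


def step(x, u):
    """Hypothetical dynamics: move by u, clipped to the grid."""
    return max(-5, min(5, x + u))


def utility(x, u):
    """Hypothetical one-step cost U[x(t), u(t)]."""
    return x * x + u * u


def value_iteration(tol=1e-10, max_sweeps=10_000):
    """Iterate J <- min_u (U + GAMMA * J(next state)) until the table stops changing."""
    J = {x: 0.0 for x in STATES}
    for _ in range(max_sweeps):
        J_new = {
            x: min(utility(x, u) + GAMMA * J[step(x, u)] for u in ACTIONS)
            for x in STATES
        }
        diff = max(abs(J_new[x] - J[x]) for x in STATES)
        J = J_new
        if diff < tol:
            break
    return J


def greedy_policy(J):
    """u*(x) = argmin_u (U + GAMMA * J(next state))."""
    return {
        x: min(ACTIONS, key=lambda u: utility(x, u) + GAMMA * J[step(x, u)])
        for x in STATES
    }


J_star = value_iteration()
u_star = greedy_policy(J_star)
```

In this toy setting the greedy policy moves the state toward the origin and stays there, since the origin is the only point with zero cost.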
2. Theory of Neural Network
Neuron model
f: activation function • Hard limit • Linear • Log-sigmoid
➢ 5. Neural Network Modeling
1. Introduction
Dynamic programming and Bellman's principle of optimality
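System description
Performance index:
$$J[x(i),i] = \sum_{k=i}^{\infty} \gamma^{k-i}\, U[x(k),u(k),k]$$
where $\gamma$, $0 < \gamma \le 1$, is the discount factor and $U$ is the utility function.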
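$$V_{i+1}(x(t)) = x(t)^T Q x(t) + u_i(x(t))^T R\, u_i(x(t)) + V_i\big(f(x(t)) + g(x(t))\,u_i(x(t))\big) = x(t)^T Q x(t) + u_i(x(t))^T R\, u_i(x(t)) + V_i(x(t+1))$$
Is $|V_{i+1} - V_i| < \varepsilon$? If yes, stop. If no, set $i = i + 1$ and return to the minimization step.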
3. Adaptive Critic Design
➢ HDP (heuristic dynamic programming)
➢ DHP (dual heuristic dynamic programming)
➢ GDHP (globalized dual heuristic dynamic programming)
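[Figure: multiple-input neuron. The inputs $p_1, p_2, \ldots, p_R$ are weighted by $w_{1,1}, \ldots, w_{1,R}$ and summed with the bias $b$ to give the net input $n$, which passes through the activation function $f$ to give the output $a$.]
Output: $a = f(n) = f(Wp + b)$
where $n = w_{1,1}p_1 + w_{1,2}p_2 + \cdots + w_{1,R}p_R + b$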
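The neuron equation $a = f(Wp + b)$ and the three activation types can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names `hardlim`, `purelin`, and `logsig` are borrowed from common neural-network toolbox naming and are assumptions here, as are the example weights, inputs, and bias.

```python
import numpy as np


def hardlim(n):
    """Hard-limit activation: 1 where n >= 0, otherwise 0."""
    return np.where(n >= 0, 1.0, 0.0)


def purelin(n):
    """Linear activation: the output equals the net input."""
    return n


def logsig(n):
    """Log-sigmoid activation: squashes n into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))


def neuron(W, p, b, f):
    """Multiple-input neuron: a = f(Wp + b)."""
    return f(W @ p + b)


# Hypothetical example values, chosen so that the net input n is exactly 0.
W = np.array([[1.0, -2.0, 0.5]])   # 1 x R weight matrix (R = 3)
p = np.array([2.0, 1.0, 4.0])      # input vector
b = np.array([-2.0])               # bias

a_hard = neuron(W, p, b, hardlim)
a_lin = neuron(W, p, b, purelin)
a_sig = neuron(W, p, b, logsig)
```

With these values the net input is $n = 2 - 2 + 2 - 2 = 0$, so the same neuron yields three different outputs depending only on the choice of $f$.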
Network architectures
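[Figure: HDP structure. The Action Network maps the state $x(t)$ to the control $u(t)$, the Model Network predicts $x(t+1)$, and the Critic Network outputs $\hat{J}(t+1)$. The critic output $\hat{J}(t)$ approximates $J(t) = \sum_{k=t}^{\infty} \gamma^{k-t} U(k)$.]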
4. Discrete Time Nonlinear HJB Solution
Solution of the HJB equation for discrete-time systems
System equation:
$$x(t+1) = f(x(t)) + g(x(t))\,u(x(t))$$
HDP iterative algorithm
Start
Initialization: $V_0 = 0$
Solve the minimization problem: $u_i(x(t)) = \arg\min_{u} \big( x(t)^T Q x(t) + u^T R u + V_i(x(t+1)) \big)$
Update the value function
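The HDP iteration (initialize $V_0 = 0$, minimize, update the value function, test $|V_{i+1} - V_i| < \varepsilon$) can be tried on a case where every step has a closed form. The sketch below is an illustration under stated assumptions, not the author's implementation: it assumes a scalar linear system $x(t+1) = a\,x(t) + b\,u(t)$, scalar weights `q` and `r`, and a quadratic value function $V_i(x) = p_i x^2$, so the minimization step reduces to a linear feedback $u_i(x) = -k_i x$. The function names and numerical values are hypothetical.

```python
def greedy_gain(p, a, b, r):
    """Minimizer of q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u, written as u = -k*x."""
    return p * a * b / (r + p * b * b)


def hdp_value_iteration(a, b, q, r, eps=1e-12, max_iter=10_000):
    """Iterate V_{i+1}(x) = q x^2 + r u_i^2 + V_i(a x + b u_i) with V_i(x) = p_i x^2."""
    p = 0.0                                   # initialization: V_0 = 0
    for i in range(1, max_iter + 1):
        k = greedy_gain(p, a, b, r)           # solve the minimization problem
        p_next = q + r * k * k + p * (a - b * k) ** 2   # update the value function
        if abs(p_next - p) < eps:             # convergence test |V_{i+1} - V_i| < eps
            return p_next, i
        p = p_next                            # i = i + 1
    raise RuntimeError("value iteration did not converge")


# Hypothetical open-loop unstable example (|a| > 1).
A, B, Q, R = 1.2, 1.0, 1.0, 1.0
p_star, n_iter = hdp_value_iteration(A, B, Q, R)
k_star = greedy_gain(p_star, A, B, R)
```

In this linear-quadratic special case the fixed point of the iteration satisfies the scalar discrete-time Riccati equation, which gives an independent check on the result. For a general nonlinear system the value function has no such closed form, which is where a critic network would take over.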
Research on an iterative algorithm for approximate optimal control based on adaptive critic design
Author: Cao Ning (曹宁). Advisor: Prof. Zhang Huaguang (张化光)
Outline
➢ 1. Introduction
➢ 2. Theory of Neural Network
➢ 3. Adaptive Critic Design
➢ 4. Discrete Time Nonlinear HJB Solution
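Training the HDP critic network
$$J(t) = \sum_{k=t}^{\infty} \gamma^{k-t}\, U[x(k),u(k),k]$$
The critic network is trained to minimize
$$\|E_h\| = \sum_t E_h(t) = \frac{1}{2} \sum_t \big[ \hat{J}(t) - U(t) - \gamma \hat{J}(t+1) \big]^2$$
When $E_h(t) = 0$ for all $t$,
$$\hat{J}(t) = U(t) + \gamma \hat{J}(t+1) = U(t) + \gamma \big[ U(t+1) + \gamma \hat{J}(t+2) \big]$$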
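The critic training rule can be illustrated with the smallest possible critic. The sketch below is hypothetical and makes several assumptions not found in the slides: a fixed closed-loop system $x(t+1) = a\,x(t)$, a utility $U = q\,x^2$, a one-parameter critic $\hat{J}(x) = w\,x^2$ in place of a neural network, and a discount factor of 0.9. Each update is a gradient step on $\frac{1}{2}[\hat{J}(t) - U(t) - \gamma \hat{J}(t+1)]^2$ with the target $U(t) + \gamma \hat{J}(t+1)$ held fixed.

```python
GAMMA = 0.9   # discount factor (assumed value)


def train_critic(a=0.8, q=1.0, lr=0.5, epochs=200):
    """Fit J_hat(x) = w * x^2 for the assumed closed loop x(t+1) = a * x(t), U = q * x^2."""
    w = 0.0
    samples = [i / 10.0 for i in range(-10, 11)]   # training states in [-1, 1]
    for _ in range(epochs):
        for x in samples:
            x_next = a * x
            # Target U(t) + gamma * J_hat(t+1), treated as a constant during the update.
            target = q * x * x + GAMMA * w * x_next * x_next
            error = w * x * x - target             # J_hat(t) - U(t) - gamma * J_hat(t+1)
            w -= lr * error * (x * x)              # d(0.5 * error^2)/dw with target fixed
    return w


w_learned = train_critic()
w_exact = 1.0 / (1.0 - GAMMA * 0.8 ** 2)   # sum over k of gamma^k * a^(2k) * q, with q = 1
```

For this system the discounted cost is a geometric series, $J(x) = q\,x^2 / (1 - \gamma a^2)$, so the learned weight can be compared directly with the exact one.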
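Cost function:
$$V(x(t)) = \sum_{i=t}^{\infty} \big( x(i)^T Q x(i) + u(i)^T R u(i) \big)$$
$$V^*(x(t)) = \min_{u(t)} \big( x(t)^T Q x(t) + u(t)^T R u(t) + V^*(x(t+1)) \big)$$
$$u^*(x(t)) = -\frac{1}{2} R^{-1} g(x(t))^T \frac{\partial V^*(x(t+1))}{\partial x(t+1)}$$
$$V^*(x(t)) = x(t)^T Q x(t) + \frac{1}{4} \left( \frac{\partial V^*(x(t+1))}{\partial x(t+1)} \right)^T g(x(t)) R^{-1} g(x(t))^T \frac{\partial V^*(x(t+1))}{\partial x(t+1)} + V^*(x(t+1))$$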
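The step from the minimization to the expression for $u^*$ is a first-order optimality condition. A short derivation sketch follows, assuming (these conditions are not stated in the slides) that $V^*$ is differentiable and that $R$ is symmetric and positive definite, and using $x(t+1) = f(x(t)) + g(x(t))\,u(t)$:

```latex
\begin{aligned}
0 &= \frac{\partial}{\partial u(t)} \Big( x(t)^T Q x(t) + u(t)^T R u(t) + V^*(x(t+1)) \Big) \\
  &= 2 R u(t) + \left( \frac{\partial x(t+1)}{\partial u(t)} \right)^T
     \frac{\partial V^*(x(t+1))}{\partial x(t+1)} \\
  &= 2 R u(t) + g(x(t))^T \frac{\partial V^*(x(t+1))}{\partial x(t+1)}, \\[4pt]
u^*(x(t)) &= -\frac{1}{2} R^{-1} g(x(t))^T \frac{\partial V^*(x(t+1))}{\partial x(t+1)}, \\[4pt]
u^{*T} R u^* &= \frac{1}{4} \left( \frac{\partial V^*(x(t+1))}{\partial x(t+1)} \right)^T
     g(x(t)) R^{-1} g(x(t))^T \frac{\partial V^*(x(t+1))}{\partial x(t+1)}.
\end{aligned}
```

Substituting the last line for the control cost in the minimization gives the factor $\frac{1}{4}$ in the HJB equation.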
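[Figure: multilayer feedforward network. The inputs $p_1, p_2, \ldots, p_R$ (input layer) feed a hidden layer with outputs $a^1_j$ through the weights $w^1_{i,j}$; the hidden layer feeds the output layer with outputs $a^2_t$ through the weights $w^2_{j,t}$.]
$$a = f^2\big( W^2 f^1(W^1 p + b^1) + b^2 \big)$$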
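Error backpropagation algorithm
1. Forward propagation
2. Backward propagation of the error
Compute the sensitivities
$$s^M = -2\, \dot{F}^M(n^M)\, (t - a)$$
$$s^m = \dot{F}^m(n^m)\, (W^{m+1})^T s^{m+1}$$
where $t$ here denotes the target output vector and
$$\dot{F}^m(n^m) = \begin{bmatrix} \dot{f}^m(n^m_1) & 0 & \cdots & 0 \\ 0 & \dot{f}^m(n^m_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \dot{f}^m(n^m_{S^m}) \end{bmatrix}$$
The sensitivities are propagated backward through the layers: $s^M \to s^{M-1} \to \cdots \to s^2 \to s^1$
3. Weight and bias updates
$$W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T$$
$$b^m(k+1) = b^m(k) - \alpha\, s^m$$
where $\alpha$ is the learning rate.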
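The three backpropagation steps can be sketched for a network with one log-sigmoid hidden layer and a linear output layer. This is a minimal illustration, not the author's code: the layer sizes, the function names `sensitivities` and `backprop_step`, the learning rate, and the single training pair are all assumptions. Vectors are stored as column arrays so that the update formulas carry over unchanged.

```python
import numpy as np


def logsig(n):
    """Log-sigmoid activation; its derivative is a * (1 - a) where a = logsig(n)."""
    return 1.0 / (1.0 + np.exp(-n))


def sensitivities(W1, b1, W2, b2, p, t):
    """Forward pass followed by the backward pass for the sensitivities s^2 and s^1."""
    # 1. Forward propagation (a^0 = p).
    a1 = logsig(W1 @ p + b1)
    a2 = W2 @ a1 + b2                      # linear output layer
    # 2. Backward propagation: s^M = -2 F'(n^M)(t - a), with F' = I for a linear layer.
    s2 = -2.0 * (t - a2)
    # s^1 = F'(n^1) (W^2)^T s^2; F' is diagonal, so an elementwise product suffices.
    s1 = a1 * (1.0 - a1) * (W2.T @ s2)
    return a1, a2, s1, s2


def backprop_step(W1, b1, W2, b2, p, t, alpha=0.05):
    """3. Steepest-descent update: W^m <- W^m - alpha s^m (a^{m-1})^T, b^m <- b^m - alpha s^m."""
    a1, _, s1, s2 = sensitivities(W1, b1, W2, b2, p, t)
    return (W1 - alpha * s1 @ p.T, b1 - alpha * s1,
            W2 - alpha * s2 @ a1.T, b2 - alpha * s2)


# Hypothetical 2-3-1 network trained on a single input/target pair.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=(3, 1))
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=(1, 1))
p, t = np.array([[0.5], [-1.0]]), np.array([[0.3]])
for _ in range(200):
    W1, b1, W2, b2 = backprop_step(W1, b1, W2, b2, p, t)
final_output = sensitivities(W1, b1, W2, b2, p, t)[1]
```

Because $\dot{F}^m$ is diagonal, the code never builds the matrix explicitly. Multiplying elementwise by the derivative vector gives the same result at lower cost.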
Variants of the BP algorithm
➢ Batching
➢ BP with momentum (MOBP)
➢ BP with a variable learning rate (VLBP)
➢ Conjugate gradient BP (CGBP)
➢ Levenberg-Marquardt BP (LMBP)
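➢ AD (action-dependent) forms of HDP, DHP, and GDHP
HDP and ADHDP
[Figure: HDP and ADHDP structures. In HDP, the Model Network output $x(t+1)$ is the input of the Critic Network, which outputs $\hat{J}(t+1)$. In ADHDP, the critic outputs $Q(t)$; the figure labels the Model Network and the Critic Network together as the "New critic network".]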
