Book Description
《非綫性係統自學習最優控製:自適應動態規劃方法(英文版)》presents a class of novel self-learning optimal control schemes based on adaptive dynamic programming (ADP) techniques, which obtain the optimal control laws of the controlled systems quantitatively. It analyzes the properties of the developed methods, including the convergence of the iterative value functions and the stability of the system under the iterative control laws, which together guarantee the effectiveness of the methods. When the system model is known, the self-learning optimal control is designed on the basis of the model; when the model is unknown, adaptive dynamic programming is implemented from measured system data, so that the performance of the system converges to the optimum.
With various real-world examples to complement and substantiate the mathematical analysis, the book is a valuable guide for engineers, researchers, and students in control science and engineering.
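As a rough illustration of the iterative ADP scheme described above, the short Python sketch below runs value iteration for a toy scalar system. It is not code from the book; the dynamics F, the utility U, and the state and control grids are made-up assumptions, chosen only to show how the iterative value function is updated and checked for convergence.

```python
import numpy as np

# Minimal value-iteration ADP sketch (illustrative only, not code from the book).
# Assumed toy system: x_{k+1} = F(x_k, u_k) = 0.8*x_k + sin(u_k), utility U(x, u) = x^2 + u^2.
def F(x, u):
    return 0.8 * x + np.sin(u)

def U(x, u):
    return x ** 2 + u ** 2

xs = np.linspace(-2.0, 2.0, 81)      # discretized state space
us = np.linspace(-1.0, 1.0, 41)      # discretized control space
V = np.zeros_like(xs)                # V_0(x) = 0, the usual value-iteration initialization

X_next = F(xs[:, None], us[None, :])                       # next state for every (x, u) pair
nearest = np.abs(X_next[..., None] - xs).argmin(axis=-1)   # closest grid state (out-of-grid states map to the boundary)
cost = U(xs[:, None], us[None, :])                         # one-step utility for every (x, u) pair

for i in range(300):
    Q = cost + V[nearest]            # right-hand side of the iterative update
    V_next = Q.min(axis=1)           # V_{i+1}(x) = min_u [ U(x, u) + V_i(F(x, u)) ]
    if np.max(np.abs(V_next - V)) < 1e-8:   # convergence of the iterative value functions
        V = V_next
        break
    V = V_next

u_star = us[Q.argmin(axis=1)]        # greedy control law associated with the converged value function
print(f"converged after {i} iterations, V(1.0) = {V[np.abs(xs - 1.0).argmin()]:.4f}")
```

In the model-free setting treated in the later chapters, the same kind of iteration would be driven by measured system data, for example through an iterative Q-function, rather than by the explicit model F.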
Table of Contents
1 Principle of Adaptive Dynamic Programming 1
1.1 Dynamic Programming 1
1.1.1 Discrete-Time Systems 1
1.1.2 Continuous-Time Systems 2
1.2 Original Forms of Adaptive Dynamic Programming 3
1.2.1 Principle of Adaptive Dynamic Programming 4
1.3 Iterative Forms of Adaptive Dynamic Programming 9
1.3.1 Value Iteration 9
1.3.2 Policy Iteration 10
1.4 About This Book 11
References 14
2 An Iterative ε-Optimal Control Scheme for a Class of Discrete-Time Nonlinear Systems with Unfixed Initial State 19
2.1 Introduction 19
2.2 Problem Statement 20
2.3 Properties of the Iterative Adaptive Dynamic Programming Algorithm 21
2.3.1 Derivation of the Iterative ADP Algorithm 21
2.3.2 Properties of the Iterative ADP Algorithm 23
2.4 The ε-Optimal Control Algorithm 28
2.4.1 The Derivation of the ε-Optimal Control Algorithm 28
2.4.2 Properties of the ε-Optimal Control Algorithm 32
2.4.3 The ε-Optimal Control Algorithm for Unfixed Initial State 34
2.4.4 The Expressions of the ε-Optimal Control Algorithm 37
2.5 Neural Network Implementation for the ε-Optimal Control Scheme 37
2.5.1 The Critic Network 38
2.5.2 The Action Network 39
2.6 Simulation Study 40
2.7 Conclusions 42
References 43
3 Discrete-Time Optimal Control of Nonlinear Systems via Value Iteration-Based Q-Learning 47
3.1 Introduction 47
3.2 Preliminaries and Assumptions 49
3.2.1 Problem Formulations 49
3.2.2 Derivation of the Discrete-Time Q-Learning Algorithm 50
3.3 Properties of the Discrete-Time Q-Learning Algorithm 52
3.3.1 Non-Discount Case 52
3.3.2 Discount Case 59
3.4 Neural Network Implementation for the Discrete-Time Q-Learning Algorithm 64
3.4.1 The Action Network 65
3.4.2 The Critic Network 67
3.4.3 Training Phase 69
3.5 Simulation Study 70
3.5.1 Example 1 70
3.5.2 Example 2 76
3.6 Conclusion 81
References 82
4 A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems 85
4.1 Introduction 85
4.2 Problem Formulation 86
4.3 Policy Iteration-Based Deterministic Q-Learning Algorithm for Discrete-Time Nonlinear Systems 87
4.3.1 Derivation of the Policy Iteration-Based Deterministic Q-Learning Algorithm 87
4.3.2 Properties of the Policy Iteration-Based Deterministic Q-Learning Algorithm 89
4.4 Neural Network Implementation for the Policy Iteration-Based Deterministic Q-Learning Algorithm 93
4.4.1 The Critic Network 93
4.4.2 The Action Network 95
4.4.3 Summary of the Policy Iteration-Based Deterministic Q-Learning Algorithm 96
4.5 Simulation Study 97
4.5.1 Example 1 97
4.5.2 Example 2 100
4.6 Conclusion 107
References 107
5 Nonlinear Neuro-Optimal Tracking Control via Stable Iterative Q-Learning Algorithm 111
5.1 Introduction 111
5.2 Problem Statement 112
5.3 Policy Iteration Q-Learning Algorithm for Optimal Tracking Control 114
5.4 Properties of the Policy Iteration Q-Learning Algorithm 114
5.5 Neural Network Implementation for the Policy Iteration Q-Learning Algorithm 119
5.5.1 The Critic Network 120
5.5.2 The Action Network 120
5.6 Simulation Study 121
5.6.1 Example 1 122
5.6.2 Example 2 125
5.7 Conclusions 129
References 129
6 Model-Free Multiobjective Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems with General Performance Index Functions 133
6.1 Introduction 133
6.2 Preliminaries 134
6.3 Multiobjective Adaptive Dynamic Programming Method 135
6.4 Model-Free Incremental Q-Learning Method 145
6.5 Neural Network Implementation for the Incremental Q-Learning Method 147
6.5.1 The Critic Network 148
6.5.2 The Action Network 149
6.5.3 The Procedure of the Model-Free Incremental Q-Learning Method 150
6.6 Convergence Proof 150
6.7 Simulation Study 153
6.7.1 Example 1 153
6.7.2 Example 2 155
6.8 Conclusion 157
References 157
7 Multiobjective Optimal Control for a Class of Unknown Nonlinear Systems Based on Finite-Approximation-Error ADP Algorithm 159
7.1 Introduction 159
7.2 General Formulation 160
7.3 Optimal Solution Based on Finite-Approximation-Error ADP 162
7.3.1 Data-Based Identifier of Unknown System Dynamics 162
7.3.2 Derivation of the ADP Algorithm with Finite Approximation Errors 166
7.3.3 Convergence Analysis of the Iterative ADP Algorithm 168
7.4 Implementation of the Iterative ADP Algorithm 173
7.4.1 Critic Network 174
7.4.2 The Action Network 174
7.4.3 The Procedure of the ADP Algorithm 175
7.5 Simulation Study 175
7.5.1 Example 1 176
7.5.2 Example 2 179
7.6 Conclusions 182
References 182
8 A New Approach for a Class of Continuous-Time Chaotic Systems Optimal Control by Online ADP Algorithm 185
8.1 Introduction 185
8.2 Problem Statement 185
8.3 Optimal Control Based on Online ADP Algorithm 187
8.3.1 Design Method of the Critic Network and the Action Network 188
8.3.2 Stability Analysis
Frontiers of Intelligent Control and Optimization: A New Paradigm of Adaptive Decision-Making for Complex Dynamic Systems

This book focuses on the intersection of modern control theory, optimization algorithms, and computational intelligence, examining in depth how to design control systems with autonomous learning and global optimization capabilities in uncertain, nonlinear environments.

Engineering practice currently faces challenges that traditional model-driven control methods struggle to handle: system models that are extremely difficult or costly to obtain, system dynamics that change unpredictably over time, and control objectives that are themselves complex optimization problems depending on real-time performance evaluation. The book aims to build a theoretical framework that goes beyond conventional feedback linearization and exact model compensation, concentrating on the core principles, algorithmic implementation, and engineering application of reinforcement learning (RL) based adaptive optimization methods for high-dimensional, strongly coupled, nonlinear control problems.

The book is organized around the following interrelated, forward-looking themes.

Part I: Rebuilding the Foundations of Nonlinear System Analysis and Optimization
This part lays the mathematical and theoretical groundwork for the advanced algorithms that follow, examining the potential of reinforcement learning from the viewpoint of classical control theory and providing theoretical guarantees of stability and convergence for complex systems.

1. Limitations of classical modeling of complex nonlinear systems and new descriptions: analyzes the inherent bottlenecks of classical system descriptions (such as state-space models and transfer functions) for high-degree-of-freedom robots, complex chemical reactors, and power grids, and discusses how higher-order tensor representations and dynamic manifold theory can abstractly describe a system's intrinsic nonsmoothness and incomplete observability.

2. A dynamic-optimization view of performance indices: revisits the performance indices of optimal control (such as the cost functional and the associated Hamilton-Jacobi-Bellman equation). The book stresses that when the model is unknown or time-varying, the performance index itself must be learnable and correctable, and discusses observer-based performance estimation as well as mapping the performance index onto a desired space of feedback gains rather than a fixed control law.

3. Nontraditional approaches to stability analysis: because adaptive controllers adjust themselves online, classical Lyapunov stability theory is difficult to apply directly. This part introduces interval analysis and invariant energy functions for assessing the stability of closed-loop systems that contain learning components, providing qualitative tools for analyzing algorithmic robustness.

Part II: Value-Based and Policy-Based Self-Learning Frameworks
This part is the core of the book and details how reinforcement learning ideas can be used to build an algorithmic framework that autonomously discovers optimal control policies.

4. Analytical decomposition of policy gradients and value functions: going beyond the standard actor-critic framework, the book dissects the probabilistic foundations of policy-gradient methods (such as REINFORCE and PPO) and focuses on nonparametric value-function estimation with Gaussian processes (GPs) or kernel methods for continuous control domains, in order to handle Gaussian noise and sparse sampling.

5. Model-based adaptive planning: emphasizes the importance of building "weak" or local models. Unlike purely data-driven, model-free methods, this section explains how local linearization or sparse system identification can quickly produce local models from limited interaction data, and how integrating these local models into the planning step markedly improves learning efficiency and robustness to anomalous inputs.

6. Measuring and designing effective exploration: in complex control environments, effective exploration is the key to converging to the global optimum. The book proposes information-gain-driven exploration strategies that no longer rely on simple ε-greedy rules or random perturbations, and discusses how to quantify how well the current policy covers the state space and how uncertainty propagation can steer decisions toward the actions that maximize future information.

Part III: Robustness and Computational Efficiency for Engineering Implementation
This part turns the theoretical algorithms into deployable engineering solutions, focusing on real-time performance, resistance to environmental disturbances, and practical data handling.

7. Online learning and recalibration with heterogeneous data streams: in real industrial IoT (IIoT) settings, control signals, sensor readings, and performance feedback may arrive with different sampling rates and delays. A time-series convolutional network (TS-CNN) is proposed as a front-end processor that synchronizes the heterogeneous data and extracts features, keeping the learning process consistent.

8. Integrating constraint handling and safety-critical control: real systems such as UAVs and chemical processes must respect hard constraints, for example physical boundaries and safety thresholds. The book examines projected gradient descent in the policy-update stage, and how barrier functions combined with value-function approximation yield control laws that satisfy state and input constraints throughout learning (a minimal sketch of the projection idea follows this overview).

9. Parallelization and edge deployment of learning algorithms: to meet the high update rates of modern control systems, this section focuses on reducing computational complexity. It covers GPU-accelerated tensor operations, decoupling policy evaluation from model updates for asynchronous parallel training, and pruning and quantizing trained lightweight policy networks for resource-constrained edge controllers.

A distinctive feature of the book is that it is not tied to any single deep learning architecture. Instead it integrates the rigor of control theory, the global perspective of optimization theory, and the adaptivity of modern computational intelligence, offering a clear and verifiable theoretical and technical path toward the next generation of autonomous, efficient, and reliable control schemes for complex dynamic systems.
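To make the constraint-handling idea in item 8 above concrete, here is a minimal, purely illustrative Python sketch of a projected gradient update under box input constraints. The quadratic surrogate cost, the matrix A, the vector b, and the bounds are hypothetical and stand in for whatever approximate cost a learning controller maintains; the only point is that projecting each iterate back onto the constraint set keeps every intermediate control feasible.

```python
import numpy as np

# Illustrative projected-gradient update under box constraints (hypothetical setting).
# Surrogate problem: minimize J(u) = 0.5 * ||A u - b||^2  subject to  u_min <= u <= u_max.
A = np.array([[2.0, 0.5],
              [0.3, 1.5]])
b = np.array([1.0, -0.5])
u_min = np.array([-0.4, -0.4])       # hard input constraints (e.g. actuator limits)
u_max = np.array([0.4, 0.4])

u = np.zeros(2)                      # start from a feasible control
step = 0.1
for k in range(500):
    grad = A.T @ (A @ u - b)         # gradient of the surrogate cost
    u = np.clip(u - step * grad, u_min, u_max)   # gradient step, then projection onto the box
print("constrained control:", u)     # ends near the box boundary; the unconstrained minimizer is infeasible
```

Barrier-function formulations, mentioned in the same item, replace the hard projection with a penalty term that grows near the constraint boundary, which keeps the update smooth at the cost of an extra tuning parameter.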