
Lecture VI: Adaptive Systems



Presentation Transcript


  1. Lecture VI: Adaptive Systems Zhixin Liu Complex Systems Research Center, Academy of Mathematics and Systems Sciences, CAS

  2. In the last lecture, we talked about Game Theory An embodiment of the complex interactions among individuals • Nash equilibrium • Evolutionarily stable strategy

  3. In this lecture, we will talk about Adaptive Systems

  4. Adaptation • To adapt: to change oneself to conform to a new or changed circumstance. • What do we know from the new circumstance? Adaptive estimation, learning, identification • How do we make the corresponding response? Control/decision making

  5. Why Adaptation? • Uncertainties always exist in the modeling of practical systems. • Adaptation can reduce the uncertainties by using the system information. • Adaptation is an important embodiment of human intelligence.

  6. Framework of Adaptive Systems [Diagram: several system-control feedback loops, coupled to one another within a shared environment]

  7. Two levels of adaptation • Individual level: learn and adapt • Population level: • Death of old individuals • Creation of new individuals • Hierarchy

  8. Some Examples • Adaptive control systems adaptation in a single agent • Iterated prisoner’s dilemma adaptation among agents


  10. Adaptation in a Single Agent [Diagram: a system-control feedback loop in an environment, with input u_t, output y_t, and noise w_t]

  11. Information. Information = prior + posterior = I0 + I1, where, for the dynamical system, I0 = prior knowledge about the system, and I1 = posterior knowledge about the system = {u0, u1, …, ut, y0, y1, …, yt} (the observations). The posterior information can be used to reduce the uncertainties of the system.

  12. Uncertainty. Model uncertainty = external uncertainty + internal uncertainty. • External uncertainty: noise/disturbance • Internal uncertainty: • parameter uncertainty • signal uncertainty • functional uncertainty

  13. Adaptation • To adapt: to change oneself to conform to a new or changed circumstance. • What do we know from the new circumstance? Adaptive estimation, learning, identification • How do we make the corresponding response? Control/decision making

  14. Adaptive Estimation

  15. Adaptive Estimation • Adaptive estimation: a parameter or structure estimator that is updated based on the on-line observations. [Diagram: the system with input u_t and output y_t; the adaptive estimator produces a prediction ŷ_t, and the prediction error e = y_t - ŷ_t is fed back to update the estimator.] • Example: in the parametric case, the parameter estimator can be obtained by minimizing a prediction error such as θ̂_t = argmin_θ Σ_{k≤t} (y_k - ŷ_k(θ))^2.

  16. Adaptive Estimation: Parameter estimation. Consider the following linear regression model: y_{t+1} = θ^T φ_t + w_{t+1}, where θ is the unknown parameter vector, φ_t is the regression vector, and {w_t} is the noise sequence. Remark • A linear regression model may be nonlinear in the data (φ_t may contain nonlinear functions of past inputs and outputs). • A linear system can be rewritten as a linear regression model.

  17. Least Squares (LS) Algorithm • 1795, Gauss: the least squares algorithm • The number of equations is greater than the number of unknown parameters. • The data contain noise. • Minimize the following prediction error: J_t(θ) = Σ_{k=0}^{t-1} (y_{k+1} - θ^T φ_k)^2.
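The batch LS estimate can be sketched in a few lines of NumPy. The true parameter vector, noise level, and dimensions below are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameter vector (assumed for illustration).
theta_true = np.array([2.0, -1.0])

# Noisy observations y_k = phi_k^T theta + w_k, with more
# equations (100) than unknown parameters (2), as on the slide.
Phi = rng.normal(size=(100, 2))          # regression vectors, one per row
y = Phi @ theta_true + 0.1 * rng.normal(size=100)

# Least squares: minimize sum_k (y_k - phi_k^T theta)^2.
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta_hat)   # close to [2, -1]
```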

  18. Recursive Form of LS: θ_{t+1} = θ_t + a_t P_t φ_t (y_{t+1} - θ_t^T φ_t), P_{t+1} = P_t - a_t P_t φ_t φ_t^T P_t, a_t = (1 + φ_t^T P_t φ_t)^{-1}, where P_t is the estimation “covariance” matrix. A basic problem: does the estimate θ_t converge to the true parameter θ (strong consistency)?
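A minimal sketch of one recursive LS step, assuming the standard recursion with gain a_t = 1/(1 + φ_t^T P_t φ_t); the true parameters, noise level, and initial P are made up for the demo:

```python
import numpy as np

def rls_step(theta, P, phi, y_next):
    """One recursive-least-squares update:
        a_t      = 1 / (1 + phi^T P phi)
        theta'   = theta + a_t * P phi * (y_next - phi^T theta)
        P'       = P - a_t * (P phi)(P phi)^T
    """
    Pphi = P @ phi
    a = 1.0 / (1.0 + phi @ Pphi)
    theta_new = theta + a * Pphi * (y_next - phi @ theta)
    P_new = P - a * np.outer(Pphi, Pphi)
    return theta_new, P_new

# Recover an (assumed) parameter vector from streaming data.
rng = np.random.default_rng(1)
theta_true = np.array([0.5, 1.5])
theta = np.zeros(2)
P = 100.0 * np.eye(2)          # large initial "covariance"
for _ in range(500):
    phi = rng.normal(size=2)
    y_next = phi @ theta_true + 0.05 * rng.normal()
    theta, P = rls_step(theta, P, phi, y_next)
print(theta)   # converges near [0.5, 1.5]
```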

  19. Recursive Form of LS. Assumption 1: 1) The noise sequence {w_t} is a martingale difference sequence, and there exists a constant β > 2 such that sup_t E[|w_{t+1}|^β | F_t] < ∞ a.s. 2) The regression vector φ_t is an adapted sequence, i.e., φ_t is F_t-measurable. Theorem (T.L. Lai & C.Z. Wei): Under the above assumption, if the following condition holds: log λ_max(t) = o(λ_min(t)) a.s., where λ_max(t) and λ_min(t) are the largest and smallest eigenvalues of Σ_{k=0}^{t} φ_k φ_k^T, then the LS estimate is strongly consistent.

  20. Weighted Least Squares • Minimize the following weighted prediction error: J_t(θ) = Σ_{k=0}^{t-1} β_k (y_{k+1} - θ^T φ_k)^2, where {β_k} is a weight sequence. • Recursive form of WLS: θ_{t+1} = θ_t + a_t β_t P_t φ_t (y_{t+1} - θ_t^T φ_t), P_{t+1} = P_t - a_t β_t P_t φ_t φ_t^T P_t, a_t = (1 + β_t φ_t^T P_t φ_t)^{-1}.

  21. Self-Convergence of WLS. Take the weight as follows: β_t = 1/(log r_t)^{1+δ}, with r_t = 1 + Σ_{k=0}^{t} ||φ_k||^2 and δ > 0. Theorem: Under Assumption 1, for any initial value θ_0 and any regression vector φ_t, θ_t will converge to some vector almost surely. (Lei Guo, 1996, IEEE TAC)

  22. Adaptation • To adapt: to change oneself to conform to a new or changed circumstance. • What do we know from the new circumstance? Adaptive estimation, learning, identification • How do we make the corresponding response? Control/decision making

  23. Adaptive Control

  24. Adaptive Control. Adaptive control: a controller with adjustable parameters (or structures), together with a mechanism for adjusting them. [Diagram: the reference r and the plant output y are fed to an adaptive estimator; its estimates tune the adaptive controller, which generates the plant input u.]

  25. Robust Control. Model = Nominal + “Ball” of uncertainty around it. Robust control can tolerate the uncertainty, but cannot reduce it!

  26. Adaptive Control: an example. Consider the following system: y_{t+1} = a y_t + b u_t + w_{t+1}, where a and b are unknown parameters, and y_t, u_t, and w_t are the output, input, and white noise sequence, respectively. Objective: design a control law to minimize the average tracking error J = limsup_{T→∞} (1/T) Σ_{t=1}^{T} (y_t - r_t)^2, where {r_t} is the reference signal.

  27. Adaptive Control. If (a, b) is known, we can get the optimal controller: u_t = (r_{t+1} - a y_t)/b. “Certainty Equivalence” Principle: replace the unknown parameters in a non-adaptive controller by their online estimates. If (a, b) is unknown, the adaptive controller can be taken as u_t = (r_{t+1} - a_t y_t)/b_t.

  28. Adaptive Control. If (a, b) is unknown, the adaptive controller can be taken as u_t = (r_{t+1} - a_t y_t)/b_t, where the estimates (a_t, b_t) can be obtained by LS.
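The certainty-equivalence loop of slides 26-28 can be sketched as follows. The plant parameters (a, b) = (0.8, 1.0), the constant reference r = 1, the noise level, and the small-|b_t| safeguard are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 0.8, 1.0                 # true but "unknown" plant parameters (assumed)
theta = np.array([0.0, 0.5])    # online estimates (a_t, b_t)
P = 100.0 * np.eye(2)           # LS "covariance" matrix
y, errors = 0.0, []
r = 1.0                         # constant reference signal (assumed)

for t in range(2000):
    a_t, b_t = theta
    b_safe = b_t if abs(b_t) > 1e-3 else 1e-3        # avoid division by zero
    u = (r - a_t * y) / b_safe                       # certainty-equivalence law
    y_next = a * y + b * u + 0.05 * rng.normal()     # true plant
    # Recursive LS update of (a_t, b_t) from phi_t = (y_t, u_t):
    phi = np.array([y, u])
    Pphi = P @ phi
    gain = 1.0 / (1.0 + phi @ Pphi)
    theta = theta + gain * Pphi * (y_next - phi @ theta)
    P = P - gain * np.outer(Pphi, Pphi)
    errors.append((y_next - r) ** 2)
    y = y_next

# The average tracking error approaches the noise variance.
print(np.mean(errors[-500:]))
```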

  29. Adaptive Control. The closed-loop system: y_{t+1} = a y_t + b u_t + w_{t+1}, with u_t = (r_{t+1} - a_t y_t)/b_t, so that y_{t+1} = (a - (b/b_t) a_t) y_t + (b/b_t) r_{t+1} + w_{t+1}.

  30. Theoretical Problems a) Stability: limsup_{T→∞} (1/T) Σ_{t=1}^{T} (y_t^2 + u_t^2) < ∞ a.s. b) Optimality: lim_{T→∞} (1/T) Σ_{t=1}^{T} (y_t - r_t)^2 = σ_w^2 a.s., i.e., the tracking error attains its minimum possible value.

  31. Theoretical Obstacles [Diagram: a closed loop of dependencies - the controller determines the closed-loop system, which generates the data, which drive the estimation, which in turn defines the controller.]

  32. Theoretical Obstacles 1) The closed-loop system is a very complicated nonlinear stochastic dynamical system. 2) No useful statistical properties, such as stationarity or independence of the system signals, are available. 3) No properties of (a_t, b_t) are known a priori.

  33. Theorem. Assumptions: 1) The noise sequence {w_t} is a martingale difference sequence, and there exists a constant β > 2 such that sup_t E[|w_{t+1}|^β | F_t] < ∞ a.s. 2) The regression vector φ_t = (y_t, u_t)^T is an adapted sequence, i.e., φ_t is F_t-measurable. 3) {r_t} is a deterministic bounded signal. Theorem: Under the above assumptions, the closed-loop system is stable and optimal. (Lei Guo, Automatica, 1995)

  34. Some Examples • Adaptive control systems adaptation in a single agent • Iterated prisoner’s dilemma adaptation among agents

  35. Prisoner’s Dilemma • The story of the prisoner’s dilemma: Players: two prisoners. Actions: {Cooperate (C), Defect (D)}. Payoff matrix (row: Prisoner A, column: Prisoner B; each entry is (A’s payoff, B’s payoff)):

                 B: C     B: D
      A: C      (3, 3)   (0, 5)
      A: D      (5, 0)   (1, 1)

  36. Prisoner’s Dilemma • No matter what the other player does, the best choice is “D”. • (D, D) is a Nash equilibrium. • But if both choose “D”, both do worse than if both had selected “C”.

                 B: C     B: D
      A: C      (3, 3)   (0, 5)
      A: D      (5, 0)   (1, 1)
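The dominance argument on this slide can be checked mechanically from the payoff matrix (a small Python sketch; the dict encoding is ours, not from the slides):

```python
# Payoff matrix from the slide: payoff[(A's move, B's move)] = (A's payoff, B's payoff)
payoff = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

# Whatever B plays, A earns strictly more by defecting (D dominates C):
for b_move in ("C", "D"):
    assert payoff[("D", b_move)][0] > payoff[("C", b_move)][0]

# Yet mutual defection (1, 1) is worse for both than mutual cooperation (3, 3).
print(payoff[("D", "D")], "<", payoff[("C", "C")])
```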

  37. Iterated Prisoner’s Dilemma • The individuals: • meet many times • can recognize a previous interactant • remember the prior outcome • Strategy: specifies the probabilities of cooperation and defection based on the history: P(C) = f1(History), P(D) = f2(History)

  38. Strategies • Tit For Tat - cooperate on the first move, then repeat the opponent's last choice. Example (Player A plays Tit For Tat):
      Player A: C D D C C C C C D D D D C …
      Player B: D D C C C C C D D D D C …
• Tit For Tat and Random - repeat the opponent's last choice, skewed by a random setting.* • Tit For Two Tats and Random - like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated. The choice is skewed by a random setting.* • Tit For Two Tats - like Tit For Tat, except that the opponent must make the same choice twice in a row before it is reciprocated. • Naive Prober (Tit For Tat with Random Defection) - repeat the opponent's last choice (i.e., Tit For Tat), but sometimes probe by defecting in lieu of cooperating.* • Remorseful Prober (Tit For Tat with Random Defection) - repeat the opponent's last choice (i.e., Tit For Tat), but sometimes probe by defecting in lieu of cooperating. If the opponent defects in response to probing, show remorse by cooperating once.* • Naive Peace Maker (Tit For Tat with Random Co-operation) - repeat the opponent's last choice (i.e., Tit For Tat), but sometimes make peace by co-operating in lieu of defecting.* • True Peace Maker (hybrid of Tit For Tat and Tit For Two Tats with Random Cooperation) - cooperate unless the opponent defects twice in a row, then defect once, but sometimes make peace by cooperating in lieu of defecting.* • Random - always set at 50% probability
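The Tit For Tat behavior in the example rows can be sketched as a function of the game history; the play() helper and the 10-round match against Always Defect are illustrative, not from the slides:

```python
def tit_for_tat(my_history, opp_history):
    """Cooperate on the first move, then copy the opponent's last move."""
    return "C" if not opp_history else opp_history[-1]

def always_defect(my_history, opp_history):
    return "D"

def play(strategy_a, strategy_b, rounds, payoff):
    """Run an iterated match and return the two total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pa, pb = payoff[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# TFT loses only the first round to Always Defect, then defects back:
print(play(tit_for_tat, always_defect, 10, payoff))   # (9, 14)
```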


  40. Strategies • Always Defect • Always Cooperate • Grudger (co-operate, but only be a sucker once) - cooperate until the opponent defects, then always defect unforgivingly. • Pavlov (repeat last choice if good outcome) - if 5 or 3 points were scored in the last round, then repeat the last choice. • Pavlov / Random (repeat last choice if good outcome, and Random) - if 5 or 3 points were scored in the last round, then repeat the last choice, but sometimes make random choices.* • Adaptive - starts with c,c,c,c,c,c,d,d,d,d,d and then takes the choice that has given the best average score, re-calculated after every move. • Gradual - cooperates until the opponent defects; in that case defects the total number of times the opponent has defected during the game, followed up by two co-operations. • Suspicious Tit For Tat - as for Tit For Tat, except begins by defecting. • Soft Grudger - cooperates until the opponent defects; in that case the opponent is punished with d,d,d,d,c,c. • Customised strategy 1 - default setting is T=1, P=1, R=1, S=0, B=1; always co-operate unless sucker (i.e., 0 points scored). • Customised strategy 2 - default setting is T=1, P=1, R=0, S=0, B=0; always play alternating defect/cooperate.

  41. Iterated Prisoner’s Dilemma • Which strategies can thrive, i.e., what is a good strategy? • Robert Axelrod, 1980s: a computer round-robin tournament • The first round • The second round. Axelrod, R. 1987. The evolution of strategies in the iterated Prisoner's Dilemma. In L. Davis (ed.), Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, Los Altos, CA.

  42. Characteristics of “good” strategies • Goodness: never defect first. • First round: the first eight strategies all had “goodness”. • Second round: fourteen of the first fifteen strategies had “goodness”. • Forgiveness: may revenge, but the memory is short. • “Grudger” is not a strategy with “forgiveness”. • “Goodness” and “forgiveness” are a kind of collective behavior: for a single agent, defecting is the best strategy.

  43. Evolution of the Strategies • Evolve “good” strategies by genetic algorithm (GA)

  44. Some Notations in GA • String: an individual; it represents the chromosome in genetics. • Population: the set of individuals. • Population size: the number of individuals. • Gene: an element of the string. E.g., S = 1011, where 1, 0, 1, 1 are called genes. • Fitness: the adaptation of the agent to the circumstance. From Jing Han’s PPT

  45. How does GA work? • Represent a solution of the problem by a “chromosome”, i.e., a string. • Randomly generate some chromosomes as the initial solutions. • According to the principle of “survival of the fittest”, chromosomes with high fitness reproduce; crossover and mutation then generate the new generation. • The chromosome with the highest fitness may be the solution of the problem. From Jing Han’s PPT

  46. GA [Diagram: natural selection loop - a new generation is created via crossover and mutation] • choose an initial population • determine the fitness of each individual • perform selection • repeat • perform crossover • perform mutation • determine the fitness of each individual • perform selection • until some stopping criterion applies. From Jing Han’s PPT
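The selection/crossover/mutation loop above can be sketched in Python. The one-max fitness (count of 1-genes), population size, and rates below are toy assumptions, not parameters from the lecture:

```python
import random
random.seed(3)

def genetic_algorithm(fitness, length=20, pop_size=30, generations=60,
                      p_mut=0.02):
    """Minimal GA following the slide's loop: selection, crossover, mutation."""
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness-proportional selection of parents.
        weights = [fitness(ind) for ind in pop]
        parents = random.choices(pop, weights=weights, k=pop_size)
        # One-point crossover on consecutive pairs of parents.
        nxt = []
        for i in range(0, pop_size, 2):
            a, b = parents[i], parents[i + 1]
            cut = random.randrange(1, length)
            nxt += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Mutation: flip each gene with small probability.
        pop = [[g ^ 1 if random.random() < p_mut else g for g in ind]
               for ind in nxt]
    return max(pop, key=fitness)

# Toy fitness: number of 1-genes ("one-max"); GA should approach all ones.
best = genetic_algorithm(fitness=sum)
print(sum(best))   # typically 18-20
```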

  47. Some Remarks on GA • GA searches for the optimal solution from a set of solutions, rather than from a single solution. • The search space is large: {0,1}^L. • GA is a randomized algorithm, since selection, crossover, and mutation are all random operations. • GA is suitable for the following situations: • there is structure in the search space, but it is not well understood; • the inputs are non-stationary (i.e., the environment is changing); • the goal is not global optimization, but finding a reasonably good solution quickly.

  48. Evolution of Strategies by GA • Each chromosome represents one strategy. • The strategy is deterministic and is determined by the previous moves. • E.g., if the strategy is determined by a one-step history, there are four possible histories (own move, opponent's move): CC, CD, DC, DD. • The number of possible strategies is 2*2*2*2 = 16. • TFT: F(CC)=C, F(CD)=D, F(DC)=C, F(DD)=D • Always cooperate: F(CC)=F(CD)=F(DC)=F(DD)=C • Always defect: F(CC)=F(CD)=F(DC)=F(DD)=D • …
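The 16 one-step-memory strategies can be enumerated directly; the dict encoding F(history) -> move is an illustrative choice, not notation from the slides:

```python
from itertools import product

# A deterministic one-step-memory strategy maps the previous joint move
# (my move, opponent's move) to the next move: 2^4 = 16 strategies.
HISTORIES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]

def all_strategies():
    for moves in product("CD", repeat=4):
        yield dict(zip(HISTORIES, moves))

strategies = list(all_strategies())
print(len(strategies))          # 16

# Tit For Tat answers with the opponent's previous move:
# F(CC)=C, F(CD)=D, F(DC)=C, F(DD)=D.
tft = {h: h[1] for h in HISTORIES}
assert tft in strategies
```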

  49. Evolution of the Strategies • Strategies: use the outcome of the three previous moves to determine the current move. • The possible number of histories is 4*4*4 = 64. [Table: each of the 64 three-move histories (e.g., Player I: CCC, CCD, CDC, …, DDD against Player II: CCC, …, DDD) is mapped to a move C or D.] • The initial premise is three hypothetical moves. • The length of the chromosome is 70 (64 history genes plus 6 genes encoding the premise). • The total number of strategies is 2^70 ≈ 10^21.

  50. Evolution of “good” strategies. Five steps of evolving “good” strategies by GA: • An initial population is chosen. • Each individual is run in the current environment to determine its effectiveness. • The relatively successful individuals are selected to have more offspring. • The successful individuals are randomly paired off to produce two offspring per mating. • Crossover: a way of constructing the chromosomes of the two offspring from the chromosomes of the two parents. • Mutation: randomly changing a very small proportion of the C’s to D’s and vice versa. • A new population is generated.
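The crossover and mutation steps above can be sketched for 70-gene C/D chromosomes; one-point crossover is an assumption here (the slide does not fix the crossover scheme):

```python
import random
random.seed(4)

GENES = "CD"
LENGTH = 70          # 64 history genes + 6 premise genes, as on slide 49

def crossover(parent1, parent2):
    """One-point crossover: each child takes a prefix from one parent
    and the suffix from the other."""
    cut = random.randrange(1, LENGTH)
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]

def mutate(chromosome, rate=0.01):
    """Flip a small proportion of C's to D's and vice versa."""
    return "".join(("D" if g == "C" else "C") if random.random() < rate else g
                   for g in chromosome)

p1 = "".join(random.choice(GENES) for _ in range(LENGTH))
p2 = "".join(random.choice(GENES) for _ in range(LENGTH))
c1, c2 = crossover(p1, p2)
print(len(c1), len(mutate(c1)))   # 70 70
```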
