
NEW TIES WP2 Agent and learning mechanisms


Presentation Transcript


  1. NEW TIES WP2 Agent and learning mechanisms

  2. Decision making and learning
  • Agents have a controller: a decision tree (DQT); the node types are sketched below
  • Input: the situation as perceived (seen / heard / interpreted)
  • Output: an action
  • Decision making = using the DQT
  • Learning = modifying the DQT
  • Decisions also depend on inheritable "attitude genes" (learned through evolution)
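The controller can be pictured with three node types. The following minimal Python sketch (not the NEW TIES code; all class, attribute and action names are assumptions) shows action, test and bias nodes, and decision making as a single top-down pass through the tree:

import random

class ActionNode:
    def __init__(self, action):
        self.action = action                        # e.g. "eat", "move", "turn_left"

    def decide(self, situation):
        return self.action                          # reaching an action node ends the pass

class TestNode:
    def __init__(self, concept, yes_child, no_child):
        self.concept = concept                      # e.g. "see_food", "have_food"
        self.yes_child = yes_child
        self.no_child = no_child

    def decide(self, situation):
        # the perceived situation is assumed to be a dict: concept -> bool
        child = self.yes_child if situation.get(self.concept, False) else self.no_child
        return child.decide(situation)

class BiasNode:
    def __init__(self, children, weights):
        self.children = children                    # n children
        self.weights = weights                      # one selection weight per child

    def decide(self, situation):
        child = random.choices(self.children, weights=self.weights, k=1)[0]
        return child.decide(situation)

# Decision making = using the DQT: feed in the perceived situation, get an action back.
dqt = TestNode("see_food", yes_child=ActionNode("eat"), no_child=ActionNode("move"))
print(dqt.decide({"see_food": True}))               # -> eat

Learning then amounts to changing this structure: individual learning adjusts the selection weights on bias-node edges, while evolutionary learning changes the tree itself and the genetic part of the biases (slides 4, 7 and 9).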

  3. Example of a DQT
  [Figure: example DQT. A bias node (B) with biases 0.5 / 0.5 branches into two test nodes (T), "VISUAL: FRONT FOOD REACHABLE" and "BAG: FOOD". Their YES/NO branches lead to action nodes (A) such as EAT, PICKUP, MOVE, TURN LEFT and TURN RIGHT, with genetic biases 1.0 / 0.6 / 0.2 / 0.2 on the edges. Legend: B = bias node, T = test node, A = action (decision); edge labels show the genetic bias and the Boolean choice (YES/NO).]
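For reference, the figure can be written out as a nested literal. This is only one plausible reading of the flattened diagram: the YES/NO wiring below the two test nodes is assumed, and only the node types, concepts, actions and bias values come from the figure.

example_dqt = {
    "type": "bias", "biases": [0.5, 0.5],                        # root bias node, 0.5 / 0.5
    "children": [
        {"type": "test", "concept": "front_food_reachable",      # "VISUAL: FRONT FOOD REACHABLE"
         "yes": {"type": "bias", "biases": [1.0, 0.6, 0.2, 0.2],
                 "children": [{"type": "action", "action": a}
                              for a in ("eat", "move", "turn_left", "turn_right")]},
         "no": {"type": "action", "action": "move"}},            # NO branch: assumed
        {"type": "test", "concept": "bag_contains_food",         # "BAG: FOOD"
         "yes": {"type": "bias", "biases": [1.0, 0.6, 0.2, 0.2],
                 "children": [{"type": "action", "action": a}
                              for a in ("pickup", "move", "turn_left", "turn_right")]},
         "no": {"type": "action", "action": "move"}},            # NO branch: assumed
    ],
}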

  4. Interaction of evolution & individual learning
  • A bias node has n children, each with a bias bi
  • Bias ≠ probability
  • The bias bi is learned and changes during the lifetime (name: learned bias)
  • The genetic bias gi is inherited, part of the genome, and constant
  • Actual probability of choosing child i: p(bi, gi) = bi + (1 - bi) ∙ gi (worked example below)
  • Learned and inherited behaviour are linked through this formula
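A small worked example of the formula, in Python. The per-child combination p(bi, gi) = bi + (1 - bi) ∙ gi is taken from the slide; normalising the combined values over the n children to obtain a probability distribution is an added assumption, since the slide only notes that a bias is not itself a probability.

import random

def combined_weight(learned_bias, genetic_bias):
    # the learned bias b changes during the agent's lifetime,
    # the genetic bias g is inherited and stays constant
    return learned_bias + (1.0 - learned_bias) * genetic_bias

def choose_child(children, learned, genetic):
    weights = [combined_weight(b, g) for b, g in zip(learned, genetic)]
    probs = [w / sum(weights) for w in weights]     # assumed normalisation over the n children
    return random.choices(children, weights=probs, k=1)[0], probs

children = ["eat", "move", "turn_left"]
learned  = [0.2, 0.5, 0.1]                          # bi, updated by individual learning
genetic  = [0.9, 0.1, 0.1]                          # gi, fixed in the genome
_, probs = choose_child(children, learned, genetic)
print([round(p, 2) for p in probs])                 # -> [0.55, 0.33, 0.11]

Note from the formula that as bi approaches 1 the choice no longer depends on gi at all, while at bi = 0 the weight equals the genetic bias; this is how learned and inherited behaviour stay linked.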

  5. DQT nodes & parameters (cont'd)
  • Test node language: native concepts + emerging concepts
  • Native concepts: see_agent, see_mother, see_food, have_food, see_mate, …
  • New concepts can emerge through categorisation (the discrimination game); a generic sketch follows below
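The slide only names the discrimination game; the following is a minimal, generic sketch of such a game over a single numeric feature (not the NEW TIES implementation; every detail here is an assumption). Categories are intervals; a game succeeds when the topic falls into an interval that no context object shares, and on failure the agent splits the topic's category, so new concepts are created only when they are needed.

import random

class Discriminator:
    def __init__(self):
        self.boundaries = [0.0, 1.0]                # one initial category: [0, 1)

    def categorise(self, value):
        # index of the interval that contains the value
        return max(i for i, b in enumerate(self.boundaries[:-1]) if value >= b)

    def play(self, topic, context):
        topic_cat = self.categorise(topic)
        if all(self.categorise(c) != topic_cat for c in context):
            return True                             # topic discriminated: success
        # failure: split the topic's category in two -> a new concept emerges
        lo = self.boundaries[topic_cat]
        hi = self.boundaries[topic_cat + 1]
        self.boundaries.insert(topic_cat + 1, (lo + hi) / 2.0)
        return False

agent = Discriminator()
for _ in range(200):                                # repeated games refine the repertoire
    objects = [random.random() for _ in range(4)]
    agent.play(objects[0], objects[1:])
print(len(agent.boundaries) - 1, "categories after 200 games")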

  6. Learning: the heart of the emergence engine
  • Evolutionary learning:
    • not within an agent (not during its lifetime), but over generations
    • by variation + selection
  • Individual learning:
    • within one agent, during its lifetime
    • by reinforcement learning
  • Social learning:
    • during the lifetime of interacting agents
    • by sending/receiving and adopting knowledge pieces

  7. Types of learning: properties
  • Evolutionary learning:
    • the agent does not create new knowledge during its lifetime
    • the basic DQT + genetic biases are inheritable
    • "knowledge creator" = crossover and mutation
  • Individual learning:
    • the agent does create new knowledge during its lifetime
    • the DQT + learned biases are modified
    • "knowledge creator" = reinforcement learning, driven by rewards (see the sketch below)
    • individually learnt knowledge dies with its host agent
  • Social learning:
    • the agent imports knowledge already created elsewhere (new? not new?)
    • adoption of imported knowledge ≈ crossover
    • importing knowledge pieces can save effort for the recipient and can create novel combinations
    • exporting knowledge helps its preservation after the death of its host
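Slide 4 states that the learned biases bi are the part that changes during a lifetime, and this slide names reward-driven reinforcement learning as the knowledge creator for individual learning. Below is a hedged sketch of one possible update rule; the rule itself, the learning rate and the clipping are assumptions, not the NEW TIES mechanism.

def clipped(reward):
    # clip the reward's size so one experience cannot dominate the bias
    return min(abs(reward), 1.0)

def update_learned_bias(learned, chosen_index, reward, learning_rate=0.1):
    # move the learned bias of the edge that was followed towards 1 after a
    # positive reward and towards 0 after a negative one; the genetic biases
    # are untouched, they only change over generations
    target = 1.0 if reward > 0 else 0.0
    b = learned[chosen_index]
    learned[chosen_index] = b + learning_rate * clipped(reward) * (target - b)
    return learned

print(update_learned_bias([0.2, 0.5, 0.1], chosen_index=0, reward=+1.0))
# -> roughly [0.28, 0.5, 0.1]: eating paid off, so the learned bias towards "eat" rises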

  8. Present status of the types of learning
  • Evolutionary learning:
    • demonstrated in 2 NT scenarios
    • autonomous selection/reproduction causes problems with population stability (implosion/explosion)
  • Individual learning:
    • code exists, but it has never been demonstrated in NT scenarios
  • Social learning:
    • under construction/design, based on the "telepathy" approach
    • communication protocols + adoption mechanisms are needed

  9. Evolution: variation operators
  • Operators for the DQT:
    • crossover = subtree swap
    • mutation = substitute a subtree with a random subtree, change concepts in test nodes, or change the bias on an edge
  • Operators for the attitude genes (sketched below):
    • crossover = full arithmetic crossover
    • mutation = add Gaussian noise, or replace a gene with a random value
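The attitude-gene operators translate directly into code. The sketch below implements the two operators the slide names, full arithmetic crossover and mutation by Gaussian noise or random reset; the mixing weight, noise width, mutation rates and the assumption that genes live in [0, 1] are all illustrative choices.

import random

def arithmetic_crossover(parent_a, parent_b, alpha=0.5):
    # every gene of the child is the same weighted average of the parents' genes
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_a, parent_b)]

def mutate(genes, p_noise=0.1, sigma=0.05, p_reset=0.01):
    out = []
    for g in genes:
        if random.random() < p_reset:
            out.append(random.random())                             # replace with a random value
        elif random.random() < p_noise:
            out.append(min(1.0, max(0.0, random.gauss(g, sigma))))  # add Gaussian noise
        else:
            out.append(g)
    return out

mum = [0.8, 0.2, 0.6]
dad = [0.4, 0.4, 0.1]
child = mutate(arithmetic_crossover(mum, dad))
print(child)   # e.g. [0.6, 0.3, 0.35], plus occasional noise or resets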

  10. Evolution: selection operators
  • Mate selection:
    • the mate action is chosen by the DQT
    • one agent proposes, the other accepts the proposal
    • both agents must have reached adulthood
  • Survivor selection (a predicate sketch follows below):
    • dead if too old (≥ 80 years)
    • dead if energy drops to zero
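The survivor-selection rule is a simple predicate. In the sketch below only the two thresholds come from the slide; the Agent record and its field names are assumptions.

from collections import namedtuple

Agent = namedtuple("Agent", ["age", "energy"])
MAX_AGE = 80                                        # "dead if too old (>= 80 years)"

def survives(agent):
    return agent.age < MAX_AGE and agent.energy > 0.0   # "dead if zero energy"

population = [Agent(age=12, energy=3.5), Agent(age=81, energy=9.0), Agent(age=40, energy=0.0)]
print([survives(a) for a in population])            # -> [True, False, False]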

  11. Experiment: Simple world. Environment setup
  • World size: 200 x 200 grid cells
  • Agents and food only (no tokens, roads, etc.); both are variable in number
  • Initial distribution of agents (500): in the upper left corner
  • Initial distribution of food (10,000): 5,000 each in the upper left and lower right corners
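The environment setup reads like a configuration record. The sketch below collects the slide's numbers in one place; the field names are assumptions. The Poisonous Food environment (slide 14) keeps the same grid size and counts but places agents and the two food types uniformly at random.

from dataclasses import dataclass

@dataclass
class SimpleWorldConfig:
    grid_width: int = 200
    grid_height: int = 200
    n_agents: int = 500                  # all placed in the upper left corner
    n_food: int = 10_000                 # 5,000 in the upper left, 5,000 in the lower right corner
    agent_placement: str = "upper_left_corner"
    food_placement: str = "upper_left_and_lower_right_corners"

config = SimpleWorldConfig()
print(config.grid_width * config.grid_height, "cells,", config.n_agents, "agents")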

  12. Experiment: Simple world. Agent setup
  • Native knowledge (concepts and DQT subtrees):
    • navigating (random walk)
    • eating (identify, pick up and eat plants)
    • mating (identify mates, propose/agree)
  • Random DQT branches:
    • differ per agent
    • based on the "pool" of native concepts

  13. Experiment: Simple world. The simulation was kept running for 3 months of real time to test stability.

  14. Experiment: Poisonous Food. Environment setup
  • Two types of food: poisonous (decreases energy) and edible (increases energy)
  • World size: 200 x 200 grid cells
  • Agents and food only (no tokens, roads, etc.); both are variable in number
  • Initial distribution of agents (500): uniform random over the grid
  • Initial distribution of food (10,000): 5,000 of each type, uniform random over the same grid space as the agents

  15. Experiment: Poisonous Food. Agent setup
  • Native knowledge: identical to the Simple world experiment
  • Additional native knowledge: agents can distinguish poisonous from edible plants, but the relation with eating/picking up is not built in
  • No random DQT branches

  16. Experiment: Poisonous Food. Measures
  • Population size
  • Welfare (energy)
  • Number of poisonous and edible plants
  • Complexity of the controller (number of nodes)
  • Age

  17. Experiment: Poisonous Food. Demo

  18. Experiment: Poisonous Food. Results
