Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.

Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).

Tuesday, February 24, 2009

Ernest 5.0

If we want Ernest to exhibit more adventurous behavior, we need him to prefer actions that make him move over actions that keep him in the same place. To do so, we need him to have some sense of movement.

So, Ernest 5.0 can now perceive three things after each primary action: "bumped", "moved", or "turned". Each of these primary perceptions has a satisfaction value: val(bumped) = -1, val(moved) = 1, val(turned) = 0.

Schemas are also associated with satisfaction values when they are constructed. The satisfaction value of a primary schema is equal to the satisfaction value of its expected primary perception. The satisfaction value of a secondary schema is the sum of the satisfaction values of its subschemas.
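These two rules can be sketched in a few lines of Python. This is my own illustrative reconstruction, not Ernest's actual code; the class and field names are hypothetical:

```python
# Hypothetical sketch of satisfaction-value propagation.

# Satisfaction values of the three primary perceptions.
PERCEPTION_VALUE = {"bumped": -1, "moved": 1, "turned": 0}

class Schema:
    def __init__(self, expected_perception=None, subschemas=None):
        self.subschemas = subschemas or []
        if expected_perception is not None:
            # Primary schema: value of its expected primary perception.
            self.value = PERCEPTION_VALUE[expected_perception]
        else:
            # Secondary schema: sum of its subschemas' values.
            self.value = sum(s.value for s in self.subschemas)

move = Schema(expected_perception="moved")        # value = 1
turn = Schema(expected_perception="turned")       # value = 0
turn_then_move = Schema(subschemas=[turn, move])  # value = 0 + 1 = 1
```

Note that a secondary schema built from "turned" then "moved" inherits the full value of the move, so composite behaviors that end in movement remain attractive.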

The schema preferences are now computed as pref = weight * value, where weight is the number of times the schema has been successfully enacted and value is the schema's satisfaction value. As before, all schemas compete and the one with the highest preference is selected.
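The competition itself could be sketched as follows. Again, this is a hypothetical illustration (the Candidate class and the example weights are mine, not Ernest's implementation):

```python
# Hypothetical sketch of schema competition: pref = weight * value.

class Candidate:
    def __init__(self, name, weight, value):
        self.name, self.weight, self.value = name, weight, value

def select_schema(schemas):
    """Return the schema with the highest preference (weight * value)."""
    return max(schemas, key=lambda s: s.weight * s.value)

candidates = [
    Candidate("bump_ahead", weight=5, value=-1),  # pref = -5
    Candidate("turn",       weight=8, value=0),   # pref =  0
    Candidate("turn_move",  weight=3, value=1),   # pref =  3
]
best = select_schema(candidates)  # "turn_move" wins despite its lower weight
```

Because value multiplies weight, a frequently enacted but unsatisfying schema (like bumping) loses to a rarer schema that yields movement.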

Although this approach might seem similar to classical "reward mechanism" approaches, it is actually different. Classical approaches consist of giving a reward for reaching a high-level goal and having the agent back-propagate this reward to the prior operations that led to this goal (Laird & Congdon, 2008). In contrast, satisfaction values are defined at the lowest level, and Ernest constructs higher-level goals that let him better fulfill his lower-level satisfaction. Because of this difference, I cannot use the Soar 9 built-in reward mechanism. I believe that my approach better accounts for the way natural cognitive agents operate.

In this video, we can see that Ernest 5.0 finds a solution based on secondary schema S52 that allows him to keep moving without bumping into walls. When observing his activity, we can infer, if we want, that he does not "like" bumping into walls but that he "likes" moving. Some may even detect some "impatience" to discover a wider universe!

References

Laird, J. E., & Congdon, C. B. (2008). Part VII: Reinforcement Learning. The Soar User's Manual, Version 9.0. University of Michigan.
