Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.

Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).

Tuesday, February 24, 2009

Ernest 5.0

If we want Ernest to exhibit more adventurous behavior, we need him to prefer actions that make him move over actions that keep him in the same place. To do so, we need him to have some sense of movement.

So, Ernest 5.0 can now perceive three things after each primary action: "bumped", "moved", or "turned". Each of these primary perceptions has a satisfaction value: val(bumped) = -1, val(moved) = 1, val(turned) = 0.

Schemas are also associated with satisfaction values when they are constructed. The satisfaction value of a primary schema is equal to the satisfaction value of its expected primary perception. The satisfaction value of a secondary schema is the sum of the satisfaction values of its subschemas.
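These two rules can be sketched in a few lines of Python. This is my own illustrative reconstruction, not Ernest's actual code; the class and field names are hypothetical:

```python
# Hypothetical sketch of satisfaction-value propagation.

# Satisfaction values of the three primary perceptions.
PERCEPTION_VALUE = {"bumped": -1, "moved": 1, "turned": 0}

class Schema:
    def __init__(self, expected_perception=None, subschemas=None):
        self.subschemas = subschemas or []
        if expected_perception is not None:
            # Primary schema: value of its expected primary perception.
            self.value = PERCEPTION_VALUE[expected_perception]
        else:
            # Secondary schema: sum of its subschemas' values.
            self.value = sum(s.value for s in self.subschemas)

move = Schema(expected_perception="moved")        # value = 1
turn = Schema(expected_perception="turned")       # value = 0
turn_then_move = Schema(subschemas=[turn, move])  # value = 0 + 1 = 1
```

Note that a secondary schema built from "turned" then "moved" inherits the full value of the move, so composite behaviors that end in movement remain attractive.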

The schema preferences are now computed as pref = weight * value, where weight is the number of times the schema has been successfully enacted and value is the schema's satisfaction value. As before, all schemas compete and the one with the highest preference is selected.
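The competition itself could be sketched as follows. Again, this is a hypothetical illustration (the Candidate class and the example weights are mine, not Ernest's implementation):

```python
# Hypothetical sketch of schema competition: pref = weight * value.

class Candidate:
    def __init__(self, name, weight, value):
        self.name, self.weight, self.value = name, weight, value

def select_schema(schemas):
    """Return the schema with the highest preference (weight * value)."""
    return max(schemas, key=lambda s: s.weight * s.value)

candidates = [
    Candidate("bump_ahead", weight=5, value=-1),  # pref = -5
    Candidate("turn",       weight=8, value=0),   # pref =  0
    Candidate("turn_move",  weight=3, value=1),   # pref =  3
]
best = select_schema(candidates)  # "turn_move" wins despite its lower weight
```

Because value multiplies weight, a frequently enacted but unsatisfying schema (like bumping) loses to a rarer schema that yields movement.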

Although this approach might seem similar to classical "reward mechanism" approaches, it is actually different. Classical approaches consist of giving a reward for reaching a high-level goal and having the agent back-propagate this reward to the prior operations that led to this goal (Laird & Congdon, 2008). In contrast, satisfaction values are defined at the lowest level, and Ernest constructs higher-level goals that let him better fulfill his lower-level satisfaction. Because of this difference, I cannot use the Soar 9 built-in reward mechanism. I believe that my approach better accounts for the way natural cognitive agents operate.

In this video, we can see that Ernest 5.0 finds a solution based on secondary schema S52 that allows him to keep moving without bumping into walls. When observing his activity, we can infer, if we want, that he does not "like" bumping into walls but that he "likes" moving. Some may even detect some "impatience" to discover a wider universe!

References

Laird, J. E., & Congdon, C. B. (2008). Part VII: Reinforcement Learning. The Soar User's Manual, Version 9.0. University of Michigan.
