If we want Ernest to exhibit a more adventurous behavior, we need him to prefer actions that make him move over actions that keep him in the same place. To do so, he needs some sense of movement.
So, Ernest 5.0 can now perceive three things after each primary action: "bumped", "moved", or "turned". Each of these primary perceptions has a satisfaction value: val(bumped) = -1, val(moved) = 1, val(turned) = 0.
Schemas are also associated with satisfaction values when they are constructed. The satisfaction value of a primary schema is equal to the satisfaction value of its expected primary perception. The satisfaction value of a secondary schema is the sum of the satisfaction values of its subschemas.
Schema preferences are now computed as pref = weight * value, where weight is the number of times the schema has been successfully enacted and value is the schema's satisfaction value. As before, all schemas compete and the one with the highest preference is selected.
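To make these definitions concrete, here is a minimal Python sketch of how satisfaction values and weights combine into preferences (the class and function names are illustrative only; this is not Ernest's actual Soar code):

```python
# Hypothetical sketch of Ernest 5.0's preference computation (illustrative, not the Soar implementation).

SATISFACTION = {"bumped": -1, "moved": 1, "turned": 0}  # primary perception values

class Schema:
    def __init__(self, expected_perception=None, subschemas=None):
        self.weight = 0  # number of times this schema has been successfully enacted
        if expected_perception is not None:
            # Primary schema: satisfaction of its expected primary perception.
            self.satisfaction = SATISFACTION[expected_perception]
        else:
            # Secondary schema: sum of the satisfaction values of its subschemas.
            self.satisfaction = sum(s.satisfaction for s in (subschemas or []))

    @property
    def preference(self):
        return self.weight * self.satisfaction  # pref = weight * value

def select_schema(schemas):
    # All schemas compete; the one with the highest preference is selected.
    return max(schemas, key=lambda s: s.preference)
```

With these values, a schema that makes Ernest move can outcompete a schema that only makes him turn, even if the turning schema has been enacted more often, because turning has a satisfaction value of 0.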
Although this approach might seem similar to classical "reward mechanism" approaches, it is actually different. Classical approaches consist of giving a reward for reaching a high-level goal and having the agent back-propagate this reward to the prior operations that led to that goal (Laird & Congdon, 2008). By contrast, satisfaction values are defined at the lowest level, and Ernest constructs higher-level goals that let him better fulfill his lower-level satisfaction. Because of this difference, I cannot use Soar 9's built-in reward mechanism. I believe this approach better accounts for the way natural cognitive agents operate.
In this video, we can see that Ernest 5.0 finds a solution based on secondary schema S52 that allows him to keep moving without bumping into walls. When observing his activity, we can infer, if we want, that he does not "like" bumping into walls but that he "likes" moving. Some may even detect some "impatience" to discover a wider universe!
References
Laird, J. E., & Congdon, C. B. (2008). Part VII: Reinforcement Learning. The Soar User's Manual, Version 9.0. University of Michigan.
Saturday, February 21, 2009
About local optimum
It must be noted that Ernest 4.3 can sometimes get stuck in a non-optimal solution. In this video, he finds a solution made of primary schema S14 and secondary schema S30. This solution gives him a Yahoo! when enacting each of these two schemas. It is not optimal, however, because it makes Ernest systematically bump into a wall on the first step of secondary schema S30, which could be avoided in this environment.
This problem relates to the so-called "local optimum" problem: Ernest will not explore other solutions when he has found a satisfying one.
Many authors suggest a stochastic response to this problem, consisting of putting some randomness or "noise" in the agent's behavior. Soar even implements this response by default when reinforcement learning is activated, through the so-called "epsilon-greedy" exploration policy (Laird & Congdon, 2008). This randomness will sometimes cause the agent to perform an action that he does not prefer, in order to make him explore other solutions.
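For reference, the epsilon-greedy policy can be sketched in a few lines of Python (a generic illustration of the idea, not Soar's internal implementation; schemas are assumed to expose a preference value as in the earlier sketch):

```python
import random

def epsilon_greedy_select(schemas, epsilon=0.1):
    # With probability epsilon, explore: pick a random schema.
    if random.random() < epsilon:
        return random.choice(schemas)
    # Otherwise exploit: pick the schema with the highest preference.
    return max(schemas, key=lambda s: s.preference)
```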
I do not agree with this response because I see no sense in having the agent not choose his preferred action; that only impedes the learning process. Besides, it has been widely accepted since Simon (1955) that a cognitive agent's goal is not to find the optimal solution but only a satisfying one.
For Ernest, I would rather implement a mechanism that makes him choose to explore other solutions when he gets "bored". Before that, I could at least reduce the risk of settling on a non-optimal solution by computing a better payoff value for each schema.
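One hypothetical way such a "boredom" trigger could look, sketched in Python purely for illustration (the class, counter, and threshold are invented here, not an implemented mechanism):

```python
class BoredomSelector:
    """Hypothetical sketch: fall back to exploration after the same schema
    has won the competition too many times in a row."""

    def __init__(self, boredom_threshold=20):
        self.boredom_threshold = boredom_threshold  # invented value
        self.last_winner = None
        self.repetitions = 0

    def select(self, schemas):
        preferred = max(schemas, key=lambda s: s.preference)
        if preferred is self.last_winner:
            self.repetitions += 1
        else:
            self.last_winner, self.repetitions = preferred, 0
        if self.repetitions >= self.boredom_threshold:
            # "Bored": deliberately try the best schema other than the usual winner.
            self.repetitions = 0
            alternatives = [s for s in schemas if s is not preferred]
            if alternatives:
                return max(alternatives, key=lambda s: s.preference)
        return preferred
```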
References
Laird, J. E., & Congdon, C. B. (2008). The Soar User's Manual, Version 9.0. University of Michigan.
Simon, H. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69, 99-118.
Friday, February 20, 2009
Ernest 4.3
Ernest 4.3 is the same as Ernest 4.2, except that he can do three different things: go ahead, rotate right, and rotate left. This allows him to explore his new two-dimensional environment. Like Ernest 4.2, he can only perceive two things: bump and non-bump.
See how fast he learns a smart strategy to avoid bumping into walls: keep spinning in place. As before, he is able to learn second-order schemas, but in this environment there is no need to enact them. In this example, repeatedly enacting primary schema S35 gives him a Yahoo! each time.
Interestingly, the number of schemas he constructs per cycle does not grow with the environment complexity. It actually grows with Ernest's own complexity, proportionally to the number of elementary actions he is able to perform multiplied by the number of elementary sensations he is able to perceive. Hence, the complexity remains under control, and there will be no combinatorial explosion when the environment complexity increases.
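To make this bound concrete, here is a back-of-the-envelope illustration in Python: for Ernest 4.3, with three elementary actions and two elementary sensations, there are at most 3 × 2 = 6 candidate primary schemas, however large the environment (the enumeration below is illustrative, not Ernest's actual construction mechanism):

```python
from itertools import product

actions = ["go ahead", "rotate right", "rotate left"]  # Ernest 4.3's elementary actions
sensations = ["bump", "non-bump"]                      # Ernest 4.3's elementary sensations

# A primary schema pairs an action with an expected sensation, so the number of
# distinct primary schemas is bounded by len(actions) * len(sensations),
# regardless of how large or complex the environment is.
candidate_primary_schemas = list(product(actions, sensations))
print(len(candidate_primary_schemas))  # 6
```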
The time needed to explore the environment would however grow with the environment complexity. This raises the interesting question of Ernest's “education”, that is, designing “pedagogical” situations where he could more easily learn lower-level schemas, on which higher-level schemas could anchor.
Wednesday, February 4, 2009
Reference
A paper presenting this work has just been accepted at the BRIMS conference. The reference is:
Georgeon, O. L., Ritter, F. E., & Haynes, S. R. (2009). Modeling Bottom-Up Learning from Activity in Soar. Proceedings of the 18th Annual Conference on Behavior Representation in Modeling and Simulation (BRIMS), Sundance, Utah, March 30 – April 2, 2009. 09-BRIMS-016, pp. 65-72.