Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.

Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).

Tuesday, May 19, 2009

Ernest 6.0 and the aA..bB environment

This video shows Ernest 6.0 in the aA..bB environment that I have previously explained here. In this environment, Ernest has to do two consecutive As or two consecutive Bs to succeed.
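To make the success rule concrete, here is a minimal sketch of such an environment. It assumes the simplest reading of the rule: an action succeeds (Y) whenever it repeats the immediately preceding action, and fails (X) otherwise. The class name `AABBEnvironment` and its `step` method are hypothetical, and the real environment may differ, for instance in how it behaves after a success.

```python
class AABBEnvironment:
    """Hypothetical sketch of the aA..bB environment (assumed rule, not the original code)."""

    def __init__(self):
        # No action has been done yet, so the very first action cannot succeed.
        self.previous = None

    def step(self, action):
        # Assumption: "Y" (success) when the action repeats the previous one, "X" (fail) otherwise.
        status = "Y" if action == self.previous else "X"
        self.previous = action
        return status


env = AABBEnvironment()
trace = [env.step(action) for action in "ABBAA"]
print(trace)
```

Under this assumption, alternating A and B never succeeds, which is why Ernest has to learn a regularity that spans two rounds.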

At the beginning, Ernest is initialized with two possible acts: A1 (doing A) and B1 (doing B). Each act has a satisfaction of 1 when it succeeds (getting Y) and -1 when it fails (getting X). In addition, Ernest is initialized with 16 primary schemas, S3 to S18, formed by combining the acts A1 and B1 with their succeeding or failing status (yellow lines), for instance S3 = (A1 S, A1 S, 2, 0). The satisfaction of S3 is equal to satisfaction(A1 S) + satisfaction(A1 S) = 2. The weight of S3 is 0 because it has not yet been enacted.
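The initialization can be sketched as follows. This is only an illustration: the `Schema` class, its field names, and `build_primary_schemas` are hypothetical. The first schema generated matches S3 as given in this post, but the numbering of the other fifteen is an artifact of the enumeration order and does not reproduce Ernest's actual numbering.

```python
from dataclasses import dataclass
from itertools import product

# Satisfaction of an enacted act: 1 on success (S, getting Y), -1 on failure (F, getting X).
SATISFACTION = {"S": 1, "F": -1}


@dataclass
class Schema:
    name: str
    context: tuple      # (act, status), e.g. ("A1", "S")
    intention: tuple    # (act, status)
    satisfaction: int
    weight: int = 0     # 0 until the schema has been enacted


def build_primary_schemas():
    # Four possible enacted acts: A1 S, A1 F, B1 S, B1 F.
    enacted = list(product(["A1", "B1"], ["S", "F"]))
    schemas = []
    # 4 contexts x 4 intentions = 16 primary schemas, named from S3 onward.
    # Note: only S3 is known to match the original; the other names are illustrative.
    for index, (context, intention) in enumerate(product(enacted, enacted), start=3):
        satisfaction = SATISFACTION[context[1]] + SATISFACTION[intention[1]]
        schemas.append(Schema(f"S{index}", context, intention, satisfaction))
    return schemas


primary = build_primary_schemas()
print(len(primary), primary[0])
```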

Ernest's short-term memory is initialized with S1 S, as if schema S1 had been enacted and had succeeded. At the first cycle, four schemas match this context: S3, S11, S5, S13 (first four pale green lines), but all their propositions are weighted 0, so the schema B1 is randomly picked.
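The selection step can be sketched roughly like this. It rests on an assumption: each matching schema proposes its intended act with a strength equal to its weight times the satisfaction of the intended outcome, and ties are broken at random. The function `select_act` and its tuple layout are hypothetical and only meant to show why a set of zero-weight propositions leads to a random pick.

```python
import random


def select_act(matching, acts=("A1", "B1"), rng=random):
    """Pick an act from the propositions of the schemas that match the context.

    matching: list of (intended_act, intended_status, weight) tuples.
    Assumption: a proposition counts for weight * satisfaction(intended_status).
    """
    satisfaction = {"S": 1, "F": -1}
    proposals = {act: 0 for act in acts}
    for act, status, weight in matching:
        proposals[act] += weight * satisfaction[status]
    best = max(proposals.values())
    # All acts tied at the best score are equally eligible; choose one at random.
    candidates = [act for act in acts if proposals[act] == best]
    return rng.choice(candidates)


# First cycle: four schemas match, all with weight 0, so the choice is random.
first_cycle = [("A1", "S", 0), ("A1", "F", 0), ("B1", "S", 0), ("B1", "F", 0)]
print(select_act(first_cycle))

# Later: a schema predicting that B1 fails has weight 1, which pushes the choice toward A1.
later_cycle = [("A1", "S", 0), ("A1", "F", 0), ("B1", "S", 0), ("B1", "F", 1)]
print(select_act(later_cycle))
```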

So, Ernest does B but he receives a fail status from the environment.

Then, the new situation is assessed. This assessment basically consists of expressing the situation in terms of schemas. If these schemas already exist, they are reinforced; if they do not exist, they are constructed with a weight of 1. At the end of the first cycle, this assessment leads to reinforcing S13 and to setting the new context as containing B1 F and S13 S, which now forms two levels of context.
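As a rough sketch of this reinforce-or-construct rule, the schema memory can be pictured as a dictionary from (context, intention) pairs to weights. The function `assess` and this dictionary layout are hypothetical simplifications, and "reinforced" is assumed here to mean incrementing the weight by 1.

```python
def assess(schemas, context, enacted):
    """Reinforce the schema (context, enacted) if it exists, otherwise construct it.

    schemas: dict mapping (context, intention) pairs to weights.
    Returns the schema's weight after the assessment.
    """
    key = (context, enacted)
    if key in schemas:
        schemas[key] += 1   # existing schema: reinforce (assumed increment of 1)
    else:
        schemas[key] = 1    # new schema: constructed with a weight of 1
    return schemas[key]


# A primary schema with context "A1 S" and intention "B1 F" exists with weight 0.
memory = {(("A1", "S"), ("B1", "F")): 0}

# Ernest did B after a successful A and failed: the existing schema is reinforced.
reinforced = assess(memory, ("A1", "S"), ("B1", "F"))

# A pair that is not yet in memory is constructed with weight 1.
constructed = assess(memory, ("B1", "F"), ("B1", "S"))
print(reinforced, constructed, len(memory))
```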

After the second cycle, we can see that two second-order schemas are constructed: S21 and S22, which are based on the two levels of the previous context, B1 F and S13 S.

These second-order schemas cover two rounds. When they match the context, they influence the selection in favor of primary schemas that lead to success one round later. That is how Ernest can finally learn to succeed in this environment.

Notice that after each cycle, schemas of all levels that match the resulting situation are reinforced. This leads to a problem: higher-level schemas tend to force lower-level schemas that fail, which is good, but these lower-level schemas are then reinforced, and because they have a negative satisfaction, they tend to reject their acts in return.

To avoid this problem, I have limited the maximum weight a schema can receive to 5. When a schema has reached a weight of 5, it becomes frozen. At the end of this trace, we can see that Ernest reaches a stable activity that solves the aA..bB task, based on frozen schemas.
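The weight cap can be illustrated with a few lines. The names `MAX_WEIGHT`, `reinforce`, and `is_frozen` are hypothetical, and reinforcement is assumed to add 1 to the weight each time.

```python
MAX_WEIGHT = 5  # maximum weight a schema can receive, as stated in the post


def reinforce(weight):
    # Assumed increment of 1 per reinforcement, capped at MAX_WEIGHT.
    return min(weight + 1, MAX_WEIGHT)


def is_frozen(weight):
    # A schema is frozen once it has reached the maximum weight.
    return weight >= MAX_WEIGHT


weight = 0
history = []
for _ in range(8):
    weight = reinforce(weight)
    history.append(weight)
print(history, is_frozen(weight))
```

Once the weight stops growing, further reinforcement of a failing lower-level schema no longer increases its pull against the higher-level schemas.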

Basically, it means that the learning of higher-level schemas is made possible by the progressive freezing of lower-level schemas.
