This figure details how schemas are constructed in the first two cycles.
At level 0, elementary schemas are represented as circles. They are named A (schema A1 = doing A) and B (schema B1 = doing B). They are green if they succeed and red if they fail.
First level and second level schemas are represented as arrows. Their context is on the left side (dot), and their intention is on the right side (arrow). For instance S13=(AS,BF) and S21=(S13S,AF) (S=succeed, F=fail).
At the beginning, the context is initialized to AS. In this context, at this time, Ernest has no preference, so he randomly picks B. The environment returns Fail, so Ernest reinforces S13.
At the beginning of the second cycle, the context now has two levels: BF and S13S. At this time, and in this context, Ernest has no preference so he randomly picks A. The environment returns fail, so Ernest reinforces S16. In addition, because he also had S13S as context, Ernest considers that a schema (S13S,AF) has also been enacted. This schema does not yet exist so he creates it and names it S21.
Moreover, because S16S has been enacted from an element that is still in the context (AS = context of schema S13) then Ernest considers that a schema (AS, S16S) has also been enacted. This schema does not yet exist so he creates it and names it S22.
At the end of cycle 2, Ernest has thus reinforced two first-level schemas and constructed two second-level schemas. The context is now made of schemas AF, S16S, S21S and S22S.
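The two cycles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ernest's actual code: the function name `assess`, the `weights` dictionary, and the encoding of schemas as nested tuples are all hypothetical, and the "succeed" status of enacted schemas is left implicit.

```python
def assess(weights, elementary, schema, enacted):
    """Reinforce or construct the schemas enacted during one cycle.

    elementary: level-0 context item, e.g. "BF"
    schema:     first-level schema in the context, e.g. ("AS", "BF"), or None
    enacted:    act and status returned by the environment, e.g. "AF"
    Returns the new context. (Hypothetical helper, for illustration only.)
    """
    first = (elementary, enacted)            # e.g. (BF, AF), named S16 in the text
    learned = [first]
    if schema is not None:
        learned.append((schema, enacted))    # e.g. (S13 S, AF), named S21
        learned.append((schema[0], first))   # e.g. (AS, S16 S), named S22
    for s in learned:
        # A schema that does not exist yet is created with a weight of 1.
        weights[s] = weights.get(s, 0) + 1
    return [enacted] + learned

weights = {}
# Cycle 1: context AS, Ernest does B and fails.
context = assess(weights, "AS", None, "BF")
# Cycle 2: context BF and S13 S, Ernest does A and fails.
context = assess(weights, context[0], context[1], "AF")
print(len(context))  # 4 items: AF, S16, S21, S22
```

Because `assess` only accepts a first-level schema as its higher-level context, this sketch never builds anything above level two, which mirrors the limit imposed in this experiment.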
If we continued with this principle, Ernest would construct third-level schemas on the third cycle, fourth-level schemas on the fourth cycle, and so on. This means that at each new cycle, the context would be a structure representing Ernest's whole life up to this point. For scalability reasons, we cannot handle such complexity, so in this experiment we forbid schema construction above level two.
These two levels of schemas are enough for Ernest to succeed in the aA..bB task because the memory span of these schemas is equal to the regularity span of this task.
In this experiment, first-level schemas and second-level schemas are only used to propose elementary schemas, but they are not themselves enacted. To go further, we need to implement the enaction of these schemas.
Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.
Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).
Wednesday, May 27, 2009
Tuesday, May 19, 2009
Ernest 6.0 and the aA..bB environment
This video shows Ernest 6.0 in the aA..bB environment that I have previously explained here. In this environment, Ernest has to do two consecutive As or two consecutive Bs to succeed.
At the beginning, Ernest is initialized with two possible acts: A1 (doing A) and B1 (doing B). Their succeeding satisfaction is equal to 1 (getting Y), and their failing satisfaction is equal to -1 (getting X). In addition, Ernest is initialized with 16 primary schemas S3 to S18 made by combination of the acts A1 and B1 with their succeeding or failing status (yellow lines), for instance S3=( A1 S, A1 S, 2, 0). The satisfaction of S3 is equal to satisfaction(A1 S) + satisfaction(A1 S) = 2. The weight of S3 is 0 because it has not yet been enacted.
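As a rough illustration of this initialization, the 16 primary schemas can be enumerated as every pairing of an (act, status) context with an (act, status) intention. The dictionary layout and the names `ACT_SATISFACTION` and `primary` are assumptions made for this sketch, and the enumeration order does not claim to reproduce the S3 to S18 numbering.

```python
from itertools import product

# Satisfaction of each act for each status, as given in the text.
ACT_SATISFACTION = {
    ("A1", "S"): 1, ("A1", "F"): -1,
    ("B1", "S"): 1, ("B1", "F"): -1,
}

# Every (context, intention) pair of act/status items: 4 x 4 = 16 schemas.
primary = [
    {
        "context": context,
        "intention": intention,
        "satisfaction": ACT_SATISFACTION[context] + ACT_SATISFACTION[intention],
        "weight": 0,  # not yet enacted
    }
    for context, intention in product(ACT_SATISFACTION, repeat=2)
]

print(len(primary))                # 16
print(primary[0]["satisfaction"])  # 2, for (A1 S, A1 S)
```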
Ernest's short-term memory is initialized with S1 S, as if schema S1 had been enacted and succeeded. At the first cycle, four schemas match this context: S3, S11, S5, S13 (the first four pale green lines), but all their propositions are weighted 0, so the schema B1 is randomly picked.
So, Ernest does B but he receives a fail status from the environment.
Then, the new situation is assessed. This assessment basically consists of expressing the situation in terms of schemas. If these schemas already exist, they are reinforced; if they do not exist, they are constructed with a weight of 1. At the end of the first cycle, this assessment leads to reinforcing S13 and to setting the new context as containing B1 F and S13 S, which now forms two levels of context.
After the second cycle, we can see that two second-order schemas are constructed: S21 and S22 that are based on the two levels of the previous context B1 F and S13 S.
These second-order schemas cover two rounds, and when they match the context, they influence the selection in favor of primary schemas that will lead to success one round later. That is how Ernest can finally learn to succeed in this environment.
Notice that after each cycle, schemas of all levels that match the resulting situation are reinforced. This leads to a problem: higher-level schemas tend to force lower-level schemas that fail, which is good, but these lower-level schemas are then reinforced, which in turn tends to reject their acts because they have a negative satisfaction.
To avoid this problem, I have limited the maximum weight a schema can receive to 5. When a schema has reached a weight of 5, it becomes frozen. At the end of this trace, we can see that Ernest reaches a stable activity that solves the aA..bB task, based on frozen schemas.
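A minimal sketch of this weight cap, assuming reinforcement simply adds 1 to the weight on each enaction. The names `MAX_WEIGHT`, `reinforce` and `is_frozen` are hypothetical and chosen only for this example.

```python
MAX_WEIGHT = 5  # maximum weight stated in the text

def reinforce(weight):
    """Increase a schema weight by one, never going above the cap."""
    return min(weight + 1, MAX_WEIGHT)

def is_frozen(weight):
    """A schema is frozen once its weight has reached the cap."""
    return weight >= MAX_WEIGHT

weight = 0
for _ in range(7):  # seven enactions, two more than the cap allows
    weight = reinforce(weight)
print(weight, is_frozen(weight))  # 5 True
```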
Basically, it means that the learning of higher-level schemas is made possible by the progressive freezing of lower-level schemas.
Ernest 6.0
To prepare Ernest to recursively learn schemas on top of one another, I have almost entirely rewritten it.
Now, schemas are no longer triples of subschemas but pairs of subschemas. The first subschema of a schema is its context, and the second is its intention. For instance, S3=(S1,S2) means that the schema S3 intends to enact S2 in a context where S1 has been enacted. In addition, subschemas are associated with their succeed or fail status: S=Succeed or F=Fail. So, for instance, S3=(S1 S, S2 S) means that S3 expects S2 to succeed in a context where S1 has succeeded. On the contrary, S4=(S1 S, S2 F) would expect S2 to fail in a context where S1 has succeeded.
Like before, schemas also have satisfaction values and weights. So, S3=(S1 S, S2 S, 2, 1) means that schema S3 has a satisfaction of 2 and a weight of 1. The satisfaction of a schema is the sum of the satisfactions of its subschemas for their specific status. For instance, S1 and S2 may both have a satisfaction of 1 when they succeed, so S3 has a satisfaction of 1+1 = 2. If S2 has a satisfaction of -1 when it fails, then S4 would have a satisfaction of 1-1 = 0. The weight of a schema is the number of times the schema has been enacted. For instance, if S2 has failed 3 times in a context where S1 has succeeded, then we have S4=(S1 S, S2 F, 0, 3).
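The schema structure described above could be represented along these lines. This is a sketch only: the `Schema` class, its field names and the `SATISFACTION` table are assumptions for illustration, using the example values from the text.

```python
from dataclasses import dataclass

# Example satisfaction of each subschema for each status (values from the text).
SATISFACTION = {("S1", "S"): 1, ("S2", "S"): 1, ("S2", "F"): -1}

@dataclass
class Schema:
    name: str
    context: tuple    # (subschema, status), e.g. ("S1", "S")
    intention: tuple  # (subschema, status), e.g. ("S2", "F")
    weight: int = 0   # number of times the schema has been enacted

    @property
    def satisfaction(self):
        """Sum of the subschema satisfactions for their specific status."""
        return SATISFACTION[self.context] + SATISFACTION[self.intention]

s3 = Schema("S3", ("S1", "S"), ("S2", "S"), weight=1)
s4 = Schema("S4", ("S1", "S"), ("S2", "F"), weight=3)
print(s3.satisfaction, s4.satisfaction)  # 2 0
```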
At each cycle, all the schemas whose context matches the current context propose their intention. The proposition weight is equal to the satisfaction of the intended subschema multiplied by the weight of the proposing schema. This can be understood as the benefit of doing it multiplied by the confidence to succeed. For instance, in a context where S1 has succeeded, S3 proposes S2 with a proposition weight equal to satisfaction(S2 S)*weight(S3) = 1*1 = 1. In the same context, S4 proposes S2 with a proposition weight equal to satisfaction(S2 F)*weight(S4) = -1*3 = -3.
Then, the proposition weights received by each proposed schema are summed up, and the schema with the highest sum is selected and enacted. In our example, S2 has a total proposition weight equal to 1-3 = -2. Negative propositions can be understood as "fear" of dissatisfaction. In this case, Ernest is "afraid" of doing S2 because he has had more bad experiences than good experiences of doing it in this context. He will only do it if he has no more appealing choice.
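The proposal and selection step can be sketched as follows, reusing the S3 and S4 example. The function names and the tuple layout are hypothetical. Ties are broken at random here, in line with the random pick described for the first cycle of the aA..bB trace.

```python
import random
from collections import defaultdict

def total_propositions(schemas, context, satisfaction):
    """Sum the proposition weights received by each intended subschema."""
    totals = defaultdict(int)
    for schema_context, intention, weight in schemas:
        if schema_context == context:
            # Benefit of the intention times confidence in the proposing schema.
            totals[intention[0]] += satisfaction[intention] * weight
    return dict(totals)

def select(totals):
    """Pick the subschema with the highest total, breaking ties at random."""
    best = max(totals.values())
    return random.choice([name for name, total in totals.items() if total == best])

satisfaction = {("S2", "S"): 1, ("S2", "F"): -1}
schemas = [
    (("S1", "S"), ("S2", "S"), 1),  # S3: context, intention, weight
    (("S1", "S"), ("S2", "F"), 3),  # S4
]
totals = total_propositions(schemas, ("S1", "S"), satisfaction)
print(totals)  # {'S2': -2}
```

With only S2 proposed, `select(totals)` returns S2 despite its negative total, which corresponds to Ernest doing it only because he has no more appealing choice.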
When the selected schema has been enacted, the environment returns its succeed or fail status. Based on this status, the new context is assessed and new schemas are learned or reinforced.