Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.

Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).

Friday, October 31, 2008

Ernest's model

This picture shows how Ernest is modeled in Soar's "working memory".

For example, a schema printed in the trace as (AX B Y) (2) is stored as a subgraph <sch> of the schema memory node <scm>. AX is the context (<sch>.<con>), made up of the previous action A (<con>.<set>) and the previous response X (<con>.<get>). B is the action proposed by the schema (<sch>.<set>), Y is the schema expectation (<sch>.<get>), and (2) is the schema weight (<sch>.<wei>).
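To make the nesting concrete, here is a minimal Python sketch (not the actual Soar code) of how the schema (AX B Y) (2) could be laid out; the dictionary keys simply mirror the node labels above.

```python
# Hypothetical Python rendering of the schema subgraph described above.
schema_memory = {                  # <scm>: the schema memory node
    "schemas": [
        {                          # <sch>: one schema subgraph
            "context": {           # <sch>.<con>: the context AX
                "set": "A",        # <con>.<set>: previous action A
                "get": "X",        # <con>.<get>: previous response X
            },
            "set": "B",            # <sch>.<set>: proposed action, B
            "get": "Y",            # <sch>.<get>: expectation, Y
            "weight": 2,           # <sch>.<wei>: schema weight, (2)
        },
    ],
}
```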

So far, the context only covers one round. Its span depends on the short-term memory span, because the context is temporarily stored in short-term memory until the next response from the environment is known and the schema can be memorized. Nevertheless, the short-term memory, and thus the context, can be enlarged.

Ernest's viewpoint on his activity

Learning second-order schemas from activity is not so easy. To get insights, I have slightly modified Ernest and displayed his activity trace in a new form.

The newly released Ernest 3.1 implements the same strategy as Ernest 3.0 in the same "aA..bB..aA..aA" environment. His trace, however, does not print his primary actions but his primary schemas. So, instead of just selecting an action from previous experience, Ernest 3.1 constructs a full proposed schema. The trace prints this proposed schema (in orange) when it is enacted. The weights that led Ernest to choose this schema are printed in parentheses on the same line. In addition, instead of printing the environment's response (Y or X), this trace prints the schemas that Ernest learns (in blue) when the interaction cycle is completed.

When the weights for doing an A-schema or a B-schema are equal, Ernest chooses randomly and expresses his hesitation with "Hum?". If a randomly-chosen schema succeeds, Ernest expresses his satisfaction with "Yee!"; if it fails, he expresses his disappointment with "Boo.". In both cases, the enacted schema is learned or reinforced.

When the weights for doing an A-schema or a B-schema are different, Ernest proposes a schema based on the higher one. In that case, he is confident of getting Y, and he expresses this confidence with "Hop!". If the schema succeeds he says "A-a", but if it fails, he expresses his surprise with "O-o".
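As a reading aid, here is a small Python sketch of this decision rule as I have described it; the function names and the boolean flag are mine, not part of the Soar model.

```python
import random

def propose_schema(weight_a, weight_b):
    """Choose between an A-schema and a B-schema from their summed weights."""
    if weight_a == weight_b:
        print("Hum?")                                  # hesitation: pick at random
        return random.choice(["A-schema", "B-schema"]), False
    print("Hop!")                                      # confident of getting Y
    return ("A-schema" if weight_a > weight_b else "B-schema"), True

def comment(confident, succeeded):
    """Ernest's comment once the enacted schema has succeeded or failed."""
    if confident:
        print("A-a" if succeeded else "O-o")           # expected vs. surprised
    else:
        print("Yee!" if succeeded else "Boo.")         # satisfied vs. disappointed
```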

This new trace layout is interesting because we can understand it as Ernest's viewpoint on his activity. His activity appears to him as a stream of "affordances", i.e., a stream of contexts that trigger schemas.

Moreover, the schemas that Ernest learns from his activity are shown in the trace (in blue). These schemas constitute knowledge that Ernest can use to fulfill his goal of getting Ys. Hence, we can consider that Ernest "understands" these schemas, if we agree with a pragmatic epistemology stating that "meaning just is use" (Wittgenstein, 1953). From this first layer of knowledge, Ernest should construct more abstract knowledge that stays meaningful to him because this knowledge remains grounded in Ernest's activity.

Reference

Wittgenstein, L. (1953/2001). Philosophical Investigations.

Saturday, October 25, 2008

Poor Ernest 3.0

So far, Ernest has been placed in environments with no state of their own, so the right action for Ernest to choose did not depend on previous actions. But what if Ernest has to perform a preliminary action in order to reach a situation where he has the possibility of getting a Y?

In this new environment, called "aA..bB..aA..aA", Ernest has to do two consecutive As or two consecutive Bs to get a Y. Repeating the same action further will not lead to a Y anymore. So, Ernest will only get a Y if he does A when he has previously done B then A, or if he does B when he has previously done A then B.

This trace shows Ernest 3.0 in this new environment. For better understanding, new labels have been used: "Learn new Schema" means Ernest memorizes a new schema (previously called "Memorize Schema"). "Recall Schema" means Ernest recalls a previously-learned schema that matches the current context (previously called "Propose Schema" or "Avoid Schema"). Recalled schemas having an X expectation have their weight displayed with a "-" sign (this was previously indicated by the "Avoid" term). This experiment begins with an initial environment state of "AA". That causes Ernest to begin by learning an AXAX schema, which is fine.

Now, the environment does not always expect a specific action in order to return a Y; there are some situations where any action from Ernest will lead to an X. Instead, the environment has a state made up of Ernest's two last actions. Ernest will only get a Y if he does A when the state is BA, or if he does B when the state is AB.
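For readers who want to reproduce the setting, here is a minimal Python sketch of this environment as I understand it; the class name and method names are illustrative only.

```python
class AabbEnvironment:
    """Sketch of the "aA..bB..aA..aA" environment: its state is Ernest's two
    last actions, and it answers Y only for A after BA or for B after AB."""

    def __init__(self, initial_state=("A", "A")):
        self.state = initial_state          # the experiment starts in state "AA"

    def respond(self, action):
        older, last = self.state
        success = (older, last, action) in {("B", "A", "A"), ("A", "B", "B")}
        self.state = (last, action)         # shift the new action into the state
        return "Y" if success else "X"
```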

As we can see, poor Ernest has to struggle hard to get Ys. He keeps getting one every once in a while, but he is unable to find the simple regularity aAbBaAbB that would give him a Y every second action.

Generally speaking, Ernest is lost when the environment's regularities have a longer span than his context memory. I could easily make Ernest able to deal with this specific environment by increasing his context memory span, but I am looking for a more general solution. The basic idea is that Ernest should be able to construct "schemas of schemas".

Tuesday, October 21, 2008

Remarks on Ernest's Soar implementation

Several points about Ernest's implementation in Soar are worth noting:

- I do not use Soar's input and output functions. My Soar model implements both Ernest and his environment. From Soar's viewpoint, this model does not interact with any outside environment; it evolves by itself. Only we, as observers, understand it as an agent interacting with an environment.

- Ernest's memory does not match the classical Soar memory definition. From Ernest's design viewpoint, Ernest stores schemas in his long-term memory and the current situation in his short-term memory. From the Soar viewpoint, however, these schemas and this situation are actually stored in what the Soar vocabulary calls working memory, usually considered declarative and semantic. Thus, Soar modelers could think that Ernest learns semantic knowledge, but that would be misleading because, from Ernest's viewpoint, this knowledge has no semantics; it consists only of behavioral patterns and thus should be called procedural knowledge.

- I do not describe Ernest's action possibilities as operators, contrary to what Soar expects. Instead, I describe them as schemas, and my model only uses one Soar operator to generate the appropriate action that results from the evaluation of schemas.

- I cannot use Soar's built-in reinforcement learning mechanism for two reasons. The first is that it only applies to operators, and I need to reinforce schemas, not operators. The second is that Soar's reinforcement learning is designed to let the modeler define rewards from the environment. From these rewards, Soar computes operator preferences through an algorithm over which I have insufficient control. In my case, Ernest's behavior is not driven by rewards sent to him as inputs, but by internal preferences for certain types of schemas. Therefore, Soar's reward mechanism does not help me, and I have to implement my own reinforcement mechanism that simply increases a schema's weight each time it is enacted (a minimal sketch follows this list).

- So far, I do not use the Soar impasse mechanism. When Ernest does not have knowledge to choose between two or more schemas, he just randomly picks one.

- I do not use Soar's default probabilistic action selection mechanism. The idea that there should be an epsilon probability that Ernest does not choose his preferred action is just absurd. It only impedes the exploration and learning process. I force the epsilon value to zero.
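Regarding the home-made reinforcement mentioned in the list, here is a minimal Python sketch of the idea (the Soar rules themselves are not shown); the data layout is an assumption of mine.

```python
def reinforce(schema_memory, context, action, expectation):
    """Memorize a newly enacted schema with weight 1, or increment the weight
    of an already known schema. schema_memory maps (context, action,
    expectation) keys to weights."""
    key = (context, action, expectation)
    schema_memory[key] = schema_memory.get(key, 0) + 1
```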

In conclusion, it is clear that my usage of Soar does not correspond to what Soar was created for. Soar was created for representing the modeler's knowledge, not for developing agents who construct their own knowledge from their activity. Nevertheless, so far, Soar has proven to offer enough flexibility to be usable for my approach. It provides me with powerful and efficient graph manipulation facilities that are essential for Ernest.

Wednesday, October 15, 2008

Soar Ernest 3.0

This new environment expects alternately A and B during the first 10 rounds, then it toggles to always expecting B. This toggle happens roughly at the middle of the video.

Like Ernest 2.0, Ernest 3.0 has an innate preference for schemas that have an expectation of Y, but in addition, he has an innate dislike for schemas that have an expectation of X. That is, if Ernest has a schema that matches the current context and has an expectation of X, then he will avoid redoing the same action.

As we can see in the trace, compared to Ernest 2.0, this conjunction of "like" and "dislike" drastically speeds up the learning process. In this example, after the fourth round, Ernest always manages to get a Y until the environment toggles.

In this trace, schemas are represented as a quintuple (ca, ce, a, e, w). They are the same as Ernest 2.0's, but in addition, they are weighted: ca is the context element holding the previous action, ce is the context element holding the environment's previous response, a is the action, and e is the expectation of the schema. w is the schema's weight, that is, the number of times the schema has been reinforced.

Like Ernest 2.0, after receiving the response from the environment, Ernest 3.0 memorizes the schema, if it is not already known. In addition, if the schema is already known, Ernest reinforces it by incrementing its weight.

To choose an action, Ernest recalls all the schemas that match the current context, compares the sums of their weights for each action, counting negatively the weights of schemas having an expectation of X, and then chooses the action with the highest sum. If the sums are equal, he chooses randomly. For example, in the last decision of this trace, the context is BY and there are three matching schemas: (BY B X 1), (BY A Y 3), and (BY A X 5). That means, in this context, there was one bad experience of choosing B, three good experiences of choosing A, and five bad experiences of choosing A. Thus, Ernest chooses to do B because w(B) = -1 > w(A) = 3 - 5 = -2.
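The following Python sketch restates this selection rule under my own naming assumptions, together with the worked example from the trace.

```python
import random

def choose_action(schemas, context, actions=("A", "B")):
    """Sum the weights of the schemas matching the context for each action,
    counting X-expectation schemas negatively, and return the action with
    the highest sum (chosen at random in case of a tie)."""
    sums = {action: 0 for action in actions}
    for ca, ce, a, e, w in schemas:                 # quintuple (ca, ce, a, e, w)
        if (ca, ce) == context:
            sums[a] += w if e == "Y" else -w
    best = max(sums.values())
    return random.choice([a for a in actions if sums[a] == best])

# Last decision of the trace: context BY with three matching schemas.
schemas = [("B", "Y", "B", "X", 1), ("B", "Y", "A", "Y", 3), ("B", "Y", "A", "X", 5)]
print(choose_action(schemas, ("B", "Y")))           # "B", since -1 > 3 - 5 = -2
```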

At the middle of the video, when the environment toggles, the previously learned schemas no longer meet their expectations, and they get an X response instead of a Y. That results in the learning of new schemas with an expectation of X. When these new "X" schemas reach a higher weight than the "Y" ones, the wrong action is no longer chosen. That means Ernest becomes more "afraid" of getting an X than "confident" of getting a Y if he does A. Thus, Ernest starts sticking to B and gets Ys again.

Ernest can now adapt to two different environments at least, and can readapt if the environment changes from one of them to the other. He has two adaptation mechanisms: the first is based on the learning of behavioral patterns adapted to the short-term context, and the second on a long-term reinforcement of these behavioral patterns.

Thursday, October 9, 2008

Poor Ernest 2.0

So, Ernest 2.0 can adapt to two different environments: the so-called ABAB and BBBB environments. But what if Ernest is put into an ABAB environment that turns into a BBBB environment after a while? Let's call this third environment the AB--ABBB--BB environment.

Again, this is catastrophic. The schemas that were learned in the ABAB situation no longer work in the BBBB situation. Ernest learns new schemas corresponding to the BBBB situation, but these new schemas contradict the previous ones. As poor Ernest is unable to choose between concurrent schemas, he is irremediably lost when the environment turns to BBBB.

What Ernest needs is a way to lose confidence in schemas that do not meet expectations, and to reinforce confidence in schemas that meet expectations. More generally, Ernest needs a way to attribute a confidence value to his schemas, because in complex environments, he cannot assume that they will work all the time.

Fortunately, the newly-released Soar version 9 offers reinforcement learning facilities. Until now, Ernest has been designed with Herbal and implemented with the Jess rule engine. We now need to implement Ernest with Soar to take advantage of reinforcement learning.

Wednesday, October 1, 2008

Ernest 2.0

As explained by many philosophers and well summarized by Chaput (2004), to make sense, Ernest's knowledge has to be grounded in his acts. That is, knowledge will take the form of behavioral patterns, which are learned by Ernest during his activity. In psychology, these behavioral patterns are known as schemas.

We will follow the proposition of Drescher (1991) to implement schemas in the form of triples: (Context, Action, Expectation). Context is a situation where Action can be performed. Expectation is the situation observed after Action was performed in Context for the first time. Expectation is thus the situation that is expected when the schema is applied again in a similar Context.

Notice that a situation is something sequential. To deal with his new environment, Ernest must manage two sequential situation elements: his previous action (context-a) and the environment's previous answer (context-e).
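A minimal Python sketch of such a schema, with field names of my own choosing, could look like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Schema:
    """Drescher-style (Context, Action, Expectation) triple, with the context
    split into the two sequential elements described above."""
    context_a: str      # context-a: Ernest's previous action ("A" or "B")
    context_e: str      # context-e: the environment's previous answer ("X" or "Y")
    action: str         # the action to perform ("A" or "B")
    expectation: str    # the answer observed the first time ("X" or "Y")
```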

Ernest learns a new schema after each round. This new schema is added to Ernest's long-term memory if it is not already there. To do so, Ernest keeps the current Context and Action in short-term memory until he gets the environment's answer. Both schemas and short-term memory are displayed in green in the trace. We can see that Ernest knows more and more schemas as his activity unfolds.
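Sketched in Python under the same assumptions as above, the learning step might look like this (the set-based long-term memory is my simplification):

```python
def memorize(long_term_memory, short_term_memory, answer):
    """After a round, build a schema from the context and action held in
    short-term memory plus the environment's answer, and store it in
    long-term memory if it is not already known."""
    schema = Schema(short_term_memory["context_a"],
                    short_term_memory["context_e"],
                    short_term_memory["action"],
                    answer)
    if schema not in long_term_memory:
        long_term_memory.add(schema)
```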

As before, Ernest can do A or B. Here, his environment expects A and B alternately. The environment responds Y if this expectation is met, and X if not. Ernest "loves" Y. This love is implemented as an innate preference for Y. That is, if Ernest knows a schema with a context corresponding to the current situation and with an expectation of Y, he recalls this schema. If not, he acts randomly.
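This innate preference for Y can be sketched as follows (again only my reading of the text above, not the actual rules):

```python
import random

def select_action(long_term_memory, context_a, context_e):
    """Recall a known schema that matches the current context and expects Y;
    if none exists, act randomly."""
    for s in long_term_memory:
        if (s.context_a, s.context_e) == (context_a, context_e) and s.expectation == "Y":
            return s.action
    return random.choice(["A", "B"])
```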

The trace shows that, after several rounds, a successful schema has been learned for every situation. Thus, Ernest becomes able to succeed every time in this "ABAB" environment. Notice that, if placed in the "BBBB" environment that always expects B, Ernest will also learn to succeed every time. Thus, Ernest can now adapt to at least two different environments.

Reference

Drescher, G. L. (1991). Made-Up Minds: A Constructivist Approach to Artificial Intelligence. Cambridge, MA: MIT Press.