Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.

Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).

Sunday, December 7, 2008

Review of Sun R. (2004)

Sun, Ron (2004). Desiderata for cognitive architectures. Philosophical Psychology, Vol.17, No.3, pp.341-373.

I wrote this review because I think this paper gives valuable arguments in favor of the approach I have chosen with Ernest.

It starts with a nice presentation of what cognitive architectures are: "a concrete framework for detailed modeling of cognitive phenomena, based on a specification of essential structures". This presentation, however, points out that there is no clear consensus about what these essential structures should be. This lack of consensus has given rise to many different cognitive architectures which are difficult to assess and compare. The paper reviews ACT-R, Soar, EPIC, COGNET, and CLARION.

Ron Sun discusses this lack of consensus in many regards. For example, many dichotomies have been proposed with regard to memory: short-term memory/working memory/long-term memory, implicit memory/explicit memory, procedural memory/declarative memory, etc. Ron Sun concludes that “it is far from clear what essential subsystems of memory are, and thus, how memory should be divided”. Moreover, it is not even certain that memory itself can be seen as a module differentiated from the rest of cognition: “Memory is not simply retention, but it is also protentions, as pointed out by Husserl (1970); that is, it actively participates in intercepting the on-going flow of sensory information and it itself changes and organizes in the process”.

In this situation, Ron Sun advocates a minimalist approach to cognitive architectures, and sets out a list of basic desiderata:
  • Ecological realism: account for essential cognitive functions in a natural environment.
  • Bio-evolutionary realism: human models should be reducible to animal models, because of a continuum between animals and humans.
  • Cognitive realism: capture essential characteristics, abstract away from details of voluminous data.
  • Reactivity: account for fast response which does not involve elaborated computation.
  • Sequentiality: account for ability to recognize temporal dependencies.
  • Routineness: “Overall, we may view human everyday activities as consisting of forming, changing, and following routines”. “The initiation of routines (e.g. setting goals), the routines themselves, and the termination of routines can all be learned, in addition to being pre-wired using pre-determined rules. Routines (and their initiation and termination) may be learned through experience, including autonomous exploration, instructions, imitations, extraction, and other means”.
  • Trial and error adaptation: “learning of reactive routines is mostly, and essentially, a trial-and-error adaptation process. There are reasons to believe that this kind of learning is the most essential to human everyday activities and cognition”.
  • Bottom-up learning: “concerns how complex reasoning can arise from the simplest adaptive behavior, how abstract concepts can be based on simple, concrete reactive action patterns, how consciousness can emerge from unconsciousness, and so on”. Moreover, “Bottom-up learning enables conceptual structures of an agent to be grounded in both the subsymbolic processes of the agent as well as the interactions between the agent and the world”.
  • Modularity: Ron Sun proposes that the two basic modules of a cognitive architecture should be an implicit module and an explicit module. According to him, such an architecture would meet this entire list of desiderata. Among the architectures he reviews, only CLARION (his own architecture) implements this modularity.

Ron Sun grounds his argumentation in phenomenological philosophy. He refers to Heidegger (1927) for the idea that behavior is prior to representation: “Comportment, according to Heidegger, "makes possible every intentional relation to beings" and "precedes every possible mode of activity in general", prior to explicit beliefs, prior to explicit knowledge, prior to explicit conceptual thinking, and even prior to explicit desire. Comportment is thus primary, in exactly this sense. The traditional mistake of representationalism lay in the fact that they treat explicit knowledge and its correlates as the most basic instead, and thus they turn the priority upside-down: and in so doing, "every act of directing oneself toward something receives [wrongly] the characteristics of knowing".

Ron Sun acknowledges that “Conceptual thinking has important roles to play too in cognition”, but he states that “Conceptual thinking is "derived" from low-level mechanisms, because it is secondary in several (radically different but related) senses: evolutionarily, phylogenetically, ontogenetically (developmentally) and ontologically.”

Thus, Ron Sun insists on the need to focus more on the interaction between a cognitive agent and his world, and amongst cognitive agents. “What we need to do to gain a better understanding of comportment [...] is to look into the development of comportment […]. In particular, we should examine its development in the ontogenesis of individual agents, which is the most important means by which an agent develops its subconceptual behavioral routines or comportments, although some of the structures (such as modularity) might be formed evolutionarily, a priori, as discussed before.”

I believe my work follows the directions suggested in this paper because I am focusing on the interaction between an agent and his environment, and implementing bottom-up learning mechanisms from this interaction. My only regret about this paper is that I don’t think it is entirely fair to cognitive architectures other than CLARION. I think the dichotomy between implicit and explicit depends on the viewpoint from which we look at it, and thus, I am not sure this dichotomy must be implemented in the cognitive architecture itself. Soar is generally considered an explicit, representationalist approach to cognition, but I can use it to model implicitly-controlled behavior. I think this representationalist assumption, on which Soar is based, might even facilitate the implementation of bottom-up learning of explicit knowledge from implicit know-how.


Heidegger, M. (1927). Being and Time. English translation published by Harper and Row, New York, 1962.

Monday, November 24, 2008

Ernest 4.1's abstraction

This figure illustrates how Ernest learns knowledge from his activity, and in parallel, how this knowledge helps him better control his activity. It shows the same trace as in the Ernest 4.1 video.

The raw activity is represented at the bottom. It is an alternation of As or Bs done by Ernest and of Xs or Ys returned by the environment.

The ascending blue arrows represent the construction of more abstract items.

The first abstraction level is called "Acts". An act corresponds to one Ernest-Environment cycle. A new type of act is constructed when a new Ernest-Environment combination is encountered.

The second abstraction level is "primary schemas". A primary schema is made up of two acts: the context and the action. The action act can be seen as a raw action associated with a raw expectation. The context act triggers the schema, and the schema tries to control the action act, but sometimes it fails. It is only at the end of the action act that the actually enacted primary schema is completely known. Hence, there is a tight coupling between these two levels, which is represented by dashed gray double arrows: Trigger/Control.

The third abstraction level is "secondary schemas". Secondary schemas are made up of three primary schemas: context, action and expectation. The context primary schema triggers the secondary schema, and the secondary schema tries to control the action primary schema. In this environment, secondary schemas always succeed, so Ernest becomes "in control" of his activity when secondary schemas start to be enacted.
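As a rough sketch (my own rendering in Python, not code from Ernest), the three levels can be written as nested tuples, each level referencing items of the level below, which is what the ascending arrows represent:

```python
# Hypothetical tuple shapes for the three abstraction levels; the names
# follow the figure, not the actual implementation.
act1 = ("A", "X")                 # an act: one Ernest-Environment cycle
act2 = ("B", "Y")
primary = (act1, act2)            # primary schema: (context act, action act)
secondary = (primary, primary, primary)  # (context, action, expectation)
```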

Sunday, November 23, 2008

Ernest4.1's viewpoint

The cyclic representation gives a good idea of Ernest's implementation, but a poor idea of how Ernest learns and controls his activity.

Moreover, the cyclic representation gives the impression that Ernest's architecture falls into the classical "perception -> cognition -> action -> perception" cycle, which is not really the case. I don't think that the "Assess environment's response" phase can be understood as a perception phase, nor that the "Execute selected schema step" phase can be seen as an action phase. The phases between them do not correspond to classical cognitive problem solving either.

Obviously, at this point, Ernest has not yet constructed the idea that an external world exists outside himself. From Ernest's viewpoint, perception and action do not yet make any sense. Thus, the only psychological attribute we can grant him is what philosophers would call a "phenomenological experience", that is, a flow of phenomena that he experiences.

I think an unfolded timeline represents this phenomenological experience more effectively than the cyclic representation, because it better shows the interweaving of abstraction, learning, and control.

Ernest4.1's cycle

Construct context: consists of structuring the current context to prepare the schema construction and the schema selection. The context is made up of the three previously enacted schemas, stored in short-term memory. These schemas can be of any level, and can refer to subschemas. This phase indexes these different levels.

Construct new schemas: This phase consists of creating new schemas that match the current context. If they do not already exist, these new schemas are added to long-term memory. They constitute hypotheses about how to deal with a new context, but they still need to be tested.

Select a schema / First step: consists of selecting a schema to be executed in this context. High-level schemas add weight to their subschemas. Weights are positive if they lead to Y and negative if they lead to X. Schemas of any level compete, and the one with the highest weight is selected. If several are ex aequo, one of them is randomly picked. This phase initializes the selected schema at its first step.

Execute schema step: sends the selected action defined in the current schema step to the environment: A or B.

Environment: computes the response from the environment and sends it to Ernest: Y or X. The environment has its own memory and cycle.

Assess environment's response: checks if the schema has succeeded or failed. If the current subschema has succeeded but it is not the last step of the selected schema, then the "ongoing" loop is selected.

Next step / subschema: selects the next step in the subschema hierarchy of the selected schema.

Memorize / reinforce schema: when the schema ends, if it has succeeded, it is recorded as the last enacted schema in short-term memory. The previous two are shifted, and the third-to-last is dropped out of short-term memory.
If the schema has failed at some point, then the actually enacted schema is stored in short-term memory and reinforced in long-term memory. For example, if a primary schema expecting Y has been selected, but the environment actually returned X, then the same schema, but with an expectation of X, is memorized and reinforced instead.

Trace: is just used to generate the trace of this cycle and to clear the temporary variables.
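The phases above can be condensed into a runnable toy loop. This is my own simplification in Python, not the actual Soar model: schemas are flat (context, action, expectation) entries with a weight, so the "next step / subschema" phase collapses to a single step, and the environment is a toy one that always rewards B.

```python
import random

# Toy condensation of Ernest 4.1's cycle (assumed implementation).
class Ernest:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.long_term = {}    # (context, action, expectation) -> weight
        self.short_term = []   # previously enacted acts (action, response)

    def select(self, context):
        # Select a schema: Y-expectation weights count positively,
        # X-expectation weights negatively; ties are broken randomly.
        totals = {"A": 0, "B": 0}
        for (ctx, act, exp), w in self.long_term.items():
            if ctx == context:
                totals[act] += w if exp == "Y" else -w
        best = max(totals.values())
        return self.rng.choice([a for a, w in totals.items() if w == best])

    def cycle(self, env):
        context = self.short_term[-1] if self.short_term else None  # construct context
        action = self.select(context)             # select a schema / first step
        response = env(action)                    # execute step / environment
        key = (context, action, response)         # assess the actually enacted schema
        self.long_term[key] = self.long_term.get(key, 0) + 1  # memorize / reinforce
        self.short_term.append((action, response))
        return response

ernest = Ernest()
env = lambda a: "Y" if a == "B" else "X"   # toy environment: B always pays
responses = [ernest.cycle(env) for _ in range(10)]
```

In this toy run, once a context has been experienced, the weighted selection deterministically favors B, so the stream of responses settles on Y after a few rounds.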

Monday, November 17, 2008

Ernest 4.1

Like Ernest 4.0, Ernest 4.1 learns and exploits second-order schemas to succeed in the "aA..bB..aA..aA" environment.

The way schemas are constructed and enacted has, however, been slightly modified.

At the beginning of each round, Ernest 4.1 evaluates the context, and if the context is new, he constructs potentially interesting schemas that match this context. For example, the initial context is made up of "nils" because Ernest has no initial experience, but this context is structured in the form of a previous act A2 and a previous schema S5 (first line). An act is a couple (Ernest's action, Environment's response). From this context, four initial primary schemas are constructed: S6, S7, S8, S9. Finally, one schema is enacted: in this example S9. Like Ernest 4.0, if the schema gets X, Ernest 4.1 says O-o, and if it gets Y, he says Yee!

The context is the content of Ernest's short-term memory; it corresponds to the three previously-enacted schemas. For example, in this trace, the third context contains S5, S9 and S12. These schemas can be of any level of abstraction. For convenience, the third schema is expanded: S12=(A3,A4). A3 is S12's context and A4 is S12's act. A4 is expanded: A4=(B,Y), meaning that the last thing Ernest did was B and he got Y from the environment.

Like Ernest 4.0, Ernest 4.1 constructs a secondary schema each time a primary schema succeeds. This secondary schema is made up of the three previously-enacted primary schemas. That also corresponds to the context stored in short-term memory.

Later, when a new context matches a secondary schema, this secondary schema is enacted if it has the highest weight amongst all the schemas matching this context, at any level.

Contrary to Ernest 4.0, Ernest 4.1 does not trace the details of the secondary schema enaction. A secondary schema enaction is displayed as a single line in the trace, even though it actually takes two rounds. Moreover, only the secondary schema is reinforced and stored in short-term memory, but not the primary schemas that are part of it. Thus, when a secondary schema is enacted, short-term memory is not filled with lower-level details. This mechanism allows Ernest to construct tertiary schemas.

At the end of this trace, we can see the construction of tertiary schemas, made up of the three previously-enacted primary or secondary schemas. In this environment, however, these tertiary schemas are not used because there is no reason for them to receive more reinforcement than the secondary schemas, and anyway, Ernest 4.1 is not yet able to recursively manage nested schemas.

It is interesting to notice that, as this trace only displays the highest-level schema enaction, it can again be understood as a description of Ernest's viewpoint on his activity. It is as if Ernest was always focusing on the highest level of control, and automatically performing lower-level behavior without having to pay attention to it.

Thursday, November 6, 2008

Ernest 4.0

Ernest 4.0 can learn and exploit second-order schemas in a way that lets him solve the "aA..bB..aA..aA" problem.

This trace is very similar to Ernest 3.2's, except that it shows the recalls of second-order schemas (orange lines). This happens when there exists a second-order schema whose context is equal to the primary schema that has just been enacted.

For example, the first recalled second-order schema is S14 (a few lines after the first screen of this video). The reason why S14 is recalled is that its context schema S8 equals the primary schema that was enacted just before. When recalled, S14 is enacted, forcing its action schema S10 to be enacted, despite the fact that S10 expects X. We can think that Ernest is not so happy to get this X, but at least, it is what he expected. He expresses his "resignation" with "Arf" (in grey). Then, at the next round, Ernest can enact S12 as expected by S14, and he gets Y.

We can see that Ernest finally finds the regularity "aAbBaAbB" that gets him a Y every second round, which is the best he can get in this environment.

Ernest can now find regularities that are twice as long as his short-term memory. This is possible because he aggregates sub-regularities into primary schemas that can be referenced as single items in short-term memory. These items are affordance representations that Ernest can manipulate in short-term memory. In that sense, these representations constitute a first level of abstraction from Ernest's viewpoint.

Monday, November 3, 2008

Ernest 3.2

Ernest 3.2 can now learn second-order schemas, i.e. schemas of primary schemas. However, he does not yet know how to use them.

In this trace, each line, except the yellow ones, represents a cycle Ernest-Environment. We can understand these lines as "affordances", that is, situations that give rise to some behavior. The weights that triggered this behavior are displayed in orange, the resulting primary schemas are in blue. Each of these affordances has an assessment from Ernest's viewpoint: "Yee!", "Boo.", "A-a!" or "O-o", as explained in the previous post.

Like primary schemas, second-order schemas are triples: (context, action, expectation), but now each of these three elements is an affordance. When an affordance is assessed "Yee!" or "A-a", it triggers the learning of a second-order schema (in yellow), made up of the two previously-enacted affordances, appended to the triggering one.

For example, at the beginning of this trace, the second-order schema S12 is made up of schemas S6, S8, and S10: S6 is S12's context affordance, S8 is its current affordance, and S10 is its expected affordance. That means that when Ernest encounters a situation where S6 has been enacted, he should enact S8 because that should bring him to a situation where S10 can be enacted, and S10 will bring one of those delicious Ys!

Implementing the exploitation of these second-order schemas, however, still raises many questions. How should primary schemas and second-order schemas compete to trigger behavior? How should reinforcement be distributed between these two levels? What happens when a second-order schema fails? In addition, second-order schemas should also constitute more abstract affordances that could be taken as higher-level schema elements.
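The learning step described above can be sketched as follows (a hypothetical Python rendering; the function name and the id-based history representation are mine, not the actual implementation):

```python
# Hypothetical sketch: a "Yee!" or "A-a" assessment turns the last three
# enacted primary schemas (identified here by ids) into a second-order
# schema (context, action, expectation).
def learn_second_order(enacted_history, assessment):
    if assessment in ("Yee!", "A-a") and len(enacted_history) >= 3:
        return tuple(enacted_history[-3:])
    return None

# The example from the trace: S12 = (S6, S8, S10).
# learn_second_order(["S6", "S8", "S10"], "Yee!") -> ("S6", "S8", "S10")
```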

Friday, October 31, 2008

Ernest's model

This picture shows how Ernest is modeled in Soar's "working memory" (shift-click to enlarge in a new window).

For example, a schema printed in the trace as (AX B Y) (2) is stored as a subgraph <sch> of the schema memory node <scm>. AX is the context (<sch>.<con>) made up of the previous action A (<con>.<set>) and the previous response X (<con>.<get>). B is the action proposed by the schema (<sch>.<set>). Y is the schema expectation (<sch>.<get>), (2) is the schema weight (<sch>.<wei>).

So far, the context only covers one round. It depends on the short-term memory span, because the context is temporarily stored in short-term memory until the next response from the environment is known and the schema can be memorized. Nevertheless, the short-term memory, and thus the context, can be enlarged.
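For illustration, the same graph can be mimicked with nested Python dictionaries (plain Python, not actual Soar working-memory elements), using the attribute names from this post:

```python
# Plain-Python mimicry of the working-memory subgraph for the schema
# printed in the trace as (AX B Y) (2).
scm = {
    "sch": {
        "con": {"set": "A", "get": "X"},  # context: previous action / response
        "set": "B",                        # action proposed by the schema
        "get": "Y",                        # schema expectation
        "wei": 2,                          # schema weight
    }
}
```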

Ernest's viewpoint on his activity

Learning second-order schemas from activity is not so easy. To get insights, I have slightly modified Ernest and displayed his activity trace in a new form.

Newly released Ernest 3.1 implements the same strategy as Ernest 3.0 in the same "aA..bB..aA..aA" environment. His trace, however, does not print his primary actions but his primary schemas. So, instead of just selecting an action from previous experience, Ernest 3.1 constructs a full proposed schema. The trace prints this proposed schema (in orange) when it is enacted. The weights that led Ernest to choose this schema are printed in parentheses on the same line. In addition, instead of printing the environment's response (Y or X), this trace prints the schemas that Ernest learns (in blue) when the interaction cycle is completed.

When the weights for doing an A- or a B-schema are equal, Ernest chooses randomly and expresses his hesitation with "Hum?". If a randomly-chosen schema succeeds, Ernest expresses his satisfaction with "Yee!"; if it fails, Ernest expresses his disappointment with "Boo.". In both cases, the actually enacted schema is learned or reinforced.

When the weights for doing an A- or a B-schema are different, Ernest proposes a schema based on the highest one. In that case, he is confident of getting Y, and he expresses this confidence with "Hop!". If the schema succeeds he says "A-a", but if it fails, he expresses his surprise with "O-o".

This new trace layout is interesting because we can understand it as Ernest's viewpoint on his activity. His activity appears to him as a stream of "affordances", i.e., a stream of contexts that trigger schemas.

Moreover, the schemas that Ernest learns from his activity are shown in the trace (in blue). These schemas constitute knowledge that Ernest can use to fulfill his goal of getting Ys. Hence, we can consider that Ernest "understands" these schemas, if we agree with a pragmatic epistemology stating that "meaning just is use" (Wittgenstein, 1953). From this first layer of knowledge, Ernest should construct more abstract knowledge that stays meaningful to him because this knowledge stays grounded in Ernest's activity.


Wittgenstein, Ludwig (1953/2001). Philosophical Investigations.

Saturday, October 25, 2008

Poor Ernest 3.0

So far, Ernest has been given environments with no state of their own; thus, the right action for Ernest to choose did not depend on previous actions. But what if Ernest has to perform a previous action in order to reach a situation where getting a Y becomes possible?

In this new environment, called "aA..bB..aA..aA", Ernest has to do two consecutive As or two consecutive Bs to get a Y. Further repetitions of the same action no longer lead to Y. So, Ernest will only get a Y if he does A when he has previously done B then A, or if he does B when he has previously done A then B.

This trace shows Ernest 3.0 in this new environment. For better understanding, new labels have been used: "Learn new Schema" means Ernest memorizes a new schema (previously called "Memorize schema"). "Recall Schema" means Ernest recalls a previously-learned schema that matches the current context (previously called "Propose Schema" or "Avoid Schema"). Recalled schemas having an X expectation have their weights displayed with the "-" sign (this was previously indicated by the "Avoid" term). This experiment begins with an "AA" initial environment's state, which causes Ernest to begin by learning an AXAX schema, which is OK.

Now, the environment does not always expect a specific action to return Y; there are some situations where any action from Ernest will lead to an X. Instead, the environment has a state made up of Ernest's last two actions. Ernest will only get Y if he does A when the state is BA, or if he does B when the state is AB.

As we can see, poor Ernest has to struggle hard to get Ys. He gets one every once in a while, but he is unable to find the simple regularity aAbBaAbB that would give him a Y every second action.

Generally speaking, Ernest is lost when the environment's regularities span longer than his context memory. I could easily make Ernest able to deal with this specific environment by increasing his context memory span, but I am looking for a more general solution. The basic idea is that Ernest should be able to construct "schemas of schemas".

Tuesday, October 21, 2008

Remarks on Ernest's Soar implementation

There are several remarks worth noticing about Ernest's implementation in Soar:

- I do not use Soar's input and output functions. My Soar model implements both Ernest and his environment. From Soar's viewpoint, this model does not interact with any outside environment; it evolves by itself. It is only we, as observers, who understand it as an agent interacting with an environment.

- Ernest's memory does not match the classical Soar memory definition. From Ernest's design viewpoint, Ernest stores schemas in his long-term memory and the current situation in his short-term memory. From the Soar viewpoint, however, these schemas and this situation are actually stored in what the Soar vocabulary calls working memory, usually considered declarative and semantic. Thus, Soar modelers could think that Ernest learns semantic knowledge, but that would be misleading because, from Ernest's viewpoint, this knowledge has no semantics; it consists only of behavioral patterns and thus should be called procedural knowledge.

- I do not describe Ernest's action possibilities as operators, contrary to what Soar expects. Instead, I describe them as schemas, and my model only uses one Soar operator to generate the appropriate action that results from the evaluation of schemas.

- I cannot use Soar's built-in reinforcement learning mechanism for two reasons. The first is that it only applies to operators, and I do not need to reinforce operators but schemas. The second is that Soar reinforcement learning is designed to let the modeler define rewards from the environment. From these rewards, Soar computes operator preferences through an algorithm over which I have insufficient control. In my case, Ernest's behavior is not driven by rewards sent to him as inputs, but by internal preferences for certain types of schemas. Therefore, Soar's reward mechanism does not help me, and I have to implement my own reinforcement mechanism, which simply increases a schema's weight each time the schema is enacted.

- So far, I do not use the Soar impasse mechanism. When Ernest does not have knowledge to choose between two or more schemas, he just randomly picks one.

- I do not use Soar's default probabilistic action selection mechanism. The idea that there should be an epsilon probability that Ernest does not choose his preferred action is just absurd here. It only impedes the exploration and learning process. I force the epsilon value to zero.
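The hand-rolled reinforcement mechanism mentioned in the remarks above amounts to very little code. A minimal sketch in Python (the actual implementation is in Soar productions; the tuple representation is my assumption):

```python
# Assumed sketch: each schema, represented here as a hashable tuple
# (context, action, expectation), keeps a weight that is incremented
# every time the schema is enacted.
weights = {}

def reinforce(schema):
    weights[schema] = weights.get(schema, 0) + 1

s = (("B", "Y"), "A", "Y")
reinforce(s)
reinforce(s)   # enacted twice, so its weight reaches 2
```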

In conclusion, it is clear that my usage of Soar does not correspond to what Soar was created for: representing the modeler's knowledge, not developing agents who construct their own knowledge from their activity. Nevertheless, so far, Soar has proven to offer enough flexibility to be usable for my approach. It provides me with powerful and efficient graph-manipulation facilities that are essential for Ernest.

Wednesday, October 15, 2008

Soar Ernest 3.0

This new environment expects alternately A and B during the first 10 rounds, then it toggles to always expecting B. This toggle happens roughly at the middle of the video.

Like Ernest 2.0, Ernest 3.0 has an innate preference for schemas that have an expectation of Y, but in addition, he has an innate dislike for schemas that have an expectation of X. That is, if Ernest has a schema that matches the current context and has an expectation of X, then he will avoid redoing the same action.

As we can see in the trace, compared to Ernest 2.0, this conjunction of "like" and "dislike" drastically speeds up the learning process. In this example, after the fourth round, Ernest always manages to get a Y until the environment toggles.

In this trace, schemas are represented as quintuples (ca, ce, a, e, w). They are the same as Ernest 2.0's, but in addition, they are weighted: ca is the context of the previous action, ce is the context of the previous response from the environment, a is the action, and e is the expectation of the schema. w is the schema's weight, that is, the number of times the schema was reinforced.

Like Ernest 2.0, after receiving the response from the environment, Ernest 3.0 memorizes the schema, if it is not already known. In addition, if the schema is already known, Ernest reinforces it by incrementing its weight.

To choose an action, Ernest recalls all the schemas that match the current context, compares the sums of their weights for each action, counting negatively the weights of schemas having an expectation of X, and then chooses the action with the highest sum. If the sums are equal, he chooses randomly. For example, in the last decision of this trace, the context is BY and there are three matching schemas: (BY B X 1), (BY A Y 3), and (BY A X 5). That means that, in this context, there was one bad experience of choosing B, three good experiences of choosing A, and five bad experiences of choosing A. Thus, Ernest chooses to do B because w(B) = -1 > w(A) = 3 - 5 = -2.
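This selection rule can be checked against the trace example with a small sketch (my own Python shorthand, with the schemas matching the current context reduced to (action, expectation, weight) triples):

```python
import random

# Sketch of the weighted action selection described above.
def select_action(matching_schemas, rng=random.Random(0)):
    totals = {"A": 0, "B": 0}
    for action, expectation, weight in matching_schemas:
        # X-expectation weights count negatively, Y-expectation positively.
        totals[action] += weight if expectation == "Y" else -weight
    best = max(totals.values())
    return rng.choice([a for a, w in totals.items() if w == best])  # ex aequo: random

# The example from the trace, context BY: (B X 1), (A Y 3), (A X 5).
# w(B) = -1 > w(A) = 3 - 5 = -2, so B is chosen.
```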

At the middle of the video, when the environment toggles, the previously learned schemas no longer meet their expectations, and they get an X response instead of a Y. This results in the learning of new schemas with an expectation of X. When these new "X" schemas reach a higher weight than the "Y" ones, the wrong action is no longer chosen. That is, Ernest becomes more "afraid" of getting X than "confident" of getting Y if he does A. Thus, Ernest starts sticking to B and gets Ys again.

Ernest can now adapt to two different environments at least, and can readapt if the environment changes from one of them to the other. He has two adaptation mechanisms: the first is based on the learning of behavioral patterns adapted to the short-term context, and the second on a long-term reinforcement of these behavioral patterns.

Thursday, October 9, 2008

Poor Ernest 2.0

So, Ernest 2.0 can adapt to two different environments: the so-called ABAB and BBBB environments. But what if Ernest is put into an ABAB environment that turns into a BBBB environment after a while? Let's call this third environment the AB--ABBB--BB environment.

Again, this is catastrophic. The schemas that were learned in the ABAB situation no longer work in the BBBB situation. Ernest learns new schemas corresponding to the BBBB situation, but these new schemas contradict the previous ones. As poor Ernest is unable to choose between concurrent schemas, he is irremediably lost when the environment turns to BBBB.

What Ernest needs is a way to lose confidence in schemas that do not meet expectations, and to reinforce confidence in schemas that do. More generally, Ernest needs a way to attribute a confidence value to his schemas, because in complex environments, he cannot assume they work all the time.

Fortunately, the newly-released Soar version 9 offers reinforcement learning facilities. So far, Ernest was designed with Herbal and implemented with the Jess rule engine. We now need to implement Ernest with Soar to take advantage of reinforcement learning.

Wednesday, October 1, 2008

Ernest 2.0

As explained by many philosophers and well summarized by Chaput (2004), to make sense, Ernest's knowledge has to be grounded in his acts. That is, knowledge will take the form of behavioral patterns, which are learned by Ernest during his activity. In psychology, these behavioral patterns are known as schemas.

We will follow the proposal of Drescher (1991) to implement schemas in the form of triples: (Context, Action, Expectation). Context is a situation where Action can be performed. Expectation is the situation after Action was performed in Context for the first time. Expectation is thus the situation that is expected when the schema is applied again in a similar Context.
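As a sketch of this representation (hypothetical Python, not the original rule-engine implementation; the class and field names are mine):

```python
from dataclasses import dataclass

# Hypothetical rendering of a Drescher-style schema triple.
@dataclass(frozen=True)
class Schema:
    context: tuple      # (previous action, previous environment response)
    action: str         # "A" or "B"
    expectation: str    # response observed when action was first tried here

long_term_memory = set()

def learn(context, action, response):
    # Memorize the new schema only if it is not already known.
    long_term_memory.add(Schema(context, action, response))

learn(("A", "Y"), "B", "Y")
learn(("A", "Y"), "B", "Y")   # already known: the set keeps a single copy
```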

Notice that a situation is something sequential. To deal with his new environment, Ernest must manage two sequential situation elements: his previous action (context-a), and the previous environment's answer (context-e).

Ernest learns a new schema after each round. This new schema is added to Ernest's long-term memory if it is not already there. To do so, Ernest keeps the current Context and Action in short-term memory until he gets the environment's answer. Both schemas and short-term memory are displayed in green in the trace. We can see that Ernest knows more and more schemas as his activity unfolds.

As before, Ernest can do A or B. Here, his environment expects successively A then B. The environment responds Y if this expectation is met, and X if not. Ernest "loves" Y. This love is implemented as an innate preference for Y. That is, if Ernest knows a schema with a context corresponding to the current situation, and with an expectation of Y, he recalls this schema. If not, he acts randomly.

The trace shows that, after several rounds, a successful schema has been learned for every situation. Thus, Ernest becomes able to succeed every time in this "ABAB" environment. Notice that, if placed in a "BBBB" environment that always expects B, Ernest will also learn to succeed every time. Ernest can thus now adapt to at least two different environments.
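The whole mechanism can be sketched as follows, assuming a toy "ABAB" environment and random exploration (an illustration of the idea, not the original Jess implementation):

```python
import random

def run(rounds=200, seed=0):
    """Illustrative sketch of Ernest 2.0 in the "ABAB" environment."""
    rng = random.Random(seed)
    schemas = set()          # long-term memory: (context, action, expectation) triples
    context = (None, None)   # (previous action, previous environment answer)
    expected = "A"           # the environment alternately expects A then B
    successes = []
    for _ in range(rounds):
        # Recall a schema whose context matches the current situation and
        # whose expectation is the loved answer "Y"; otherwise act randomly.
        recalled = [a for (c, a, e) in schemas if c == context and e == "Y"]
        action = recalled[0] if recalled else rng.choice(["A", "B"])
        answer = "Y" if action == expected else "X"
        expected = "B" if expected == "A" else "A"   # the environment alternates
        schemas.add((context, action, answer))       # learn the new schema
        successes.append(answer == "Y")
        context = (action, answer)
    return successes

successes = run()
```

In this sketch, every recurring context eventually acquires a successful schema, so the tail of `successes` becomes all True, matching the observation that Ernest ends up succeeding every round.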


Drescher, G. L. (1991). Made-Up Minds: A Constructivist Approach to Artificial Intelligence. Cambridge, MA: MIT Press.

Friday, September 19, 2008

Other famous Ernests

Ernest Nagel was among the most important philosophers of science of the twentieth century. He has taken up reflexivity as an issue in science. (Ernest Nagel, Wikipedia; Reflexivity, Wikipedia)

Ernest's legend

Internal states, in green, describe what Ernest has in mind. Internal operators, in orange, describe the operations Ernest mentally computes. Actions, in pink, describe what Ernest does. Environment events, in blue, describe what the environment responds.
Of course, that is just a legend: Ernest has no mind. A good reference to explain that is "Artificial Intelligence Meets Natural Stupidity" by Drew McDermott (1976).

Thursday, September 18, 2008

Poor Ernest

So, let's put Ernest in a slightly more complex environment. Now, to get Yees, one must alternately do A and B. That requires adding some "intelligence" to the environment as well.
Of course, the result is catastrophic. Poor Ernest only has a one-round memory and no capacity to recognize AB regularities.
If the problem of cognition is to find one's happiness by exploiting the environment's regularities, then there is still a great deal of work ahead for Ernest...

Wednesday, September 17, 2008

Ernest, the smallest learning artificial agent in the world

Ernest can only do two things: A or B.
Ernest can only perceive two things: "Yee" or "o-o". That is good, because those are the only two things that his environment can exhibit.
Ernest loves Yees and hates o-os. Will he learn that, in this environment, to get Yees, one must do B?
The answer is in his activity trace. Play it with your speakers on, because Ernest speaks!!
So, Ernest finds his happiness by exploiting the simplest possible regularity in his environment: A makes o-o and B makes Yee.
But what happens if his environment exhibits more complex regularities? That is what we have to study next.
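For illustration only (the original agent is not written in Python, and the names are made up), this simplest Ernest can be sketched as:

```python
import random

def run(rounds=20, seed=0):
    """Sketch of the one-round-memory Ernest: the environment answers
    "Yee" to B and "o-o" to A; Ernest repeats whichever act last
    produced a "Yee", and otherwise acts randomly."""
    rng = random.Random(seed)
    learned = {}   # act -> last answer observed after that act
    history = []
    for _ in range(rounds):
        liked = [act for act, ans in learned.items() if ans == "Yee"]
        act = liked[0] if liked else rng.choice(["A", "B"])
        answer = "Yee" if act == "B" else "o-o"   # the environment's regularity
        learned[act] = answer
        history.append((act, answer))
    return history

history = run()
```

Once B has been tried and rewarded, the sketch keeps doing B forever, which is exactly the regularity-exploiting happiness described above.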

Monday, September 1, 2008

Review of Chaput H. H. 2004

This is a review of Harold H. Chaput's PhD dissertation. I wrote it because I was impressed by his learning theory.

The Constructivist Learning Architecture: A Model of Cognitive Development for Robust Autonomous Robots

Harold Chaput proposes a new learning architecture called CLA: Constructivist Learning Architecture.
He takes inspiration from Piaget's constructivist epistemology, which emphasizes that knowledge is grounded in action. Piaget proposed the notion of scheme, or schema, as the basic element of knowledge: a schema embeds perception, action, and expectations in a single temporal pattern of behavior. Piaget also proposed a theory of learning based on a progressive construction of schemas, from a basic sensorimotor level to the most abstract levels.
The CLA is a computer implementation of this theory. In this dissertation, Harold Chaput demonstrates how this implementation can account for infants' natural learning, and how the CLA can be used to build an autonomous robot that actually performs artificial learning.

Concerning its implementation in a robot, this work is based on the earlier Schema Mechanism (Drescher 1991). Harold Chaput refers to the Schema Mechanism as one of the best implementations of constructivist learning, and as the only known learning system that models constructivism as described by Piaget.
He gives a very precise description of it in section 5.1. He summarizes it as follows:

To summarize, the Schema Mechanism starts with a set of primitive items and primitive actions. It then explores the environment to create a set of sensorimotor schemas. These schemas form the basis of new synthetic items. They also are used in the creation of goal-directed composite actions. Using these techniques, an agent can build a hierarchy of items to describe its environment, and a hierarchy of sensorimotor schemas that are combined into a plan for achieving some goal.

However, Harold Chaput deplores that the Schema Mechanism faces insurmountable scale-up issues, which make it impossible to use for a robot in a real-world situation.
So Chaput presents the CLA as an alternative implementation of the Schema Mechanism. Basically, the schema construction that the Schema Mechanism performed deterministically is performed probabilistically by the CLA. This probabilistic schema construction uses a clustering mechanism called the Self-Organizing Map (SOM) (Kohonen 1997).

A SOM is an unsupervised learning system that maps input data into a feature coordinate system, or feature map. In the CLA's implementation, patterns of behavior constitute the SOM's input data, and the SOM is used to cluster them into a set of schemas. During training, the SOM organizes itself as a network of nodes, where each node represents a prototype vector of the input. Harold Chaput chose the SOM for its neural plausibility; he notes that other vector clustering methods could be used as well, citing Independent Component Analysis as an alternative (ICA; Hyvärinen, Karhunen & Oja 2001).
The CLA uses several layers of SOMs, each layer taking the layer below it as input. This hierarchical architecture allows the robot to learn increasingly abstract schemas.
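To illustrate the idea (a generic SOM sketch, not CLA's actual code), the classical update rule pulls the best-matching node and its grid neighbours toward each input:

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal Self-Organizing Map: each node of a 2-D grid holds a
    prototype vector; the best-matching unit (BMU) and its grid
    neighbours are pulled toward each input, with decaying learning
    rate and neighbourhood radius."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # Grid coordinates of every node, for the neighbourhood function.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5    # decaying neighbourhood radius
        for x in data:
            # BMU: node whose prototype is closest to the input vector.
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighbourhood around the BMU on the grid.
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
    return weights
```

After training on data drawn from two clusters, the node prototypes end up close to the data, so each input pattern is summarized by its best-matching node, which is the sense in which the SOM clusters behavior patterns into schemas.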

Chaput shows how the CLA implements the same functionalities as the Schema Mechanism, but without facing the scale-up limitation.
He illustrates this very precisely in section 6.2, which describes an experiment with a simulated robot involved in a foraging task. Chaput demonstrates how different levels of schemas are learned, from low-level sensorimotor schemas to more abstract descriptions of strategies. He also demonstrates fallback mechanisms to lower-level schemas when higher-level schemas fail, and recovery mechanisms when the robot is damaged.

Finally, he describes a mechanism of reinforcement learning with delayed feedback, implemented to provide the robot with goals. A high reinforcement value is attributed to the final schema that completes the foraging task, and a mechanism spreads reinforcement value back along prerequisite schemas. This causes the robot to organize its behavior according to the task.
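The spreading idea can be sketched as follows, with hypothetical schema names and a discount factor standing in for the mechanism's actual parameters (this is an illustration of the principle, not CLA's implementation):

```python
def spread_reinforcement(schemas, rewards, gamma=0.9):
    """Propagate reinforcement back along prerequisite schemas.
    `schemas` maps each schema name to the list of its prerequisite
    schemas; `rewards` gives the externally assigned reinforcement
    (e.g. a high value on the final schema that completes the task).
    Each schema passes a discounted share of its value to its
    prerequisites, via a simple fixed-point sweep."""
    values = dict(rewards)
    for _ in range(len(schemas)):
        for name, prereqs in schemas.items():
            for p in prereqs:
                backed_up = gamma * values.get(name, 0.0)
                if backed_up > values.get(p, 0.0):
                    values[p] = backed_up
    return values

# Hypothetical chain: "deposit" completes the foraging task, "grab" is
# its prerequisite, and "approach" is a prerequisite of "grab".
chain = {"deposit": ["grab"], "grab": ["approach"], "approach": []}
vals = spread_reinforcement(chain, {"deposit": 1.0})
```

The earlier a schema sits in the chain, the more its value is discounted, so the robot's behavior gets pulled toward completing the task.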

In conclusion, I can only regret that this work was validated only in a robot simulator, not on a real robot in a real environment. Chaput lists future work aimed at validating it in a real-world environment. I imagine that could raise new difficulties, for example the problem that schemas might need to be more "continuous", as opposed to the "discrete symbolic schemas" on which this work focused. Then would come the problem of linking continuous schemas to discrete schemas. Anyway, I think this work raises fascinating research opportunities.


Chaput, H. H. (2004). The Constructivist Learning Architecture: A Model of Cognitive Development for Robust Autonomous Robots. Unpublished doctoral dissertation, The University of Texas at Austin.