Artificial Developmental Learning: 2009

Friday, October 16, 2009

Iconic visual perception

Ernest 7.4 has a basic iconic visual system implemented. He can now perceive his surrounding environment in the form of an icon made of three parts: [left, forward, right]. These parts correspond to the three squares around Ernest. This icon is displayed in the upper-right corner of the video.

This environment offers 5 different possible icons. Ernest must learn them and simultaneously learn what behavior is the most satisfying in the context of each of them.

This video shows that this learning is pretty fast. With this iconic mechanism, Ernest succeeds much better in this environment than with the sequential learning mechanism only. Indeed, this task is more tailored to visual learning than to sequential learning, because all the information needed to make a choice is easily accessible in the visual field.

In this experiment, the "visual" perception is not implemented as sensory-motor schemas. Instead, visual perception follows a second, parallel process of learning and recollection. Newly learned icons are stored in Ernest's iconic memory, and new or recollected icons are activated in Ernest's short-term memory. As an element of Ernest's short-term memory, the currently perceived icon is part of the context of the newly learned schemas.

Ernest 7.4 has three primitive schemas with the following settings:
- [move forward, succeed, 10] Ernest enjoys moving forward.
- [move forward, fail, -10] Ernest dislikes bumping walls.
- [turn left or right, succeed, 0] Ernest is indifferent to turning toward an empty square.
- [turn left or right, fail, -5] Ernest dislikes turning toward a wall.
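As a rough illustration of the idea (not Ernest's actual code), the icon and the settings above could be sketched in Python as follows. The class `IconicMemory`, its methods, and the "wall"/"empty" encoding are all assumptions made for this sketch; only the satisfaction values come from the list above.

```python
from typing import Dict, Iterable, Set, Tuple

Icon = Tuple[str, str, str]  # (left, forward, right), e.g. ("wall", "empty", "wall")

# Satisfaction values copied from the settings listed above.
SATISFACTION: Dict[Tuple[str, str], int] = {
    ("move forward", "succeed"): 10,
    ("move forward", "fail"): -10,
    ("turn left", "succeed"): 0,
    ("turn left", "fail"): -5,
    ("turn right", "succeed"): 0,
    ("turn right", "fail"): -5,
}


class IconicMemory:
    """Hypothetical store of learned icons and of what happened in each of them."""

    def __init__(self) -> None:
        self.icons: Set[Icon] = set()
        self.experience: Dict[Tuple[Icon, str], int] = {}

    def perceive(self, icon: Icon) -> bool:
        """Store the icon; return True if it was new, False if it was recollected."""
        is_new = icon not in self.icons
        self.icons.add(icon)
        return is_new

    def record(self, icon: Icon, schema: str, status: str) -> None:
        """Remember the satisfaction obtained by enacting `schema` while seeing `icon`."""
        self.experience[(icon, schema)] = SATISFACTION[(schema, status)]

    def best_schema(self, icon: Icon, schemas: Iterable[str]) -> str:
        """Pick the schema with the best remembered satisfaction (0 if never tried)."""
        return max(schemas, key=lambda s: self.experience.get((icon, s), 0))
```

In this simplified picture, an icon showing walls to the left and ahead would quickly come to favor turning right, because the two other choices have been recorded with negative satisfaction.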

Two Visual Systems and Two Theories of Perception

Norman, J. (2002). Two Visual Systems and Two Theories of Perception: An Attempt to Reconcile the Constructivist and Ecological Approaches. Behavioral and Brain Sciences, 25 (1), 73–144.

I am reviewing this paper because it gives some directions for implementing Ernest's visual system. In this paper, Norman analyzes the so-called dual-process approach to vision. He describes these two processes as follows:

Purpose:
- Process 1: picks up visual information to generate adapted behavior.
- Process 2: recognizes and identifies objects and events in the environment.

Human neuro-anatomy:
- Process 1: dorsal (occipito-dorsal and parietal, toward bimodal areas vision/motor).
- Process 2: ventral (occipito-ventral, toward bimodal areas vision/audition).

Information theory:
- Process 1: related to Gibson's (1979) ecological cognition theory. Invariants are picked-up in the visual scene to constitute affordances.
- Process 2: related to Helmholtz's constructivist theory of vision. Pictorial cues are perceived to construct/recollect a representation of objects.

Consciousness:
- Process 1: triggers behavior while the subject has no explicit consciousness of why this behavior is triggered.
- Process 2: the subject is conscious of seeing the objects that this process identifies.

How to use it for Ernest?

So far, only process 1 has been implemented in Ernest. Norman's paper suggests that process 2 should be implemented in parallel. Process 2 requires an iconic memory associated with skills for learning new icons and recognizing icons. Process 1 and process 2 should be interconnected. The process of recognizing icons (process 2) may need to trigger information-pick-up behavior (process 1). Moreover, recognized icons should participate in the selection of forthcoming behavior.

References

Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.

Friday, October 9, 2009

Ernest72 Soar

The Soar 9.1 code of Ernest 7.2 is available here. This code can be edited in Eclipse with the Soar plugin (remove the .txt extension).

Thursday, October 8, 2009

Perception/action coordination

In this experiment, Ernest can touch to his right and to his left in addition to his front. As before, squares flash yellow when Ernest touches them, and red when Ernest bumps into them (see this post for a detailed legend of the video).

Furthermore, the turn schemas now have feedback statuses. These statuses are:
- S, if Ernest turns toward an empty square.
- F, if Ernest turns toward a wall. The wall flashes pink to indicate that Ernest has rubbed it while turning.
In both cases, the turn action is performed.

As before, when Ernest starts, he does not know the linkages between the different schemas.

This experiment shows that Ernest is able to learn to consider certain schemas as perceptions and others as actions, and to learn the linkage between these perceptions and these actions.

At the beginning of the video, Ernest moves frantically. He keeps bumping and scratching walls. He is unable to predict the consequences of his acts and he keeps expressing his "surprise" by saying "Ho!".

After a while, he learns to touch forward before moving forward, to touch to the right before turning to the right, and to touch to the left before turning to the left.

Then, he learns a habit consisting of touching to the left when he is facing a wall. If the square to the left is empty then he turns left. If the square to the left is a wall then he touches to the right. If the square to the right is empty then he turns right. With this strategy, he manages to go all the way around his environment.
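The habit that Ernest ends up with can be written down as a small decision procedure. This Python sketch only restates the learned behavior described above; Ernest is not programmed this way, and the function name `habit` and its boolean arguments are assumptions. The case where both side squares are walls is not described in this post, so the sketch simply stops there.

```python
from typing import List


def habit(forward_is_wall: bool, left_is_wall: bool, right_is_wall: bool) -> List[str]:
    """Return the sequence of schemas the learned habit would enact in one situation."""
    actions = ["touch forward"]
    if not forward_is_wall:
        # Touch forward before moving forward.
        actions.append("move forward")
        return actions
    # Facing a wall: touch to the left first.
    actions.append("touch left")
    if not left_is_wall:
        actions.append("turn left")
        return actions
    # Left is a wall: touch to the right.
    actions.append("touch right")
    if not right_is_wall:
        actions.append("turn right")
    # Both sides are walls: not covered by the description above.
    return actions
```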

The settings for generating this behavior were not easy to find. They are as follows:

- [touch empty (forward, left, or right), 0] Ernest is indifferent to touching empty squares.
- [sense wall (forward, left, or right), -1] Ernest slightly dislikes sensing walls.
- [move forward, 10] Ernest enjoys very much moving forward.
- [bump wall, -10] Ernest dislikes very much bumping into walls.
- [turn toward empty square (left or right), 0] Ernest is indifferent to turning toward an empty square.
- [turn toward wall (left or right), -5] Ernest dislikes turning toward walls.

Two insights from this experiment:

- Learning is faster if schemas associated with strong emotions, whether positive or negative, are never forgotten.

- Without the feedback from the turn schemas, Ernest would have a lot of trouble establishing the linkage between sensing to the sides and turning. He would usually end up finding less elegant regularities.

Monday, October 5, 2009

More complex environment

What happens when Ernest is put in a more complex environment? He still starts by learning low-level regularities, then he tries to find higher-level regularities that would continue to increase his satisfaction.

In this video, Ernest 7.1 drives a little tank. Ernest 7.1 has anticipations associated with his actions. When the result of his action does not meet his anticipation, he says "Oh!". We can see that he first learns the strategy consisting of sensing before moving forward (although the yellow flashing is sometimes too fast to be seen in this video).

Once this strategy is learned, Ernest stops bumping into walls and becomes more confident in exploring his environment. We can see him going up the left border, being surprised at sensing empty squares and running into them, because he enjoys it.

In this example, Ernest's second-level strategy consists of turning to the right after facing a wall. At the end of the video, this habit gets him trapped in a little loop on two squares. After a few cycles in this loop, he stops being surprised by the results of his acts and stops saying "Ho!".

Two insights from this experiment:

- Ernest 7.1 now has a forgetting mechanism. At the end of each cycle, the weight of every schema that has not been enacted during this cycle is decremented by 0.2. When a schema reaches a weight of 0, it is deleted from memory. Forgetting useless schemas prevents the continual accumulation of schemas and keeps their number stable. This forgetting mechanism actually accelerates the learning because it helps get rid of useless schemas.

- Ernest is tailored for sequential regularity learning but not for spatial regularity learning. Therefore, he has trouble exploiting the possibilities offered by freedom of movement in a two-dimensional environment. Future developments should provide him with mechanisms to capture spatial regularities, through constructing an internal representation of his surrounding environment.
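The forgetting mechanism from the first insight can be sketched in a few lines of Python. This is only an illustration under assumptions: the function `forget`, the dictionary of weights, and the rounding step (added to avoid floating-point residue when repeatedly subtracting 0.2) are not taken from Ernest's code.

```python
from typing import Dict, Set


def forget(weights: Dict[str, float], enacted: Set[str], decay: float = 0.2) -> Dict[str, float]:
    """Apply one end-of-cycle forgetting step and return the surviving schemas."""
    survivors: Dict[str, float] = {}
    for name, weight in weights.items():
        if name not in enacted:
            # Schemas not enacted during this cycle lose `decay` weight.
            weight = round(weight - decay, 10)
        if weight > 0:
            # Schemas whose weight has reached 0 are deleted from memory.
            survivors[name] = weight
    return survivors
```

With these values, a schema of weight 1 that is never enacted again disappears after five cycles, while a schema enacted at every cycle keeps its weight.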

Thursday, August 6, 2009

Ernest 7.0 can dream

As said before, when Ernest finds a satisfying solution, he keeps repeating it without searching for a better one. This repetition occurs when Ernest enacts a higher-level schema that consists of enacting a subschema in a context where this same subschema has just been enacted.

Ernest 7.0 detects such a situation which causes him to get "bored" and to "fall asleep".

For Ernest, sleeping is the same as being awake, except that his actions are inhibited and not sent to the environment. Ernest merely "dreams" their results based on what he has learned.

In this video, when he falls asleep, he says "Bored, now dreaming", he stops moving, and he starts speaking his dream aloud.

We can hear that he is dreaming of the sequence S2-S4-S3-S2-S3 (sense - turn right - move forward - sense - move forward). Of course, he is continuing the same sequence that got him bored! Poor Ernest, he is dreaming of keeping turning around! This is more like a nightmare!

What can that be used for? Well, maybe dreaming is a first step before thinking?

Thursday, July 9, 2009

ERNEST

EVOLUTIONIST: A "trial and error" method that keeps what works.
PRAGMATIC: Knowledge is used to fulfill goals and satisfactions: "Meaning is use".
SELF-ORIENTED: An "unsupervised learning" that may however use pedagogical situations.
LEARNING: A knowledge acquisition that participates in the agent's development.
CONSTRUCTIVIST: as opposed to "Platonist", knowledge is not "discovered" but "constructed".
BOTTOM-UP: higher-level goals are constructed to better fulfill lower-level inborn satisfactions.

Wednesday, July 8, 2009

Ernest 6.4

Ernest 6.4 is the same as Ernest 6.3 except that he has four elementary schemas: move forward, turn right, turn left, and sense. Move forward succeeds if there is no wall ahead and Ernest can move; it fails if he bumps into a wall. Turn left and turn right never fail. Sense succeeds if there is a wall ahead and fails if there is not. Ernest 6.4 can only sense the square just in front of him; we can picture this sense as an antenna.

Ernest also has six elementary acts with the following satisfaction values:
- (Move forward, Succeed, 5) He "loves" moving forward.
- (Move forward, Fail, -5) He "hates" bumping into walls.
- (Turn left or turn right, Succeed, -1) He does not "like" so much turning.
- (Sense, Succeed, -1) He does not "like" so much sensing a wall in front of him.
- (Sense, Fail, 0) He is "indifferent" to sensing an empty square in front of him.
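The success and failure rules of the four schemas, together with the satisfaction values above, can be summarized in a short Python sketch. The function `enact` and the one-letter statuses are assumptions for illustration; the real environment was reprogrammed separately and is not shown in this post.

```python
from typing import Dict, Tuple

# Satisfaction of each act (schema, status), copied from the list above.
SATISFACTION: Dict[Tuple[str, str], int] = {
    ("move forward", "S"): 5,
    ("move forward", "F"): -5,
    ("turn left", "S"): -1,
    ("turn right", "S"): -1,
    ("sense", "S"): -1,
    ("sense", "F"): 0,
}


def enact(schema: str, wall_ahead: bool) -> Tuple[str, int]:
    """Return the status (S or F) and the satisfaction of enacting an elementary schema."""
    if schema == "move forward":
        status = "F" if wall_ahead else "S"   # bumping into a wall is a failure
    elif schema == "sense":
        status = "S" if wall_ahead else "F"   # sensing succeeds when a wall is there
    else:
        status = "S"                          # turning never fails
    return status, SATISFACTION[(schema, status)]
```

Facing a wall, sensing and then turning costs -1 - 1 = -2, whereas moving forward costs -5, which is consistent with sensing first being a good strategy.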

In this environment, when Ernest performs a sense schema, the sensed square flashes yellow. When he bumps into a wall, the bumped wall flashes red and Ernest says "Ouch!".

At the beginning, Ernest does not know the connection between the sense schemas and the move schemas. But he progressively learns that when a sense schema succeeds he should not perform a move forward schema because it will fail. Then, he learns a good second-order schema consisting of performing a sense schema before moving forward. This second-order schema can be seen as an efficient "strategy" to avoid dissatisfaction.

When an abstract schema fails during its enaction, Ernest says "No!" and abandons the schema. When an abstract schema fully succeeds he says "Good!". So "No"s and "Good"s indicate that abstract schemas start being enacted.

In this video, we can see that Ernest first learns a first layer of abstract schemas made of the sequence sense - move forward, then he learns a second layer that gives him the best satisfaction he can get in this environment by moving forward twice and turning once.

Interestingly, Ernest does not make any distinction between perception and action. He only has perceivomotor schemas that can succeed or fail. I had to reprogram his environment to handle this mechanism, because current available environments (such as those provided in the Soar package) are based on the classical perception/computation/action cycle.

Although many authors have argued that perception and action should be kept embedded (since Piaget or before), to my knowledge, Ernest is the first implementation to do so, isn't it?

Friday, July 3, 2009

Ernest 6.3

Now, Ernest 6.3 has a full recursive hierarchical schema mechanism implemented.

It explicitly implements the notion of "act", that is defined as a triple (schema, status, satisfaction). For example, the second line of the trace indicates that act A1 has been enacted, meaning that schema S2 was enacted with a resulting status of F (Fail) corresponding to a satisfaction of -1.

The two elementary schemas are now called S2 (doing A) and S3 (doing B). Ernest is initialized with four elementary acts: A1=(S2,F,-1), A2=(S2,S,1), A3=(S3,S,1), and A4=(S3,F,-1).

Now, a schema is defined as a triple (context act, intention act, weight). For example, the third line indicates that the schema S4 was constructed with the context act A2, the intention act A1, and the weight 1 (the context is arbitrarily initialized in Ernest's short-term memory as A2). Thus, S4 expects S2 to fail in a context where S2 has succeeded.

The third line also indicates that an act A5 was constructed. A5=S4S0 means that S4 has succeeded and has a satisfaction value of 0. The satisfaction value of an act is computed as the sum of the satisfaction values of its schema's context and intention. Satisfaction(A5) = satisfaction(A2) + satisfaction(A1) = 1 - 1 = 0.
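The two triples defined above, and the way the satisfaction of A5 is computed, can be sketched with Python dataclasses. The class and method names are assumptions made for this sketch; the values are those given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Act:
    """An act is a triple (schema, status, satisfaction)."""
    schema: str
    status: str
    satisfaction: int


@dataclass
class Schema:
    """A schema is a triple (context act, intention act, weight), plus a name."""
    name: str
    context: Act
    intention: Act
    weight: int = 1

    def succeeded_act(self) -> Act:
        """Act of this schema succeeding: satisfaction is context + intention."""
        total = self.context.satisfaction + self.intention.satisfaction
        return Act(self.name, "S", total)


A1 = Act("S2", "F", -1)
A2 = Act("S2", "S", 1)
S4 = Schema("S4", context=A2, intention=A1, weight=1)
A5 = S4.succeeded_act()  # satisfaction(A2) + satisfaction(A1) = 1 - 1 = 0
```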

The context is the list of the acts that have just been enacted. Lines 4 and 5 indicate that after the first cycle, the context is made of A1 and A5.

In this trace, we can see that after a while, Ernest stabilizes on the sequences S2F-S2S-S3F-S3S. Then he aggregates it as S8S-S16S. Then he aggregates this sequence as S24 = ( A9=S8S , A17=S16S , 1 ) and keeps on enacting S24S. Then, he learns S112 = ( A25=S24S , A25=S24S , 1 ) and he keeps on exploring until he is turned off.

Interestingly, to prevent complexity explosion, I made the context not include subschemas that are more than one level below the enacted schema. For instance, when S24 is enacted, the resulting context does not include S3S, even though it has also been enacted as S24's last sub-sub-schema. This can be understood as Ernest being only "aware" of the top-level enacted schemas. The lower-level schemas are enacted "unconsciously" from Ernest's viewpoint, unless they fail, in which case they would pop up again in the context.

I think this recursive mechanism makes Ernest virtually able to learn any regularity in his environment, which excites me a lot. The learning time would probably explode when the regularity gets arbitrarily complex, but that's ok because we only expect cognitive agents to learn regularities that are hierarchically structured.

Friday, June 12, 2009

Ernest 6.2

Ernest 6.2 only enacts schemas when they attain a certain level of reinforcement (set to 7 in this example).

At the beginning, only schemas A1 and B1 are enacted.

Very soon, the sequence B1F-B1S is aggregated into S5, and the sequence A1F-A1S is aggregated into S9, but Ernest has not yet found the stable solution based on these schemas.

Almost at the middle of the trace, Ernest finds the solution by stabilizing the sequence A1F-A1S-B1F-B1S.

From then on, S5 and S9 get faster reinforcement until S5 reaches the reinforcement of 7. Then S5 starts being enacted and Ernest searches for a new equilibrium until he finds the sequence A1F-A1S-S5S.

Then, S9 reaches a reinforcement of 7 and starts being enacted too. So, Ernest finally finds the sequence S5S-S9S to solve the aA..bB task.

Ernest 6.2 borrows the two mechanisms of Ernest 6.0 and 6.1. Like Ernest 6.0, he starts by searching for a solution based on lower-level schemas until they get frozen. Then, like Ernest 6.1, he starts enacting higher-level schemas. With this mechanism, Ernest uses what he knows about lower-level schemas to find the correct sequence of higher-level schemas before effectively starting to enact them.

Once the higher-level schemas are stabilized, we can consider them as base-level schemas and continue learning higher-level schemas on top of them.

Wednesday, May 27, 2009

Ernest 6.0 schema construction mechanism

This figure details how schemas are constructed in the first two cycles.

At the level 0, elementary schemas are represented as circles. They are named A (schema A1 = doing A) and B (schema B1 = doing B). They are green if they succeed and red if they fail.

First level and second level schemas are represented as arrows. Their context is on the left side (dot), and their intention is on the right side (arrow). For instance S13=(AS,BF) and S21=(S13S,AF) (S=succeed, F=fail).

At the beginning, the context is initialized to AS. In this context, at this time, Ernest has no preference, so he randomly picks B. The environment returns Fail, so Ernest reinforces S13.

At the beginning of the second cycle, the context now has two levels: BF and S13S. At this time, and in this context, Ernest has no preference so he randomly picks A. The environment returns fail, so Ernest reinforces S16. In addition, because he also had S13S as context, Ernest considers that a schema (S13S,AF) has also been enacted. This schema does not yet exist so he creates it and names it S21.

Moreover, because S16S has been enacted from an element that is still in the context (AS = context of schema S13) then Ernest considers that a schema (AS, S16S) has also been enacted. This schema does not yet exist so he creates it and names it S22.

At the end of cycle 2, Ernest has thus reinforced two first-level schemas and constructed two second-level schemas. The context is now made of schemas AF, S16S, S21S and S22S.

If we continued with this principle, Ernest would construct third-level schemas on the third cycle, fourth-level schemas on the fourth cycle, and so on. This means that at each new cycle, the context would be a structure representing Ernest's whole life up to this point. For scalability reasons, we cannot handle such complexity; thus, in this experiment, we forbid higher-level schema construction above level two.

These two levels of schemas are enough for Ernest to succeed in the aA..bB task because these schemas' memory span is equal to this task's regularity span.

In this experiment, first-level schemas and second-level schemas are only used to propose elementary schemas, but they are not themselves enacted. To go further, we need to implement the enaction of these schemas.

Tuesday, May 19, 2009

Ernest 6.0 and the aA..bB environment

This video shows Ernest 6.0 in the aA..bB environment that I have previously explained here. In this environment, Ernest has to do two consecutive As or two consecutive Bs to succeed.

At the beginning, Ernest is initialized with two possible acts: A1 (doing A) and B1 (doing B). Their succeeding satisfaction is equal to 1 (getting Y), and their failing satisfaction is equal to -1 (getting X). In addition, Ernest is initialized with 16 primary schemas S3 to S18 made by combination of the acts A1 and B1 with their succeeding or failing status (yellow lines), for instance S3=( A1 S, A1 S, 2, 0). The satisfaction of S3 is equal to satisfaction(A1 S) + satisfaction(A1 S) = 2. The weight of S3 is 0 because it has not yet been enacted.
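The count of 16 primary schemas follows from combining the four (act, status) pairs with each other, once as context and once as intention. A minimal Python sketch of this enumeration is given here; the tuple layout mirrors S3=( A1 S, A1 S, 2, 0), but the variable names are assumptions and the enumeration order is not claimed to match Ernest's numbering S3 to S18.

```python
from itertools import product

# Satisfaction of an act by status, as stated above: succeed = 1, fail = -1.
STATUS_SATISFACTION = {"S": 1, "F": -1}

# The four (act, status) pairs: A1 S, A1 F, B1 S, B1 F.
act_statuses = [(act, status) for act in ("A1", "B1") for status in ("S", "F")]

# Every (context, intention) pair, stored as (context, intention, satisfaction, weight).
primary_schemas = [
    (
        context,
        intention,
        STATUS_SATISFACTION[context[1]] + STATUS_SATISFACTION[intention[1]],
        0,  # weight is 0 because the schema has not yet been enacted
    )
    for context, intention in product(act_statuses, repeat=2)
]
```

Each of the four possible contexts is shared by exactly four of these schemas.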

Ernest's short term memory is initialized with S1 S, as if schema S1 had been enacted and succeeded. At the first cycle, four schemas match this context: S3, S11, S5, S13 (four first pale green lines) but all their propositions are weighted 0, so the schema B1 is randomly picked.

So, Ernest does B but he receives a fail status from the environment.

Then, the new situation is assessed. This assessment basically consists of expressing the situation in terms of schemas. If these schemas already exist then they are reinforced, if they do not exist then they are constructed with a weight of 1. At the end of the first cycle, this assessment leads to reinforcing S13 and to setting the new context as containing B1 F and S13 S, which now forms two levels of context.

After the second cycle, we can see that two second-order schemas are constructed: S21 and S22 that are based on the two levels of the previous context B1 F and S13 S.

These second-order schemas cover two rounds, and when they will match the context, they will influence the selection in favor of primary schemas that will lead to success one round later. That is how Ernest can finally learn to succeed in this environment.

Notice that after each cycle, schemas of all levels that match the resulting situation are reinforced. This leads to a problem because higher-level schemas tend to force lower-level schemas that fail, which is good, but then these lower-level schemas will be reinforced, which will tend to reject their acts in return, because they have a negative satisfaction.

To avoid this problem, I have limited the maximum weight a schema can receive to 5. When a schema has reached a weight of 5, it becomes frozen. At the end of this trace, we can see that Ernest reaches a stable activity that solves the aA..bB task, based on frozen schemas.
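The reinforcement rule with its cap can be sketched as follows in Python. The helper names and the increment of 1 per reinforcement are assumptions for this sketch; the initial weight of 1 for a newly constructed schema and the maximum of 5 come from the text.

```python
from typing import Dict

MAX_WEIGHT = 5  # a schema that reaches this weight becomes frozen


def reinforce(weights: Dict[str, int], schema: str) -> int:
    """Construct the schema with weight 1 if it is new, else reinforce it up to the cap."""
    if schema not in weights:
        weights[schema] = 1
    elif weights[schema] < MAX_WEIGHT:
        weights[schema] += 1  # assumed increment of 1 per enaction
    return weights[schema]


def is_frozen(weights: Dict[str, int], schema: str) -> bool:
    """A schema is frozen once its weight has reached the maximum."""
    return weights.get(schema, 0) >= MAX_WEIGHT
```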

Basically, it means that the learning of higher-level schemas is made possible by the progressive freezing of lower-level schemas.

Ernest 6.0

To prepare Ernest to recursively learn schemas on top of one another, I have almost entirely rewritten it.

Now, schemas are no longer triples of subschemas but couples of subschemas. The first subschema of a schema is its context, and the second is its intention. For instance, S3=(S1,S2) means that the schema S3 intends to enact S2 in a context where S1 has been enacted. In addition, subschemas are associated with their success or failure status: S=Succeed or F=Fail. So, for instance, S3=(S1 S, S2 S) means that S3 expects S2 to succeed in a context where S1 has succeeded. Conversely, S4=(S1 S, S2 F) would expect S2 to fail in a context where S1 has succeeded.

Like before, schemas also have satisfaction values and weights. So, S3=(S1 S, S2 S, 2, 1) means that schema S3 has a satisfaction of 2 and a weight of 1. The satisfaction of a schema is the sum of the satisfactions of its subschemas for their specific status. For instance, S1 and S2 may both have a satisfaction of 1 when they succeed, so S3 has a satisfaction of 1+1 = 2. If S2 has a satisfaction of -1 when it fails, then S4 would have a satisfaction of 1-1 = 0. The weight of a schema is the number of times the schema has been enacted. For instance, if S2 has failed 3 times in a context where S1 has succeeded, then we have S4=(S1 S, S2 F, 0, 3).

At each cycle, all the schemas whose context matches the current context propose their intention. The proposition weight is equal to the intended subschema's satisfaction multiplied by the proposing schema's weight. This can be understood as the benefit of doing it multiplied by the confidence to succeed. For instance, in a context where S1 has succeeded, S3 proposes S2 with a proposition weight equal to satisfaction(S2 S)*weight(S3) = 1*1 = 1. In the same context, S4 proposes S2 with a proposition weight equal to satisfaction(S2 F)*weight(S4) = -1*3 = -3.

Then, the proposition weights received by each proposed schema are summed up and the schema with the highest sum is selected and enacted. In our example, S2 has a total proposition weight equal to 1-3 = -2. Negative propositions can be understood as "fear" of dissatisfaction. In this case, Ernest is "afraid" of doing S2 because he had more bad experiences of doing it than good experiences in this context. He will only do it if he has no more appealing choice.
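The S3/S4 example can be reproduced with a short Python sketch of the proposition mechanism. The data layout and the function names `proposition_totals` and `select` are assumptions for illustration; the numbers are those of the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Status = Tuple[str, str]  # (subschema, status), e.g. ("S2", "F")

# Satisfaction of the intended subschema for each status, as in the example.
SATISFACTION: Dict[Status, int] = {("S2", "S"): 1, ("S2", "F"): -1}

# Each schema is (name, context, intention, weight).
SCHEMAS: List[Tuple[str, Status, Status, int]] = [
    ("S3", ("S1", "S"), ("S2", "S"), 1),
    ("S4", ("S1", "S"), ("S2", "F"), 3),
]


def proposition_totals(schemas, context: Status) -> Dict[str, int]:
    """Sum, per proposed subschema, satisfaction(intention) * weight(proposing schema)."""
    totals: Dict[str, int] = defaultdict(int)
    for _name, schema_context, intention, weight in schemas:
        if schema_context == context:
            totals[intention[0]] += SATISFACTION[intention] * weight
    return dict(totals)


def select(totals: Dict[str, int]) -> str:
    """Select the proposed subschema with the highest total proposition weight."""
    return max(totals, key=totals.get)


totals = proposition_totals(SCHEMAS, ("S1", "S"))  # S2 gets 1*1 + (-1)*3 = -2
```

Here S2 is still selected because it is the only proposal; any alternative with a total above -2 would be preferred.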

When the selected schema has been enacted, the environment returns its succeed or fail status. Based on this status, the new context is assessed and new schemas are learned or reinforced.

Tuesday, February 24, 2009

Ernest 5.0

If we want Ernest to exhibit a more adventurous behavior, we need him to prefer actions that make him move over actions that keep him in the same place. To do so, we need him to have some sense of movement.

So, Ernest 5.0 can now perceive three things after each primary action: "bumped", "moved", or "turned". Each of these primary perceptions has a satisfaction value: val(bumped) = -1, val(moved) = 1, val(turned) = 0.

Schemas are also associated with satisfaction values when they are constructed. The satisfaction value of a primary schema is equal to the satisfaction value of its expected primary perception. The satisfaction value of a secondary schema is the sum of the satisfaction values of its subschemas.

The schema preferences are now computed as pref = weight * value. weight is the number of times this schema has been successfully enacted and value is the satisfaction value of this schema. As before, all schemas compete and the one with the highest preference is selected.
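The preference computation can be illustrated with a small Python sketch. The perception values come from the text; the two candidate schemas in the usage note, their weights, and all function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Satisfaction values of the three primary perceptions, as given above.
VAL: Dict[str, int] = {"bumped": -1, "moved": 1, "turned": 0}


def primary_value(expected_perception: str) -> int:
    """Value of a primary schema: value of its expected primary perception."""
    return VAL[expected_perception]


def secondary_value(expected_perceptions: Iterable[str]) -> int:
    """Value of a secondary schema: sum of the values of its subschemas."""
    return sum(VAL[p] for p in expected_perceptions)


def preference(weight: int, value: int) -> int:
    """pref = weight * value."""
    return weight * value


def select(candidates: Dict[str, Tuple[int, int]]) -> str:
    """Pick the schema with the highest preference; candidates map name -> (weight, value)."""
    return max(candidates, key=lambda name: preference(*candidates[name]))
```

For example, a hypothetical secondary schema "turn then move" with weight 3 has a value of 0 + 1 = 1 and a preference of 3, so it wins over a hypothetical primary schema "move and bump" with weight 2 and preference -2.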

Although this approach might seem similar to classical "reward mechanism" approaches, it is actually different. Classical approaches consist of giving a reward for reaching a high-level goal, and having the agent backward-propagate this reward to prior operations that led to this goal (Laird & Congdon, 2008). In contrast, Ernest's satisfaction values are defined at the lowest level, and Ernest constructs higher-level goals that let him better fulfill his lower-level satisfaction. Because of this difference, I cannot use the Soar 9 built-in reward mechanism. I believe that my approach better accounts for the way natural cognitive agents behave.

In this video, we can see that Ernest 5.0 finds a solution based on secondary schema S52 that allows him to keep moving without bumping into walls. When observing his activity, we can infer, if we want, that he does not "like" bumping into walls but that he "likes" moving. Some may even detect some "impatience" of discovering a wider universe!

References

Laird, J. E., & Congdon, C. B. (2008). Part VII: Reinforcement Learning. The Soar User’s Manual Version 9.0. University of Michigan.

Saturday, February 21, 2009

About local optimum

It must be noted that Ernest 4.3 can sometimes get stuck in a non-optimal solution. In this video, he finds a solution made of primary schema S14 and secondary schema S30. This solution gives him a Yahoo! when enacting each of these two schemas. This solution, however, is not optimal because it makes Ernest systematically bump on the first step of secondary schema S30, which could be avoided in this environment.

This problem relates to the so-called "local optimum" problem: Ernest will not explore other solutions when he has found a satisfying one.

Many authors suggest a stochastic response to this problem, consisting of putting some randomness or "noise" in the agent's behavior. Soar even implements this response by default when reinforcement learning is activated, through the so-called "epsilon-greedy" exploration policy (Laird & Congdon, 2008). This randomness will sometimes cause the agent to perform an action that he does not prefer, in order to make him explore other solutions.
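For readers unfamiliar with it, the textbook form of an epsilon-greedy choice can be sketched in Python as follows. This is a generic illustration, not Soar's implementation; the function name and the dictionary of preferences are assumptions.

```python
import random
from typing import Dict


def epsilon_greedy(preferences: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """With probability epsilon pick any action at random, otherwise pick the preferred one."""
    if rng.random() < epsilon:
        # Exploration: the agent may perform an action that he does not prefer.
        return rng.choice(sorted(preferences))
    # Exploitation: the agent performs his preferred action.
    return max(preferences, key=preferences.get)
```

With epsilon set to 0 the agent always chooses his preferred action, which is the behavior Ernest currently has.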

I do not agree with this response because I don't see any sense in having the agent not choose his preferred action. That will only impede the learning process. Besides, it has been widely accepted, since Simon (1955), that the goal of cognitive agents is not to find the optimum solution but only a satisfying solution.

For Ernest, I would rather implement some mechanism that would make him choose to explore other solutions when he gets "bored". Before that, I could at least reduce this non-optimal-solution risk by computing a better payoff value for each schema.

References

Laird, J. E., & Congdon, C. B. (2008). The Soar User’s Manual Version 9.0. University of Michigan.

Simon, H. (1955). A behavioral model of rational choice. Quarterly Journal of Economics, 69, 99-118.

Friday, February 20, 2009

Ernest 4.3

Ernest 4.3 is the same as Ernest 4.2, except that he can do three different things: go ahead, rotate right, and rotate left. This allows him to explore his new two-dimensional environment. Like Ernest 4.2, he can only perceive two things: bump and non-bump.

See how fast he learns a smart strategy to avoid bumping into walls: spinning in place. Like before, he is able to learn second-order schemas, but in this environment, there is no need to enact them. In this example, repeatedly enacting primary schema S35 gives him a Yahoo! each time.

Interestingly, the number of schemas he constructs per cycle does not grow with the environment complexity. It actually grows with Ernest's own complexity, proportionally to the number of elementary actions he is able to perform multiplied by the number of elementary sensations he is able to perceive. Hence, the complexity remains under control, and there will be no combinatorial explosion when the environment complexity increases.

The time needed to explore the environment would however grow with the environment complexity. This raises the interesting question of Ernest's “education”, that is, designing “pedagogical” situations where he could more easily learn lower-level schemas, on which higher-level schemas could anchor.

Wednesday, February 4, 2009

Reference

A paper presenting this work has just been accepted at the BRIMS conference. The reference is:

Georgeon O. L., Ritter F. E., Haynes S. R. (2009). Modeling Bottom-Up Learning from Activity in Soar. Proceedings of the 18th Annual Conference on Behavior Representation in Modeling and Simulation (BRIMS). Sundance, Utah. March 30 – April 2, 2009. 09-BRIMS-016, pp. 65-72.

Saturday, January 31, 2009

First steps in space

Ernest 4.2 is the same as Ernest 4.1 but he is now connected to a 2D environment.

This environment is the Vacuum cleaner environment initially developed by Cohen (2005). It is a simple grid environment that allows an agent, represented as a vacuum cleaner, to move in search of dust to clean up. I have implemented three new functionalities: (i) Wall representation (dark gray squares) inside the grid. They flash red when the vacuum cleaner bumps into them. (ii) Speaking aloud the agent's rationale in real time. (iii) "Bump" sense that, after each move, returns true if the agent has bumped into a wall and false if he has not.

Ernest is controlling the vacuum cleaner. His output A is mapped to a move command to the left, and output B to the right. His input Y is mapped to a non-bump feedback, and input X to a bump feedback. So far, Ernest only uses one dimension of this 2D environment: the horizontal. There is no dust in this example, and Ernest cannot sense any other information than bump or non-bump.
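The mapping between Ernest's outputs and inputs and this environment can be sketched as a one-dimensional corridor in Python. The class `Corridor`, its width, and the assumption that walls sit just beyond both ends are made up for this illustration; it is not the Vacuum cleaner environment's code.

```python
class Corridor:
    """Hypothetical one-row grid with a wall beyond each end."""

    def __init__(self, width: int, position: int = 0) -> None:
        self.width = width
        self.position = position

    def step(self, output: str) -> str:
        """Apply Ernest's output (A = move left, B = move right) and return his input.

        Returns "X" if the move bumped into a wall, "Y" if it did not.
        """
        target = self.position + (-1 if output == "A" else 1)
        if target < 0 or target >= self.width:
            return "X"  # bump: the agent stays where he is
        self.position = target
        return "Y"      # non-bump: the agent has moved
```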

We can hear the possible construction of new schemas at the beginning of each cycle. Then we see the vacuum cleaner possibly moving or bumping into a wall. Then we hear which schema has been enacted. If he did not bump, he says "yahoo", if he bumped, he says "Hoho".

We see that he begins randomly exploring his environment and constructing primary schemas, then secondary schemas. Eventually, he finds a stable activity based on a secondary schema (S28) that gives him a "yahoo" each time. From observing his activity, we can infer, if we want, that he does not "like" bumping into walls, and that he learns a procedure to avoid that.

Reference

Cohen, M. A. (2005). Teaching agent programming using custom environments and Jess. AISB Quarterly, 120 (Spring), 4.