Olivier Georgeon's research blog—also known as the story of little Ernest, the developmental agent.

Keywords: situated cognition, constructivist learning, intrinsic motivation, bottom-up self-programming, individuation, theory of enaction, developmental learning, artificial sense-making, biologically inspired cognitive architectures, agnostic agents (without ontological assumptions about the environment).

Friday, January 7, 2011

The tangential strategy learning process

To get a better view on how Ernest learned the tangential strategy, let us examine his activity trace:

1 2(> |+) 3(> |+) 4(> |+) 5(> |+) 6(> |+) 7(> |o) 8(v |*) 9(v*|o) 10(>+| ) 11(^ |*) 12(^o| ) 13(^ |o) 14(^*| ) 15(>o| ) 16(>) 17(>) 18(^*| ) 19(vo| ) 20(v) 21(^) 22(>) 23(^*| ) 24(^o|*) 25(^ |o) 26(^) 27(v) 28(v |*) 29(^ |o) 30(^) 31(>) 32(^*| ) 33(^o|*) 34(v*|o) 35(>+| ) 36(^o|*) 37(v*|o) 38(>+| ) 39(^ |*) 40(v |o) 41(>o| ) 42(>) 43(^*| ) 44(^o|*) 45(> |+) 46(> |+) 47(> |o) 48(v |*) 49(v*|o) 50(>+| ) 51(^ |*) 52(>+|+) 53(v |o) 54(vo| ) 55(v |*) 56(v*| ) 57(v |o) 58(^ |*) 59(>+|+) 60(>+|+) 61(>+|+) 62(>x|x) 63(>o|o) 64(v) 65(v) 66(v) 67[v] 68(^) 69[v] 70(v) 71(v) 72(v) 73[v] 74[>] 75(^*| ) 76(^o|*) 77(> |+) 78(> |+) 79(> |+) 80(> |+) 81(> |+) 82(> |o) 83(v |*) 84(v*|o) 85(>+| ) 86(^ |*) 87(>+|+) 88(>x|x) 89(^o|o)

In this trace, the numbers indicate the cycle counter, which is also displayed in the bottom-right corner of the video. The symbols that represent Ernest’s primitive actions read as follows: ^ turn left, > try to move forward, v turn right. They appear within parentheses when they succeed and within square brackets when they fail. For example, Ernest turned toward an adjacent wall on step 73 and bumped into a wall on step 74. The trace also marks failed turns on steps 67 and 69; on all other steps, primitive schemas succeeded.

The symbols that represent the eye signals read as follows: * appear, + closer, x arrived, o disappear. These symbols are placed on each side of a | character, the left eye signal being on the left and the right eye signal on the right. A blank means that nothing changed in that eye’s field. For example, on step 9, Ernest turned right; the blue square appeared in the left eye’s field and disappeared from the right eye’s field. On step 10, the blue square got closer in the left field and nothing changed in the right field. On step 11, the blue square appeared in the right field and nothing changed in the left field, meaning that the blue square was then present in both eyes’ fields.
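The notation described above is regular enough to be read by a small parser. The following is only a sketch for readers who want to explore the trace themselves; it is not part of Ernest's code. The names `Step` and `parse_trace` are made up for this illustration, and treating a step without eye symbols (such as step 16) as "unchanged" in both eyes is an assumption.

```python
import re
from typing import List, NamedTuple, Optional

# One token: a cycle number, optionally followed by an interaction
# in (...) when it succeeded or in [...] when it failed.
TOKEN = re.compile(r"(\d+)(?:([(\[])([\^>v])(?:(.)\|(.))?[)\]])?")

ACTIONS = {"^": "turn left", ">": "move forward", "v": "turn right"}
SIGNALS = {"*": "appear", "+": "closer", "x": "arrived", "o": "disappear"}


class Step(NamedTuple):
    cycle: int
    action: Optional[str]      # None when the trace gives only a cycle number
    succeeded: Optional[bool]  # True for (...), False for [...]
    left_eye: str
    right_eye: str


def parse_trace(trace: str) -> List[Step]:
    """Turn a trace string into a list of Step records."""
    steps = []
    for match in TOKEN.finditer(trace):
        cycle, bracket, action, left, right = match.groups()
        steps.append(Step(
            cycle=int(cycle),
            action=ACTIONS.get(action),
            succeeded=(bracket == "(") if bracket else None,
            # Assumption: a blank or missing symbol means "unchanged".
            left_eye=SIGNALS.get(left, "unchanged"),
            right_eye=SIGNALS.get(right, "unchanged"),
        ))
    return steps
```

For instance, `parse_trace("9(v*|o)")` yields a single step in which Ernest turned right successfully, with "appear" in the left eye and "disappear" in the right eye, which matches the reading of step 9 given above.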

The first interesting (safe and satisfying) sequence was found right at the beginning, when Ernest moved forward and got closer in a context where he had just moved forward and gotten closer. This experience made him repeat the sequence from step 2 to step 7, when he received a disappear signal from the right eye.

From step 7 to step 11, Ernest learned the returning sequence. Step 7: move forward, disappear on right. Step 8: turn right, appear on right. Step 9: turn right, appear on left, disappear on right. Step 10: move forward, closer on left. Step 11: turn left, appear on right. After step 11, Ernest was facing the blue square but did not yet know to move forward in this category of context, so he randomly picked a turn action.
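The returning sequence can be written down as data and searched for in the trace. The sketch below is an illustration only: `RETURNING`, `interactions`, and `find_sequence` are hypothetical names, and the excerpt is copied from steps 45 to 53 of the trace above.

```python
import re

# Steps 45 to 53 of the trace, copied verbatim.
EXCERPT = ("45(> |+) 46(> |+) 47(> |o) 48(v |*) 49(v*|o) "
           "50(>+| ) 51(^ |*) 52(>+|+) 53(v |o)")

# The returning sequence as (action, left-eye symbol, right-eye symbol).
RETURNING = [(">", " ", "o"), ("v", " ", "*"), ("v", "*", "o"),
             (">", "+", " "), ("^", " ", "*")]


def interactions(trace):
    """Return (cycle, action, left, right) for each successful step with eye symbols."""
    found = re.findall(r"(\d+)\(([\^>v])(.)\|(.)\)", trace)
    return [(int(c), a, l, r) for c, a, l, r in found]


def find_sequence(trace, pattern):
    """Return the first cycle of every occurrence of `pattern` on consecutive cycles."""
    steps = interactions(trace)
    hits = []
    for i in range(len(steps) - len(pattern) + 1):
        window = steps[i:i + len(pattern)]
        consecutive = window[-1][0] - window[0][0] == len(pattern) - 1
        if consecutive and [s[1:] for s in window] == pattern:
            hits.append(window[0][0])
    return hits


print(find_sequence(EXCERPT, RETURNING))  # [47]
```

Run on the full trace instead of the excerpt, the same search should report steps 7, 47, and 82, that is, the step on which the sequence was first enacted and the two later steps on which Ernest reused it.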

On steps 47 through 51, Ernest enacted the returning sequence again because it had proven to work and to be satisfying in the category of context in which he found himself again. On step 52, he chose to move forward (other options had already proven uninteresting in the current category of context), obtaining a closer signal from both eyes. On step 53, however, he did not yet know to continue moving forward in that category of context, and he randomly picked turn right.

On step 59, he again got closer in both eyes when moving forward (although out of a different preceding sequence). In this context, he again picked move forward on step 60, which proved satisfying and led him to continue on step 61 until he stepped on the blue square on step 62.

When the second blue square is introduced on step 75, he has thus already learned to enact the different subsequences needed for the tangential strategy, as well as to categorize contexts accordingly. In effect, he uses these different subsequences in the right way until he reaches the second square on step 88.

This quick learning was somewhat lucky, but we chose to report it because it led to a clean example of the tangential strategy. In other runs, Ernest may learn mixed strategies that are less prototypical. This run was, however, not so extraordinarily lucky, because behaviors are not picked randomly but rather always exploit what has been learned thus far. Chance is only used to break ties between conflicting impulses when previous knowledge cannot resolve them.
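The role of chance described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Ernest's actual selection mechanism: the function `select_action` and the idea of summarizing what has been learned as one numeric weight per action are hypothetical.

```python
import random


def select_action(proposals, rng=random):
    """Pick the action with the highest weight; draw at random only among ties.

    `proposals` maps each action symbol to a weight (hypothetical values
    standing for what past experience proposes in the current context).
    """
    best = max(proposals.values())
    candidates = sorted(a for a, w in proposals.items() if w == best)
    if len(candidates) == 1:
        return candidates[0]       # previous knowledge settles the choice
    return rng.choice(candidates)  # chance only breaks the tie


print(select_action({"^": 0, ">": 10, "v": -5}))  # >
```

With a clear winner the choice is deterministic; only when two or more actions share the top weight does the random draw matter.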

Experience shows that Ernest always learns a strategy within the first hundred steps, and that the most frequently found strategy is the diagonal strategy.

The tangential strategy

In this example video, Ernest 8.2 found a strategy that we named the tangential strategy.

The tangential strategy consists of approaching the blue square in a straight line as opposed to a diagonal line (the diagonal strategy in the previous example). The trick with the tangential strategy is that Ernest cannot know when he should turn toward the blue square until he has passed it. The tangential strategy thus consists of moving in a straight line until the blue square disappears from the visual field, then returning one step backward, and then turning toward the blue square.

The emergence of a specific strategy occurs during Ernest's youth while he is babbling relatively randomly, in parallel with the emergence of goals. See the details in the next post. Once Ernest has organized behavioral patterns that have proved both satisfying and robust, he adopts them and sticks to them as long as they work.

These results demonstrate that:
a) Ernest encodes neither strategies nor task procedures defined by the programmer, as opposed to traditional cognitive modeling.
b) Ernest instances are capable of "individuating" themselves through their experience, i.e., acquiring their own cognitive individuality that was not encoded in their "genes". This accounts for the role of individual experience in cognitive development.
c) Ernest's goals emerge from his low-level drives. To the observer, eating blue squares appears to become the goal of Ernest's life, although no representation of such a goal was encoded into Ernest. Indeed, Ernest was given a high incentive to step on blue squares, but this incentive was not different in nature from other primitive drives. Ernest's goals were not pre-encoded as they are, for example, in the goal buffer of the ACT-R architecture.

Wednesday, January 5, 2011

Ernest 8.2 can find his food

Ernest 8.2 is a horseshoe crab. Horseshoe crabs are archaic arthropods whose visual system has been extensively studied. From these studies, we pulled several principles that guided the development of Ernest's distal sensory system:

- Small matrix resolution: the horseshoe crab's most elaborate eyes (two compound eyes among the 10 eyes that horseshoe crabs possess) have a resolution of roughly 40 × 25 pixels.
- Fixed eyes: eyes are fixed to the animal's body. The animal has to rotate its full body to move its visual field.
- Sensitivity to movement: the signal sent to the brain does not reflect static shape recognition but rather reflects changes in the visual field.
- Visuo-spatial behavioral proclivity: male horseshoe crabs move toward females when they see them with their compound eyes, whereas females move away from other females.

As noted earlier, each of Ernest's "eyes" has only one pixel, which is sensitive to the distance to the blue square within a 90° visual field (assuming there is only one blue square).

Each eye produces a signal that represents the change in the corresponding visual field during the last interaction cycle:
- Appear: a blue square appeared in the visual field.
- Closer: more blue in the visual field, meaning the blue square is approaching.
- Arrived: the blue square occupies the entire visual field, meaning Ernest is stepping on the blue square and can eat it.
- Disappear: the blue square disappeared from the visual field.
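The four signals above can be sketched as a function of what one eye saw before and after the last interaction cycle. This is an assumption-laden illustration rather than Ernest's implementation: the function `eye_signal`, the use of `None` for "no blue square in this eye's field", the use of distance 0 for "stepping on the square", and the priority given to "arrived" are all hypothetical choices.

```python
from typing import Optional


def eye_signal(previous: Optional[float], current: Optional[float]) -> str:
    """Name the change in one eye's field between two consecutive cycles.

    Each argument is the distance to the blue square as seen by this eye,
    or None when the square is outside the eye's visual field.
    """
    if current is None:
        # The square is not visible now: it either vanished or was never there.
        return "disappear" if previous is not None else "unchanged"
    if current == 0:
        return "arrived"    # the square fills the entire visual field
    if previous is None:
        return "appear"     # the square entered the visual field
    return "closer" if current < previous else "unchanged"


print(eye_signal(None, 3.0))  # appear
print(eye_signal(3.0, 2.0))   # closer
```

Because the signal only encodes change, a square that stays at the same distance produces no signal at all, in line with the sensitivity to movement noted above.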

As opposed to previous versions, Ernest 8.2 has no antenna and has only three possible primitive actions (move forward, turn left, turn right). Their outcomes carry the following satisfaction values:
- [move forward, succeed, 0] Ernest is indifferent to moving forward.
- [move forward, fail, -8] Ernest hates bumping into walls.
- [turn left or right, succeed, 0] Ernest is indifferent to turning toward an adjacent empty square.
- [turn left or right, fail, -5] Ernest dislikes turning toward an adjacent wall.

To generate a visuo-spatial behavioral proclivity, Ernest's sequential learning mechanism receives an additional inborn intrinsic satisfaction when an eye returns a signal:
- [Appear, 15] Ernest loves blue squares appearing in an eye's visual field.
- [Closer, 10] Ernest enjoys blue squares getting closer.
- [Arrived, 30] Ernest is crazy about stepping on a blue square (and eating it in the process).
- [Disappear, -15] Ernest hates blue squares disappearing from an eye's visual field.
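The satisfaction values listed in this post can be collected into two small tables. The sketch below assumes that the satisfaction of a step is the plain sum of the action value and the values of both eye signals; the post only says that the eye signals bring an "additional" satisfaction, so this additive rule, like the function name `satisfaction`, is an assumption made for illustration.

```python
# Values copied from the two lists above.
ACTION_SATISFACTION = {
    ("move forward", True): 0,
    ("move forward", False): -8,
    ("turn", True): 0,
    ("turn", False): -5,
}
SIGNAL_SATISFACTION = {
    "appear": 15,
    "closer": 10,
    "arrived": 30,
    "disappear": -15,
    "unchanged": 0,  # no signal, no additional satisfaction
}


def satisfaction(action, succeeded, left_eye="unchanged", right_eye="unchanged"):
    """Sum the action value and both eye-signal values (assumed additive)."""
    return (ACTION_SATISFACTION[(action, succeeded)]
            + SIGNAL_SATISFACTION[left_eye]
            + SIGNAL_SATISFACTION[right_eye])


print(satisfaction("move forward", True, "closer", "closer"))  # 20
print(satisfaction("turn", True, "appear", "disappear"))       # 0
print(satisfaction("move forward", False))                     # -8
```

Under this assumption, moving forward while the square gets closer in both eyes is worth 20, whereas a turn that shifts the square from one eye to the other nets 0 because the appear and disappear values cancel out.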

At the beginning, the video shows Ernest learning to coordinate his actions with his sensory input. As before, he needs to learn to generate expectations associated with actions (e.g., turning schemas may shift the blue square from one eye to the other). He also needs to learn sequences of behavior (or "strategies") to reach the blue square. In this example run, he learned a strategy consisting of following a diagonal and subsequently a straight line. Other strategies are possible; we will report them next.