We have been working on many different variants of AOgmaNeo that support Unsupervised Behavioral Learning (UBL).

For those who don't know, UBL is an alternative to classic reinforcement learning (RL). It is a somewhat different paradigm: instead of optimizing a reward function, it learns the dynamics of the environment and provides a kind of programmable interface to it. The main motivation behind UBL is that we want an agent that is easier to use in real-world robotics, where classic RL can require a lot of hand-crafting. UBL can also handle instantaneously changing objectives, which regular RL cannot really do (even with goal conditioning).

Currently, the best performing UBL branches are able to reproduce some of the results from the original RL version, but not yet all. There is still work to do!

As a result, it will still take a bit before UBL has a chance at making it into the master branch. If you wish to try it anyway, the latest experimental branches are "ubl" and "ubl_cart". There are several others, but those two are the most interesting at this time. Both of these branches use recurrence instead of exponential memory to handle short-term memory, as we found that having a fast-moving top layer helps performance a lot.

We also started thinking of UBL as a machine that executes programs rather than one chasing a goal state. This new interpretation required some changes internally, but it seems to be working much better as a result. The idea is that UBL receives a sequence of Columnar Sparse Distributed Representations (CSDRs) that act as a program, with the UBL hierarchy being a sort of interface to the environment the agent is operating in. The hierarchy executes the instructions in that CSDR program while abstracting away many of the details of the environment it has learned from.
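To make the "program of CSDRs" idea a bit more concrete, here is a minimal toy sketch. Everything in it is hypothetical — the names (`ToyHierarchy`, `step`, `make_csdr`) are not the AOgmaNeo API, and the stand-in hierarchy just mixes its inputs deterministically instead of learning environment dynamics. It only illustrates the data flow: a CSDR is one active cell index per column, a program is a sequence of such CSDRs, and the hierarchy consumes one instruction per step together with the current observation.

```python
# Toy illustration of the "CSDR program" interpretation.
# All names here are hypothetical -- this is NOT the AOgmaNeo API.

# A CSDR has one active cell index per column.
# Here: 4 columns of 16 cells each.
NUM_COLUMNS = 4
CELLS_PER_COLUMN = 16

def make_csdr(indices):
    # Validate and freeze a CSDR (tuple of per-column active-cell indices).
    assert len(indices) == NUM_COLUMNS
    assert all(0 <= i < CELLS_PER_COLUMN for i in indices)
    return tuple(indices)

# A "program" is just a sequence of instruction CSDRs.
program = [
    make_csdr([3, 0, 7, 12]),
    make_csdr([3, 1, 7, 12]),
    make_csdr([5, 1, 2, 9]),
]

class ToyHierarchy:
    """Stand-in for a UBL hierarchy: consumes one instruction CSDR per
    step together with the current observation CSDR, and emits an action
    CSDR. The real system learns environment dynamics; this toy just
    combines the two inputs deterministically to show the interface."""

    def step(self, observation, instruction):
        return make_csdr([(o + g) % CELLS_PER_COLUMN
                          for o, g in zip(observation, instruction)])

h = ToyHierarchy()
obs = make_csdr([0] * NUM_COLUMNS)
for instr in program:
    action = h.step(obs, instr)
    obs = action  # pretend the environment echoes the action back
```

The point is only the shape of the interaction loop: instructions stream in as CSDRs, and the hierarchy is the layer that translates them into environment-level actions.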

Finally, we have been playing around with Odin some more, building some tools with it first - before hopefully rewriting all of AOgmaNeo in Odin!