ESR-light and conservative Q-learning

CireNeikual —

Time for another update!

Since the last blog post, we have performed many new experiments with different encoders, decoders, recurrent versions, and flatter hierarchies. Out of these, the best new systems are:

- New encoder: an ESR (exponential sparse reconstruction) encoder that stores single-byte weights, made possible by a few re-scaling tricks. Great for Arduino!
- New reinforcement learning decoder that performs conservative Q-learning.
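To give a rough idea of what single-byte weights look like in practice, here is a minimal sketch. It is not the ESR update itself, and it does not reproduce the re-scaling tricks mentioned above, which this post does not spell out. The function names `activate` and `nudge`, the array layout, and the integer step are all assumptions made for illustration.

```python
import numpy as np

def activate(weights_u8, active_inputs):
    """Sum the byte weights of the active (1-valued) inputs for every hidden cell."""
    # Accumulate in int32 so that adding many bytes together cannot overflow.
    return weights_u8[:, active_inputs].astype(np.int32).sum(axis=1)

def nudge(weights_u8, cell, active_inputs, step):
    """Shift one cell's weights on the active inputs by an integer step.

    The result saturates at 0 and 255 instead of wrapping around,
    which is the main thing to watch for with unsigned byte storage.
    """
    w = weights_u8[cell, active_inputs].astype(np.int32) + step
    weights_u8[cell, active_inputs] = np.clip(w, 0, 255).astype(np.uint8)

# Hypothetical sizes: 4 hidden cells, 16 binary inputs, weights start mid-range.
weights = np.full((4, 16), 127, dtype=np.uint8)
active = [2, 5, 9]            # indices of the inputs that are currently 1
nudge(weights, 1, active, 40) # strengthen cell 1 on the active inputs
winner = int(np.argmax(activate(weights, active)))
```

Each weight costs one byte and both functions use integer arithmetic only, which is the kind of property that matters on a small microcontroller.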

The latter in particular is quite nice to have. Previously, we used a variant of ACLA (Actor-Critic Learning Automaton) to perform reinforcement learning. It worked well, but it had some downsides. For instance, the "passive learning" ability of this decoder was basically a hack: it couldn't properly learn from the rewards it was given passively, only from the actions taken. It also did not work well with epsilon-greedy exploration.

We have tried Q-learning multiple times before, but this time we found the right way to update the Q values incrementally with sparse binary inputs. We combine advantage learning (which increases the action gap) with a simple form of conservative Q-learning. We also use N-step Q-learning to help smooth things out.

The conservative Q-learning removes the need to tell the system when it should "mimic" the actions it is given as opposed to learning its own. Instead, it can now learn completely passively and actually make use of the rewards it is provided.
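For readers who want something concrete, here is a minimal sketch of how these three ingredients can fit together for a linear Q function over sparse binary inputs. It uses the textbook forms of each idea (an N-step return, an advantage-learning correction to the target, and a log-sum-exp conservative penalty) and is not the actual AOgmaNeo decoder. All names and hyperparameters (`update`, `al_alpha`, `cql_weight`, and so on) are assumptions made for illustration.

```python
import numpy as np

def q_values(weights, active):
    """Q(s, a) for every action: sum the weights of the active (1-valued) inputs."""
    return weights[:, active].sum(axis=1)

def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the n observed rewards plus the discounted bootstrap value."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def update(weights, active, action, rewards, next_active,
           lr=0.1, gamma=0.9, al_alpha=0.5, cql_weight=0.1):
    """One incremental update; `action` may come from logged (passive) data."""
    q = q_values(weights, active)
    q_next = q_values(weights, next_active)

    # N-step Q-learning target: n real rewards, then bootstrap from the best next action.
    target = n_step_return(rewards, q_next.max(), gamma)

    # Advantage learning: lower the target by a fraction of the gap between the
    # best action and the taken one, which widens the action gap over time.
    target -= al_alpha * (q.max() - q[action])
    td_error = target - q[action]

    # Conservative penalty: gradient of logsumexp(q) - q[action], which pushes
    # down actions that were not taken and pushes up the one that was.
    probs = np.exp(q - q.max())
    probs /= probs.sum()
    penalty_grad = probs.copy()
    penalty_grad[action] -= 1.0

    delta = -cql_weight * penalty_grad
    delta[action] += td_error

    # Only the weights of active inputs change, so the update stays sparse.
    weights[:, active] += lr * delta[:, None] / len(active)
    return td_error

# Hypothetical problem size: 3 actions, 8 binary inputs.
weights = np.zeros((3, 8))
update(weights, active=[0, 3], action=1, rewards=[1.0, 0.0, 1.0], next_active=[1, 4])
```

In this sketch nothing tells the learner whether `action` was its own choice or one it was shown. The conservative term keeps the values of untried actions low, so the same update can be applied to passively observed data while still using the rewards.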

Oh yeah, we also have two new demos since the last post!

In this demo, we trained our Lorcan Mini robot to walk with reinforcement learning using only the IMU forward acceleration as a reward signal:

And in this one, we stored an approximate copy of a minute-long video, along with its audio, in an AOgmaNeo hierarchy:

As I don't know anything about machine learning, I find these posts "hard to read". If you have time at some point, I would appreciate a high-level overview of what your project consists of, and then lower-level details about each system, so that I can try to follow. At the moment I wouldn't even know what to search for to get more information.