ESR-light and conservative Q-learning

Hello,

Time for another update!

Since the last blog post, we have performed many new experiments with different encoders, decoders, recurrent versions, and flatter hierarchies. Out of these, the best new systems are:

- New encoder: the ESR (exponential sparse reconstruction) encoder now uses single-byte weights, thanks to a few re-scaling tricks. Great for Arduino!
- New reinforcement learning decoder that performs conservative Q-learning.

The latter in particular is quite nice to have. Previously, we used a type of ACLA (Actor-Critic Learning Automaton) algorithm to perform reinforcement learning. It worked well, but it had some downsides. For instance, the "passive learning" ability of this decoder was basically a hack: it couldn't properly learn from the rewards it was provided passively, only from the actions taken. It also did not function well with epsilon-greedy exploration.

We have tried Q-learning multiple times before, but this time we found the right method of incrementally updating the Q values with sparse binary inputs. We use a combination of advantage learning (which increases the action gap) along with a simple way of performing conservative Q-learning. We also use N-step Q-learning to help smooth things out.
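To make the combination a bit more concrete, here is a minimal sketch of an incremental update with sparse binary inputs, an N-step return, and an advantage-learning style target. It is not the AOgmaNeo implementation: the function names, the `gap` parameter, and the exact form of the target (the N-step return minus `gap` times the action gap) are assumptions chosen for illustration.

```python
from typing import List, Sequence


def q_values(weights: List[List[float]], active: Sequence[int]) -> List[float]:
    """Q(s, a) per action: with binary inputs, just sum the weights at the active indices."""
    return [sum(w[i] for i in active) for w in weights]


def n_step_return(rewards: Sequence[float], bootstrap: float, gamma: float) -> float:
    """Discounted sum of the N observed rewards, plus the bootstrap value discounted N times."""
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def update(
    weights: List[List[float]],
    active: Sequence[int],       # indices of the non-zero inputs at time t (must be non-empty)
    action: int,                 # action taken at time t
    rewards: Sequence[float],    # the N rewards observed after taking that action
    next_active: Sequence[int],  # indices of the non-zero inputs N steps later
    gamma: float = 0.99,
    lr: float = 0.1,
    gap: float = 0.5,            # hypothetical action-gap strength (0 gives plain Q-learning)
) -> float:
    """Apply one incremental update in place and return the TD error."""
    q = q_values(weights, active)
    bootstrap = max(q_values(weights, next_active))
    g = n_step_return(rewards, bootstrap, gamma)

    # Advantage-learning style target: lower the target for actions that are
    # currently below the greedy one, which widens the action gap.
    target = g - gap * (max(q) - q[action])
    td_error = target - q[action]

    # Only the weights of the active inputs change, so the cost of an update
    # scales with the number of active inputs rather than the input size.
    step = lr * td_error / len(active)
    for i in active:
        weights[action][i] += step
    return td_error
```

Because the inputs are binary, no multiplications by input values are needed, which is part of what makes this kind of update cheap on small hardware.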

The conservative Q-learning removes the need to tell the system when it should "mimic" the actions it is given as opposed to learning its own. Instead, it can now learn completely passively and actually make use of the rewards it is provided.
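As a rough illustration of why this works passively, here is a tabular sketch that learns only from logged transitions. It assumes the commonly used conservative regularizer (push down a soft maximum over all actions, push up the action that was actually logged), which may differ from the simple variant we use; all names and parameters here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A logged transition: (state, action, reward, next_state, done)
Transition = Tuple[int, int, float, int, bool]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def conservative_q_update(
    q: List[List[float]],
    transition: Transition,
    lr: float = 0.1,
    gamma: float = 0.9,
    cql_weight: float = 1.0,  # hypothetical strength of the conservative term
) -> None:
    """One in-place update from a single logged transition."""
    s, a, r, s_next, done = transition

    # Ordinary Q-learning step on the logged action, so the reward is used.
    bootstrap = 0.0 if done else max(q[s_next])
    q[s][a] += lr * (r + gamma * bootstrap - q[s][a])

    # Conservative step: gradient descent on logsumexp(Q(s, .)) - Q(s, a).
    # Actions that never appear in the data only ever get pushed down.
    probs = softmax(q[s])
    for b in range(len(q[s])):
        indicator = 1.0 if b == a else 0.0
        q[s][b] -= lr * cql_weight * (probs[b] - indicator)


def train_passively(q: List[List[float]], log: Sequence[Transition], epochs: int = 50) -> None:
    """Learn from provided actions and rewards only; the agent never acts itself."""
    for _ in range(epochs):
        for transition in log:
            conservative_q_update(q, transition)
```

With a log in which only one action was ever taken in a given state, the untried actions end up with lower values than the logged one, so the greedy policy follows the demonstrated behavior there without needing a separate "mimic" switch.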

Oh yeah, we also have two new demos since the last post!

In this demo, we trained our Lorcan Mini robot to walk with reinforcement learning using only the IMU forward acceleration as a reward signal:

And in this one, we stored an approximate copy of a minute-long video, along with its audio, in an AOgmaNeo hierarchy:
