
The Termination Critic
TLDR: This paper induces learned options to terminate in a small set of states, which makes it easier to transition between options and to plan with them.

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
This paper presents a theoretically backed representation objective for HRL that extends Data-Efficient HRL by Nachum et al. and outperforms other HRL methods. I liked the formulation and the theoretical analysis used to derive the objective. The representation objective suggests a connection to maximizing the mutual information of the future state given the current policy and state, which is task-reward agnostic and seems intuitive. The experiments are simple Ant-Walker experiments, but they are in line with the previous HRL baselines they compare against.

Relational Forward Models for Multi-Agent Learning
This paper focuses on using a graph representation for multi-agent trajectory prediction. @dmillard may be interested in this for trajectory prediction work.

How to Train Your MAML
This paper presents empirical problems with MAML and proposes some fixes for them. Overall, I did not find the solutions very novel, but the proposed fixes are simple and seem effective. When I was training MetaGAN, I ran into these problems a lot. A good read for anyone wanting to apply MAML.

Hybrid Reward Architecture
This paper focuses on improving RL algorithms by decomposing the scalar reward function into a sum of multiple component reward functions. The key idea is that the reward functions of challenging domains cannot easily be reduced to a low-dimensional representation, so decomposition can alleviate this.
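A minimal tabular sketch of the decomposition idea, assuming small discrete state and action spaces: each reward component r_k gets its own Q-table ("head") trained only on r_k, and actions are chosen greedily with respect to the (here unweighted) sum of the heads. The class name `HybridRewardQ` and the choice to bootstrap every head on the combined greedy action are assumptions for illustration, not the paper's exact training procedure.

```python
from typing import Dict, List, Sequence, Tuple


class HybridRewardQ:
    """Tabular sketch: one Q-table per reward component, combined by summation."""

    def __init__(self, n_components: int, n_actions: int,
                 alpha: float = 0.5, gamma: float = 0.9) -> None:
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        # heads[k][(state, action)] estimates the return of reward component k
        self.heads: List[Dict[Tuple[int, int], float]] = [
            {} for _ in range(n_components)
        ]

    def head_value(self, k: int, s: int, a: int) -> float:
        return self.heads[k].get((s, a), 0.0)

    def combined_value(self, s: int, a: int) -> float:
        # Aggregate the heads by summation: Q(s, a) = sum_k Q_k(s, a).
        return sum(self.head_value(k, s, a) for k in range(len(self.heads)))

    def greedy_action(self, s: int) -> int:
        return max(range(self.n_actions), key=lambda a: self.combined_value(s, a))

    def update(self, s: int, a: int, rewards: Sequence[float],
               s_next: int, done: bool) -> None:
        """One TD update per head, each using only its own reward r_k."""
        if len(rewards) != len(self.heads):
            raise ValueError("expected one reward per component")
        # Assumption: every head bootstraps on the action that the
        # combined value function would pick in s_next.
        a_next = self.greedy_action(s_next)
        for k, r_k in enumerate(rewards):
            target = r_k
            if not done:
                target += self.gamma * self.head_value(k, s_next, a_next)
            old = self.head_value(k, s, a)
            self.heads[k][(s, a)] = old + self.alpha * (target - old)


if __name__ == "__main__":
    agent = HybridRewardQ(n_components=2, n_actions=2, alpha=1.0)
    agent.update(0, 0, [1.0, 0.0], 1, done=True)  # action 0 pays off for component 0
    agent.update(0, 1, [0.0, 0.5], 1, done=True)  # action 1 pays off for component 1
    print(agent.greedy_action(0))  # 0, since 1.0 > 0.5
```

Each head only has to model the return of one simple reward signal; the sum recovers a value for the full scalar reward, because the full reward is assumed to be the sum of its components.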

Hierarchical Reinforcement Learning: Option-Critic and SECTAR
SECTAR jointly trains a state decoder and a policy decoder.

Gradient of a Matrix-Matrix Multiplication
The gradient of a matrix-matrix product turns out to be just another matrix multiplication.
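As a sketch of why: take Y = XW with X of shape n x m and W of shape m x p, and let G = dL/dY be the upstream gradient (shape n x p). Then dL/dX = G W^T and dL/dW = X^T G, i.e. two more matrix products whose shapes match X and W. The pure-Python snippet below computes both; the helper names (`matmul_backward`, `numeric_grad_x`) are hypothetical, and the finite-difference check is only there to confirm the formula for dL/dX.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (n x m) @ (m x p) product."""
    m = len(b)
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    p = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(m)) for j in range(p)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul_backward(x: Matrix, w: Matrix, grad_y: Matrix) -> Tuple[Matrix, Matrix]:
    """For Y = X @ W and upstream gradient G = dL/dY, return (dL/dX, dL/dW)."""
    grad_x = matmul(grad_y, transpose(w))  # (n x p) @ (p x m) -> n x m, same shape as X
    grad_w = matmul(transpose(x), grad_y)  # (m x n) @ (n x p) -> m x p, same shape as W
    return grad_x, grad_w


def numeric_grad_x(x: Matrix, w: Matrix, grad_y: Matrix, eps: float = 1e-6) -> Matrix:
    """Central finite differences of L = sum_ij G_ij * Y_ij with respect to X."""
    def loss(xx: Matrix) -> float:
        y = matmul(xx, w)
        return sum(g * v for g_row, y_row in zip(grad_y, y) for g, v in zip(g_row, y_row))

    result = []
    for i in range(len(x)):
        row = []
        for j in range(len(x[0])):
            x_plus = [r[:] for r in x]
            x_minus = [r[:] for r in x]
            x_plus[i][j] += eps
            x_minus[i][j] -= eps
            row.append((loss(x_plus) - loss(x_minus)) / (2 * eps))
        result.append(row)
    return result


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0]]
    W = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
    G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    gx, gw = matmul_backward(X, W, G)
    print(gx)  # [[5.0, 8.0], [6.0, 9.0]]
    print(gw)  # [[1.0, 3.0, 0.0], [2.0, 4.0, 0.0]]
```

Using the loss L = sum(G * Y) makes dL/dY exactly G, which is a convenient way to test a backward pass against finite differences for any upstream gradient.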

Building an Image Editor with the Canvas API

Curiosity-driven Exploration by Self-supervised Prediction (Paper Summary)

Breaking Jazz Tracks into Beats with "Unsupervised" Clustering
What we want! Each phrase is a different color.
Spoiler: I end up just writing one while loop.

Deep Symphony: Classical Music Generation with Neural Networks