Learning for Dynamics and Controls

I used reinforcement learning via NVIDIA's Isaac Gym and the open-source legged_gym repository to achieve the results in this presentation. Boston Dynamics' Atlas is shown moving stably under a curriculum-learned policy. I learned that reward tuning is a careful balance between letting Atlas figure out how to stand and walk on its own and prescribing exactly how it should do so. For example, specifying the joint configurations I want for a standing pose prevents Atlas from finding its own configurations, which could be more efficient and robust.
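To make that trade-off concrete, here is a conceptual sketch in plain Python (not legged_gym's actual reward API) comparing a reward that prescribes a specific standing joint configuration with one that only rewards the outcome of standing upright. The joint targets, nominal base height, and weights are hypothetical values chosen for illustration.

import numpy as np

# Hypothetical hand-picked standing pose (target joint angles in radians).
DEFAULT_STAND_POSE = np.zeros(12)

def prescriptive_reward(joint_pos: np.ndarray) -> float:
    # Penalizes any deviation from the hand-specified standing configuration,
    # which constrains the joint configurations the policy can discover.
    return -float(np.sum((joint_pos - DEFAULT_STAND_POSE) ** 2))

def task_level_reward(base_height: float,
                      base_lin_vel: np.ndarray,
                      projected_gravity_xy: np.ndarray) -> float:
    # Rewards only the outcome: stay near a nominal height (0.9 m assumed),
    # keep the base still, and keep the torso upright. The joints are free
    # to settle into whatever configuration achieves this.
    height_term = -(base_height - 0.9) ** 2
    velocity_term = -float(np.sum(base_lin_vel ** 2))
    upright_term = -float(np.sum(projected_gravity_xy ** 2))
    return height_term + velocity_term + upright_term

The first function corresponds to telling Atlas what to do; the second leaves Atlas free to find its own joint configurations, as described above.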

I also explored different regressors in this class. These figures show my implementation of estimating the dynamics of a nonlinear spring-mass-damper system using Gaussian Processes. This is especially useful when the amount of data that can be collected is limited, for example by time or the cost of operation. A sketch of the setup follows the figures below.

This figure shows the true model as the blue silhouette and the 50 training points as black dots. Note how few training points were used to train the Gaussian Process regressor.

This figure shows the true model as the blue silhouette and the estimated model as the 500 black dots.
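The sketch below shows how such a regressor could be set up with scikit-learn's GaussianProcessRegressor, mirroring the figures above with 50 training points and 500 query points. The system parameters, kernel choice, and sampling ranges are illustrative assumptions, not the exact values behind the figures.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Assumed mass, damping, linear stiffness, and cubic (nonlinear) stiffness.
m, c, k, k_nl = 1.0, 0.4, 2.0, 1.5

def true_acceleration(x, x_dot):
    # Nonlinear spring-mass-damper: m*x'' = -c*x' - k*x - k_nl*x^3
    return (-c * x_dot - k * x - k_nl * x**3) / m

# Sparse training set: 50 random (position, velocity) samples.
rng = np.random.default_rng(0)
X_train = rng.uniform(-2.0, 2.0, size=(50, 2))
y_train = true_acceleration(X_train[:, 0], X_train[:, 1])

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, y_train)

# Query the learned model at 500 points and compare against the true dynamics.
X_test = rng.uniform(-2.0, 2.0, size=(500, 2))
y_pred, y_std = gp.predict(X_test, return_std=True)
print("mean absolute error:",
      np.mean(np.abs(y_pred - true_acceleration(X_test[:, 0], X_test[:, 1]))))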

First-visit constant-α Monte Carlo control was used with OpenAI's Frozen Lake gym environment to control a sprite as it navigated a slippery frozen lake. The sprite had to learn to reach the present without falling through the ice. Since the lake was slippery, the sprite had a 1/3 chance of moving in the intended direction and a 2/3 chance of moving in a direction perpendicular to it. For example, if the sprite wants to move left, it has a 1/3 chance of moving left, a 1/3 chance of moving up, and a 1/3 chance of moving down. A sketch of the control algorithm follows the policy figures below.

A rendering of the sprite after fewer than 100 learning iterations.

The final learned policy is shown above. The arrows show the direction the sprite will move from each cell. The empty cells correspond to the locations of thin ice, and the goal is in the bottom-right cell.

An interesting note about the learned policy is that the sprite learned to move towards the thin ice, shown by the policy arrows pointing towards the thin-ice cells. This is because the sprite learned that choosing to move towards the thin ice gave it only a 1/3 chance of actually ending up there, and a 2/3 chance of slipping safely to the side.
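Below is a minimal sketch of first-visit constant-α Monte Carlo control on Frozen Lake, written against the Gymnasium API (the maintained successor to OpenAI Gym); the hyperparameters (α, ε, discount factor, and episode count) are illustrative rather than the values used for the results above.

import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=True)
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.05, 0.99, 0.1  # assumed hyperparameters

def epsilon_greedy(state):
    # Explore with probability epsilon, otherwise act greedily on Q.
    if np.random.rand() < epsilon:
        return env.action_space.sample()
    return int(np.argmax(Q[state]))

for episode in range(50_000):
    state, _ = env.reset()
    trajectory, done = [], False
    while not done:
        action = epsilon_greedy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action, reward))
        state, done = next_state, terminated or truncated

    # Record the first time step at which each (state, action) pair appears.
    first_visit = {}
    for t, (s, a, _) in enumerate(trajectory):
        first_visit.setdefault((s, a), t)

    # Walk the episode backwards, accumulating the return G, and apply the
    # constant-alpha update only at first visits.
    G = 0.0
    for t in reversed(range(len(trajectory))):
        s, a, r = trajectory[t]
        G = gamma * G + r
        if first_visit[(s, a)] == t:
            Q[s, a] += alpha * (G - Q[s, a])

policy = np.argmax(Q, axis=1)  # greedy policy over the learned Q-table

The greedy policy extracted from the Q-table is the kind of arrow map shown in the policy figure above.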

A link to my implementation in Google Colab is below:

Link