Friday Facts 16: UpDog (Unity+ML)

Unity has a Machine Learning component. Can I teach it to roll over and stand up?

My robots always start as a simulation. Instead of hand-rolling a physics engine and a machine learning system, this time I decided to try teaching a quadruped bot a gait strategy (aka get up and walk) using Unity ML-Agents, which uses TensorFlow. My scene and all related material are here:

What is Machine Learning?

Machine Learning is a magic box. On one side you feed it some inputs, and then you give or withhold rewards until the other side produces the result you want. The first trick is to design the inputs well. The second is to design the reward well. There may be other tricks after that but I haven’t found them yet. Part of the reason for writing this is to document what I’m trying. Maybe you have ideas about how to do it better? Talk with me.

The quadruped model

The root Environment lets me put everything in a Prefab which I can then instance to run many bots in parallel. Dog contains Torso, which is the root of the skeleton. Each bone of the skeleton is an ArticulatedBody (Unity’s ArticulationBody component) because it most closely resembles the behavior I’d expect from my servos – set desired angle value, cross fingers.

Each joint has one degree of rotational freedom and some reasonable limits.  I have no idea what joint stiffness would mirror real life.

Each joint contains one Model that has the mesh of the bot, the material, etc.

On the end of each leg is a foot which is ignored by physics. I paint them red when I detect contact with the terrain.

Behavior Parameters set the inputs and outputs of the network. Space Size is the number of inputs. Stacked Vectors stacks recent inputs so that the network has a kind of memory. Actions are the outputs, and Continuous Actions are decimal numbers usually in the range -1…1. Behavior Type is where you can set drive-by-ML or heuristic (aka drive it yourself).
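To make the "kind of memory" idea concrete, here is a minimal sketch of what stacking observations amounts to. ML-Agents does this internally; the class name, the zero-filled starting frames, and the oldest-to-newest ordering are my assumptions for illustration, not the library's actual implementation.

```python
from collections import deque


class ObservationStacker:
    """Keeps the last `stack_size` observation vectors and concatenates them."""

    def __init__(self, space_size, stack_size):
        self.space_size = space_size
        # Assumption: start with zero-filled frames so the output length is constant.
        self.frames = deque(
            [[0.0] * space_size for _ in range(stack_size)],
            maxlen=stack_size,
        )

    def push(self, observation):
        if len(observation) != self.space_size:
            raise ValueError("observation length must equal space_size")
        # Appending to a full deque silently drops the oldest frame.
        self.frames.append(list(observation))
        # Flatten oldest-to-newest into a single network input vector.
        return [value for frame in self.frames for value in frame]


stacker = ObservationStacker(space_size=2, stack_size=3)
stacker.push([1.0, 2.0])
stacked = stacker.push([3.0, 4.0])
```

With a Space Size of 2 and 3 stacked vectors the network sees 6 numbers per decision, so stacking multiplies the input size.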

Dog Controller is the script that collects the Observations – it feeds the ML with sensory input (more on that later). Dog Controller also receives instructions from the network and then applies them – in this case, it sets the Drive Targets of each ArticulatedBody, same as I would do with a servo on a quadruped. Lastly, Dog Controller usually also gives a Reward to the network to teach it.
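One way the "apply the instructions" step could work is to treat each continuous action as a nudge to the joint's current target. This is a sketch under assumptions, not the actual Dog Controller code: the function name, the incremental scheme, and the example limits of ±45 degrees are all hypothetical.

```python
def next_drive_target(current_deg, action, joint_speed_deg, lower_deg, upper_deg):
    """Turn one network output into a new joint target angle, in degrees.

    Assumption: the action is a rate command, so each step moves the target
    by at most `joint_speed_deg` degrees.
    """
    # Continuous actions are usually in -1..1; clamp in case they are not.
    action = max(-1.0, min(1.0, action))
    target = current_deg + action * joint_speed_deg
    # Never ask the joint to go past its limits.
    return max(lower_deg, min(upper_deg, target))


small_step = next_drive_target(0.0, 1.0, 0.5, -45.0, 45.0)
at_limit = next_drive_target(44.8, 1.0, 0.5, -45.0, 45.0)
```

The alternative is to map the action directly onto the range between the joint limits, which lets the network jump to any angle in one decision but makes the motion jerkier.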

Decision Requester sets how often the network should make a decision. The value here is in Academy Steps, which isn’t Unity Update ticks, it isn’t frames, it isn’t FixedUpdate physics steps, it’s… something else.
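Whatever an Academy Step turns out to be, the counting itself is simple. Here is a sketch of "decide every N steps", assuming a plain step counter; the function name and the offset parameter are hypothetical, and it says nothing about how steps line up with Unity's own update loops.

```python
def decision_steps(total_steps, decision_period, decision_offset=0):
    """List the step numbers on which a new decision would be requested."""
    return [
        step
        for step in range(total_steps)
        if step % decision_period == decision_offset
    ]


steps = decision_steps(total_steps=12, decision_period=5)
```

On the steps in between, the agent can either repeat its last action or do nothing, depending on how the requester is configured.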


At the start of each episode (learning cycle) I have to reset the position of the quadruped. I’ve found the easiest way is to preserve the original Torso untouched and instead use a completely new clone of the entire Torso hierarchy for each episode.

Calculating Rewards

I have now experimented with many automatic styles of reward. Mostly my thinking is a variation on these themes:

  • The reward function first gives points for “uprightness” (read torso.transform.up.y approaching 1).
  • Add to that height off ground (max height 3.6, so height/3.6 will never be more than 1).
  • Add to that horizontal movement (maximum XZ speed of 10, then divided by 10, never more than 1).

These three elements are further scaled by values that can be tweaked from Dog Controller.
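The three terms and their scaling can be sketched roughly like this. The function and weight names are hypothetical, and clamping each term to the range 0..1 (so an upside-down torso earns nothing rather than a penalty) is my assumption for the sketch.

```python
import math

MAX_HEIGHT = 3.6   # maximum torso height, from the list above
MAX_SPEED = 10.0   # maximum XZ speed, from the list above


def clamp01(x):
    return max(0.0, min(1.0, x))


def shaped_reward(up_y, height, velocity_x, velocity_z,
                  w_upright=1.0, w_height=1.0, w_speed=1.0):
    """Weighted sum of uprightness, height off ground, and horizontal speed."""
    upright_term = clamp01(up_y)                    # torso.transform.up.y
    height_term = clamp01(height / MAX_HEIGHT)
    speed = math.hypot(velocity_x, velocity_z)      # ignore vertical motion
    speed_term = clamp01(speed / MAX_SPEED)
    return (w_upright * upright_term
            + w_height * height_term
            + w_speed * speed_term)


best_case = shaped_reward(up_y=1.0, height=3.6, velocity_x=6.0, velocity_z=8.0)
```

With all weights at 1 the reward tops out at 3. Setting a weight to 0 switches that term off, which is handy for testing one behavior at a time.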

I have also tried a manual voting style where the user (me) can click on bots that appear to be doing well to feed them a reward. I also added a button to end all episodes and start over, in case they’re all terrible. Machine Learning Eugenics :S

Results so far

nine bots running in parallel while my CPU gently weeps

So far it has managed to roll over a few times but never figured out how to stand up. It doesn’t even stay rolled over.

Things I know I don’t know


I have no idea if my network is shaped/configured well. Currently I’m using three layers with 128 neurons each.

I have no idea when rewards are applied to the network, so I’m not sure when is best to manually feed them. I don’t know if I should reward continuously while it is running or do as in the codemonkey example (reward and immediately reset).

ArticulatedBody is not great to work with.  GetJointPositions is in radians and SetJointTargets is in degrees? Physically I have no idea what stiffness to set for ArticulatedBody joints. Too high and they can launch the quad off the ground, too low and they don’t have the strength to roll the dog over.
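Until the units question is settled, one defensive habit is to convert explicitly at the boundary and keep a single unit everywhere else. A minimal sketch, assuming joint positions really do come back in radians and targets really are wanted in degrees; the helper names are hypothetical and this does not answer which Unity call uses which unit.

```python
import math


def positions_to_degrees(positions_rad):
    """Convert a list of joint positions from radians to degrees."""
    return [math.degrees(p) for p in positions_rad]


def targets_to_radians(targets_deg):
    """Convert a list of joint targets from degrees to radians."""
    return [math.radians(t) for t in targets_deg]


quarter_turn_deg = positions_to_degrees([math.pi / 2])[0]
round_trip = positions_to_degrees(targets_to_radians([30.0, -45.0]))
```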

Setting a good Joint Speed in the Dog Controller is also part of the problem. High stiffness and low joint speed (under 0.5 degree per step)

Why do I have to restart the TensorBoard results graphing script every time I restart the mlagents-learn training tool? My parameters haven’t changed, only the results that are being graphed.

Final thoughts

I’d really like to talk to some passionate and experienced people about training this successfully. You can find me on Discord or Github all the time. If you have specific comments or questions about this article, let me know and I’ll try to address them in an edit.