Friday Facts 16: UpDog (Unity+ML)

Unity has a Machine Learning component. Can I teach it to roll over and stand up?

My robots always start as a simulation. Instead of hand-rolling a physics engine and a machine learning system, I decided this time to try to teach a quadruped bot a gait strategy (aka get up and walk) using Unity ML-Agents, which uses TensorFlow under the hood. My scene and all related material are here:

What is Machine Learning?

Machine Learning is a magic box. On one side you feed it some inputs, then give or withhold rewards until the other side produces the result you want. The first trick is to design the inputs well. The second is to design the reward well. There may be other tricks after that, but I haven’t found them yet. Part of writing this is to document what I’m trying. Maybe you have ideas about how to do it better? Talk with me.

The quadruped model

The root Environment lets me put everything in a Prefab which I can then instantiate to run many bots in parallel. Dog contains Torso, which is the root of the skeleton. Each bone of the skeleton is an ArticulationBody, because those most closely resemble the behavior I’d expect from my servos – set a desired angle value, cross fingers.

Each joint has one degree of rotational freedom and some reasonable limits.  I have no idea what joint stiffness would mirror real life.

Each joint contains one Model that has the mesh of the bot, the material, etc.

On the end of each leg is a foot which is ignored by physics. I paint them red when I detect contact with the terrain.

Behavior Parameters set the inputs and outputs of the network. Space Size is the number of inputs. Stacked Vectors stacks recent inputs so that the network has a kind of memory. Actions are the outputs, and Continuous Actions are decimal numbers usually in the range -1…1. Behavior Type is where you can set drive-by-ML or heuristic (aka drive it yourself).

Dog Controller is the script that collects the Observations – it feeds the ML with sensory input (more on that later). Dog Controller also receives instructions from the network and applies them – in this case, it sets the Drive Targets of each ArticulationBody, the same as I would do with a servo on a real quadruped. Lastly, Dog Controller usually also gives a Reward to the network to teach it.
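As a rough sketch of how those pieces fit together in ML-Agents (the Agent API is real; the class name, fields, and the ±45° scaling are my own placeholders):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Hypothetical minimal Dog Controller as an ML-Agents Agent.
public class DogController : Agent
{
    public ArticulationBody[] joints;   // one per degree of freedom
    public Transform torso;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Feed the network whatever "senses" the dog has,
        // e.g. which way is up from the torso's point of view.
        sensor.AddObservation(torso.up);
        foreach (var joint in joints)
            sensor.AddObservation(joint.jointPosition[0]); // radians
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // One continuous action (-1..1) per joint, mapped to a drive
        // target angle – just like commanding a servo.
        for (int i = 0; i < joints.Length; i++)
        {
            var drive = joints[i].xDrive;
            drive.target = actions.ContinuousActions[i] * 45f; // ±45° is a made-up range
            joints[i].xDrive = drive;
        }
    }
}
```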

Decision Requester sets how often the network should make a decision. The value here is in Academy Steps, which isn’t Unity Update ticks, it isn’t frames, it isn’t FixedUpdate physics steps, it’s… something else.


At the start of each episode (learning cycle) I have to reset the position of the quadruped. I’ve found the easiest way is to preserve the original Torso as a pristine template and instead spawn a completely new clone of the entire Torso hierarchy.
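A sketch of that reset, assuming a disabled torsoTemplate object kept out of the simulation (the names are mine):

```csharp
using Unity.MLAgents;
using UnityEngine;

public class DogResetter : Agent
{
    public GameObject torsoTemplate;  // kept disabled, never simulated
    private GameObject activeTorso;   // the clone that actually runs

    public override void OnEpisodeBegin()
    {
        // Throw away the possibly-tangled clone and start fresh
        // from the untouched template.
        if (activeTorso != null)
            Destroy(activeTorso);

        activeTorso = Instantiate(torsoTemplate,
                                  torsoTemplate.transform.position,
                                  torsoTemplate.transform.rotation,
                                  transform);
        activeTorso.SetActive(true);
    }
}
```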

Calculating Rewards

I have now experimented with many automatic styles of reward. Mostly my thinking is a variation on these themes:

  • The reward function first gives points for “uprightness” (read torso.transform.up.y approaching 1).
  • Add to that height off ground (max height 3.6, so height/3.6 will never be more than 1).
  • Add to that horizontal movement (maximum XZ speed of 10, then divided by 10, never more than 1).

These three elements are further scaled by values that can be tweaked from Dog Controller.
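Put together, the reward calculation looks roughly like this (the helper name and weight parameters are my own; the 3.6 and 10 normalizers come from the list above):

```csharp
using UnityEngine;

// Hypothetical reward shaping combining the three themes above.
public static class RewardShaper
{
    public static float Calculate(ArticulationBody torso,
                                  float uprightWeight,
                                  float heightWeight,
                                  float speedWeight)
    {
        // 1. Uprightness: torso "up" aligned with world up -> approaches 1.
        float upright = Mathf.Clamp01(torso.transform.up.y);

        // 2. Height off the ground, normalized by the 3.6 maximum.
        float height = Mathf.Clamp01(torso.transform.position.y / 3.6f);

        // 3. Horizontal (XZ) speed, normalized by a max of 10.
        Vector3 v = torso.velocity;
        float speed = Mathf.Clamp01(new Vector2(v.x, v.z).magnitude / 10f);

        return upright * uprightWeight
             + height * heightWeight
             + speed * speedWeight;
    }
}
```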

I have also tried a manual voting style where the user (me) can click on bots that appear to be doing well to feed them a reward. I also added a button to end all episodes and start over, in case they’re all terrible. Machine Learning Eugenics :S

Results so far

nine bots running in parallel while my CPU gently weeps

So far it has managed to roll over a few times but never figured out how to stand up. It doesn’t even stay rolled over.

Things I know I don’t know


I have no idea if my network is shaped/configured well. Currently I’m using three layers with 128 neurons each.
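For reference, that shape lives in the trainer configuration YAML passed to mlagents-learn; something along these lines (the behavior name and the choice of PPO here are placeholders):

```yaml
behaviors:
  Dog:                   # must match the Behavior Name in Behavior Parameters
    trainer_type: ppo
    network_settings:
      num_layers: 3      # three hidden layers...
      hidden_units: 128  # ...of 128 neurons each
```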

I have no idea when rewards are applied to the network, so I’m not sure when is best to manually feed them. I don’t know if I should reward continuously while it is running or do as in the codemonkey example (reward and immediately reset).

ArticulationBody is not great to work with. GetJointPositions returns radians and SetDriveTargets expects degrees? Physically, I have no idea what stiffness to set for ArticulationBody joints. Too high and they can launch the quad off the ground, too low and they don’t have the strength to roll the dog over.
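A sketch of working around the unit mismatch, assuming one revolute joint per body (the class and method names below other than the Unity calls are mine):

```csharp
using System.Collections.Generic;
using UnityEngine;

public class JointUnits : MonoBehaviour
{
    public ArticulationBody root;

    void HoldCurrentPose()
    {
        var positions = new List<float>();
        root.GetJointPositions(positions);      // reduced coordinates, in radians

        var targets = new List<float>(positions.Count);
        for (int i = 0; i < positions.Count; i++)
            targets.Add(positions[i] * Mathf.Rad2Deg); // drive targets want degrees

        root.SetDriveTargets(targets);          // hold every joint where it is
    }
}
```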

Setting a good Joint Speed in the Dog Controller is also part of the problem. High stiffness paired with a low joint speed (under 0.5 degrees per step) seems to be the least bad combination I’ve found so far.

Why do I have to restart the TensorBoard graphing script every time I restart the mlagents-learn training tool? My parameters haven’t changed, only the results that are being graphed.

Final thoughts

I’d really like to talk to some passionate and experienced people about training this successfully. You can find me on Discord or Github all the time. If you have specific comments or questions about this article, let me know and I’ll try to address them in an edit.


Friday Facts 15: Unity 2021.3 tips

The Unity game development engine can take some getting used to, no matter your background. Here are some tips and tricks I’ve learned recently that might help you. Documenting them will help me later when I google them and find my own article!

Separate UI and game world mouse clicks

I’ll describe the problem and then the solution. I have UI elements built with UIToolkit on top of the things in my Scene, and I don’t want a click on the UI to also click through to whatever is behind it. It would be weird to click “sign peace treaty” and unintentionally order troops to attack the town in the same move, right?

Here’s a sample UI in a 2D world. I’m making a game based on Populous, RimWorld and Prison Architect. Little people moving around doing their best and I sometimes nudge them.

In UIToolkit, elements are placed with the same rules as webpages with CSS. Here’s the UIToolkit view of my HUD. I’ve highlighted the #Background element, which says all child elements are aligned to the bottom – that’s how I get the buttons way down there.

Elsewhere, with no obvious connection in the system, there’s PlayerControls, an Input Action Asset – a map between human input devices and actions in game. It has an event called Click mapped to things like the game controller X button and the mouse left button. Still further away, in my GameController, I have the same PlayerControls as a Component, where the Click event calls GameController.OnLeftClick.

The code for GameController.OnLeftClick is a stub for now.

// only get here if the click is NOT on the UI, please.
public void OnLeftClick() {
    // TODO: select a unit, issue an order, etc.
}

After many hours of searching I found a way to detect when the cursor is over a UI element. It is EventSystem.current.IsPointerOverGameObject(). Unfortunately you can’t put this in an InputAction event or you’ll get a nasty warning message.

Calling IsPointerOverGameObject() from within event processing (such as from InputAction callbacks) will not work as expected; it will query UI state from the last frame
UnityEngine.EventSystems.EventSystem:IsPointerOverGameObject ()

The fix is to put the code in GameController.Update() and use it to disable the input events when appropriate.

// Update is called once per frame
void Update() {
    DoNotClickUIAndGameAtSameTime();
}

private void DoNotClickUIAndGameAtSameTime() {
    PlayerInput inputSystem = GetComponent<PlayerInput>();
    if (EventSystem.current.IsPointerOverGameObject()) {
        inputSystem.DeactivateInput();   // cursor is over UI: mute game input
    } else {
        inputSystem.ActivateInput();     // cursor is over the world: game input back on
    }
}

I’m a big fan of small methods that do one job each. I hope you are too! It’s much easier to debug.

Ok, so this completely disables all game input when a cursor is over the UI elements. New problem! Remember that #Background element? It fills the entire screen. Everything is UI! Fortunately the fix is easy.

Setting the background elements to Picking Mode: Ignore will let your mouse pass through and poke at your game. Nice!
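If you prefer to set this from code instead of the UI Builder, it’s one property on the element (the #Background query matches my layout; yours will differ):

```csharp
using UnityEngine;
using UnityEngine.UIElements;

public class HudSetup : MonoBehaviour
{
    void OnEnable()
    {
        var root = GetComponent<UIDocument>().rootVisualElement;

        // Let clicks fall through the full-screen background
        // while its child buttons stay clickable.
        root.Q<VisualElement>("Background").pickingMode = PickingMode.Ignore;
    }
}
```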

Oof, this is already so long I’ll save my next tip for a future post.