Jon Michael Galindo

~ writing, programming, art ~

12 December 2017

Zero-Consequence AI Strategy


No matter how I have approached the problem of strategy in AI, it has remained non-computable. By computable, I mean that the AI's computational resources scale linearly with the number of factors in its environment, regardless of the long-term consequences of its choices. Unfortunately, I have consistently found that its computation needs scale not only with the number of factors in its environment but also exponentially with the distance into the future over which its actions engender consequences.
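To make that scaling concrete, here is a rough sketch (a purely illustrative toy; the names and numbers are made up, not measurements from any real test). With a fixed branching factor of available actions per step, exhaustive look-ahead must expand roughly branching_factor ** horizon states, while a one-step reward function only ever weighs branching_factor options, no matter how far its consequences reach.

    # Toy illustration of the scaling claim. "branching_factor" and "horizon"
    # are hypothetical parameters chosen for demonstration only.
    def lookahead_states(branching_factor: int, horizon: int) -> int:
        """States an exhaustive planner expands when consequences reach `horizon` steps ahead."""
        return branching_factor ** horizon

    def greedy_states(branching_factor: int) -> int:
        """States a one-step reward maximizer weighs, regardless of horizon."""
        return branching_factor

    for horizon in (1, 5, 10, 20):
        print(horizon, lookahead_states(4, horizon), greedy_states(4))
    # horizon 20 with 4 actions per step: over a trillion states vs. 4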

I suspect this problem is generally insoluble. Long-term strategy really is non-computable within the confines of a small game.

However, I have also stumbled upon one solution that, while severely limited in application, has so far successfully addressed the issue in tests. I call it zero-consequence.

A simulating AI possesses, at any given point, an external environment, an internal state defining its goals, and a number of available actions. The problem with any reward function acting on this system is long-term strategy. For example, imagine an in-game AI that wants to collect two resources, wood and stone. If both are of similar importance, a reward function might collect either one first. However, imagine (in a very contrived scenario) that the closest source of wood is a bridge, and that beyond this bridge lies the only available source of stone. Collecting the wood first consumes the bridge and makes collecting the stone impossible, while collecting the stone first has no effect on collecting the wood. In reality, this problem demands strategy: look ahead at the consequences of both options and select the more rewarding. Unfortunately, as I have said, that look-ahead method is impractical.
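The failure mode is easy to reproduce in a toy model. In the sketch below (all names, Resource and greedy_collect included, are hypothetical, not taken from any actual game), a one-step reward function harvests the nearest reachable resource; because the nearest wood is the bridge, the agent strands itself and never reaches the stone.

    from dataclasses import dataclass

    @dataclass
    class Resource:
        kind: str                   # "wood" or "stone"
        distance: int               # steps from the agent's starting point
        is_bridge: bool = False     # harvesting this resource destroys the bridge
        needs_bridge: bool = False  # only reachable while the bridge stands

    def greedy_collect(resources):
        """Always harvest the closest currently reachable resource (one-step reward)."""
        collected, bridge_intact = [], True
        remaining = list(resources)
        while remaining:
            options = [r for r in remaining if bridge_intact or not r.needs_bridge]
            if not options:
                break                      # an earlier choice eliminated everything left
            choice = min(options, key=lambda r: r.distance)
            if choice.is_bridge:
                bridge_intact = False      # taking the bridge's wood removes the route
            collected.append(choice.kind)
            remaining.remove(choice)
        return collected

    # The nearest wood is the bridge itself, so the greedy agent strands itself:
    print(greedy_collect([Resource("wood", 5, is_bridge=True),
                          Resource("stone", 9, needs_bridge=True)]))   # -> ['wood']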

Instead, the obvious answer is: Don't make the bridge a collectable source of wood. Or, more generally, never allow the consequences of one action to eliminate the possibility of a subsequent action.
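One way to enforce that rule is to treat it as a property of the world itself, checked before any agent runs in it. The sketch below is one possible formulation (the names and the achievable callback are assumptions, not anything from an actual implementation): for every action available in a state, no goal that was achievable beforehand may become unachievable afterwards. A complete check would repeat this over every reachable state; the single-state version only shows the shape of the invariant.

    from typing import Callable, Iterable

    State = dict                        # whatever structure the game uses for world state
    Action = Callable[[State], State]   # an action maps one state to the next

    def zero_consequence(state: State,
                         actions: Iterable[Action],
                         goals: Iterable[str],
                         achievable: Callable[[State, str], bool]) -> bool:
        """True if no single action makes a previously achievable goal unachievable."""
        goals = list(goals)
        for act in actions:
            after = act(dict(state))    # shallow copy, then apply the action
            for goal in goals:
                if achievable(state, goal) and not achievable(after, goal):
                    return False        # this action would eliminate a future option
        return True

In the bridge world, the action of harvesting the bridge's wood fails this check; remove the bridge from the collectable set and the check passes, at which point a plain one-step reward function can no longer strand the AI.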

I warned that this design's applications are severely limited. For example, no choice can result in death, so AIs implementing this system must be immortal (at least with respect to the consequences of their own actions). That means no predator-prey decisions, no forage-to-survive decisions, and no walk-off-a-cliff scenarios. It also means, unfortunately, that AIs capable of modifying their environment must be incapable of obstructing routes (as in the case of the bridge).

Nevertheless, it has applications: an explorer able to instantly return to its point of origin, a maze environment with no opportunities to close off loops, or a builder that can only erect structures with enough surrounding space to preserve the same navigation after construction. So far, it has found moderately successful use in my AIs, although it necessitates games with a lighter feel: a zero-consequence atmosphere.



© Jon Michael Galindo 2015-2018