StarCraft AI Benchmarks
The goal of this page is to collect benchmark problems that can be broadly used and referenced. We present a benchmark composed of a series of scenarios, each of them capturing a different aspect of RTS games. Each scenario is defined by a starting situation in StarCraft where the agent needs to either defeat the opponent or survive for as long as possible. More information can be found in the original paper: A Benchmark for StarCraft Intelligent Agents.
These benchmarks can be cited as:
@inproceedings{uriarte15b,
author = {Uriarte, Alberto and Onta\~{n}\'{o}n, Santiago},
title = {A Benchmark for StarCraft Intelligent Agents},
booktitle = {AIIDE},
year = {2015}
}
We are preparing an automated service to test your bot on all the scenarios. For now, you can use the launcher provided in the repository (https://bitbucket.org/auriarte/starcraftbenchmarkai) to test your bot locally; send your bot to us (admin[at]starcraftai.com) if you want to appear in the leaderboard. Keep in mind that for micromanagement maps, the goal for your bot is to reach the opponent's starting position.
Metrics
All the metrics are designed to be normalized either in the interval [0,1] or [-1,1], with higher values representing better agent performance.
- Survivor’s life: The sum of the square roots of the remaining hit points of each unit, divided by the time it took to complete the scenario (win/defeat/timeout), measured in frames. The result is normalized by a lower and an upper bound: the lower bound corresponds to player A being defeated in the minimum possible time without dealing any damage to player B, and the upper bound to the opposite.
- Time survived: The time the agent survived normalized by a predefined timeout.
- Time needed: We start a timer when a certain event happens (e.g., a building is destroyed) and we stop it after a timeout or after a condition is triggered (e.g., the destroyed building is replaced).
- Units lost: The difference in units lost by players A and B. Each player's losses are normalized to [0, 1] by dividing the number of units lost by that player's maximum number of units in the scenario.
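As an illustration, the Survivor's life metric above can be sketched as follows. The function and parameter names are ours, and the single `bound` argument is an assumption about how the scenario-specific normalization is applied; the benchmark's own implementation may differ.

```python
import math

def raw_score(unit_hp, frames):
    # Sum of the square roots of each surviving unit's remaining HP,
    # divided by the frames needed to finish the scenario.
    return sum(math.sqrt(hp) for hp in unit_hp) / frames

def survivors_life(hp_a, hp_b, frames, bound):
    # Difference between the two players' raw scores, scaled into
    # [-1, 1] by a scenario-specific bound (assumed here to be the
    # best achievable score: winning in minimum time with no damage
    # taken). Positive values favor player A.
    return (raw_score(hp_a, frames) - raw_score(hp_b, frames)) / bound
```

For example, if player A ends with two units at 100 HP each after 100 frames and player B has none left, the raw score difference is (10 + 10) / 100 = 0.2 before the bound is applied.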
Benchmarks Scenarios
All scenarios can be found in the repository.
Scenario | Description | Evaluation |
---|---|---|
Reactive Control | ||
RC1: Perfect Kiting | The purpose of this scenario is to test whether the intelligent agent is able to reason about the possibility of exploiting its mobility and range attack against a stronger but slower unit in order to win. In this scenario, a direct frontal attack will result in losing the combat, but via careful maneuvering, it is possible to win without taking any damage. | Survivor’s life |
RC2: Kiting | In this scenario the intelligent agent is at a disadvantage, but using a hit-and-run behavior might suffice to win. The main difference with the previous case is that here, some damage is unavoidable. | Survivor’s life |
RC3: Sustained Kiting | In this case there is no chance to win, so the agent should try to stay alive for as long as possible. A typical example of this behavior is scouting the enemy base. | Time survived in frames since a Zealot starts chasing the SCV, normalized by the timeout. |
RC4: Symmetric Armies | In equal conditions (symmetric armies), positioning and target selection are key aspects that can determine a player’s success in a battle. This scenario presents a test with several configurations as a baseline to experiment against basic AI opponents. | Survivor’s life |
Tactics | ||
T1: Dynamic obstacles | This scenario measures how well an agent can navigate when chokepoints are blocked by dynamic obstacles (e.g., neutral buildings). Notice that we are not aiming to benchmark pathfinding, but high-level navigation. | Time needed |
Strategy | ||
S1: Building placement | This scenario simulates a Zealot rush and is designed to test whether the agent will be able to stop it (intuitively, it seems the only option is to build a wall). | Units lost: (Units player B lost / 4) - (units player A lost / 25). |
S2: Plan Recovery | An agent should adapt on plan failures. This scenario tests if the AI is able to recover from the opponent disrupting its build order. | Time spent to replace a building normalized by the timeout. |
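As a concrete example, the S1 evaluation from the table above can be computed as follows. The function name is ours; the divisors 4 and 25 are the maximum unit counts for players B and A taken directly from the table's formula.

```python
def units_lost_score(lost_b, lost_a, max_b=4, max_a=25):
    # S1 evaluation: (units player B lost / 4) - (units player A lost / 25).
    # Scores range from -1 (A loses everything, B loses nothing)
    # to 1 (B loses everything, A loses nothing).
    return lost_b / max_b - lost_a / max_a
```

Under this formula, a score of -1 (as UAlbertaBot shows in the leaderboard below) means the defender lost all 25 of its units while destroying none of the 4 attacking Zealots.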
Research papers using these scenarios
- Q-learnings in RTS game's micro-management. Angel Camilo Palacios Garzón. Universitat de Barcelona. 2015
Leaderboard
Bot | RC1 | RC2 | RC3 | RC4 | T1 | S1 | S2 |
---|---|---|---|---|---|---|---|
FreScBot | -0.0879 | -0.1153 | N/A | -0.0022 | N/A | N/A | N/A |
UAlbertaBot | -0.0933 | 0.0422 | N/A | 0.0369 | N/A | -1 | 0.0000 |
Skynet | -0.1087 | 0.1696 | N/A | 0.0706 | N/A | N/A | N/A |
Nova | 0.1111 | N/A | 0.0335 | N/A | 0.0000 | -0.7420 | 0.0000 |