Introduction

We provide a dedicated API for researchers and hobbyists to benchmark autonomous agents against the Baseline engine. The environment is optimized for high-throughput strategic evaluation under standardized parameters.

Game Parameters

  • Blinds: 1 / 2
  • Stack Depth: Customizable; reset after each hand.
  • Format: Heads-Up, Spin & Go, Cash.
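
As a concrete illustration, the parameters above could be expressed as a session configuration. The field names below are assumptions made for illustration, not the actual Baseline API schema:

```python
# Hypothetical configuration sketch; key names are illustrative
# assumptions, not the real Baseline API.
session_config = {
    "small_blind": 1,        # blinds fixed at 1 / 2
    "big_blind": 2,
    "stack_depth_bb": 100,   # customizable; stacks reset after each hand
    "format": "heads_up",    # one of: "heads_up", "spin_and_go", "cash"
}
```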

Distributional Evaluation (Variance Reduction)

To speed up evaluation, Baseline employs a range-based evaluation model. Unlike traditional benchmarks, which score agent performance against specific hole-card realizations, Baseline computes the expectation across the engine's entire strategic distribution (the full range).
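
The distinction can be sketched with a toy model. Here the engine's "range" is reduced to three abstract hand classes with fixed payoffs (all probabilities and payoffs are made-up assumptions, not Baseline's actual range); range-based evaluation computes the expectation exactly, while hole-card simulation must estimate it by sampling:

```python
import random

# Toy "range": hand class -> (probability, payoff to the agent).
# Values are illustrative assumptions only.
RANGE = {
    "strong": (0.20, -3.0),
    "medium": (0.50, +0.5),
    "weak":   (0.30, +2.0),
}

def ev_over_range():
    """Exact expectation across the full range (range-based evaluation)."""
    return sum(p * payoff for p, payoff in RANGE.values())

def ev_by_sampling(n, rng):
    """Monte-Carlo estimate from n sampled hole-card realizations."""
    hands = list(RANGE)
    weights = [RANGE[h][0] for h in hands]
    total = 0.0
    for _ in range(n):
        hand = rng.choices(hands, weights=weights)[0]
        total += RANGE[hand][1]
    return total / n

print(ev_over_range())                          # exact, no sampling noise
print(ev_by_sampling(1000, random.Random(0)))   # noisy estimate of the same value
```

The sampled estimate converges to the exact value only as the number of simulated hands grows, which is why the distributional approach needs far fewer hands for a given confidence level.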

This methodology provides several advantages:

  1. Accelerated Convergence: Statistically significant results are reached in a fraction of the sample size required by standard hole-card simulations.
  2. Strategic Transparency: The engine’s full range/policy is disclosed immediately upon hand termination, allowing deep auditing of policy deviations.
  3. Variance Suppression: Because each hand is scored against the distribution rather than a single sampled realization, the impact of short-term "luck" is significantly mitigated.
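
The variance-suppression point (3) can be demonstrated empirically in the same toy setting. Scoring each hand by its realized payoff carries full sampling variance; scoring it by the expectation over the distribution removes the per-hand "luck" term entirely in this idealized sketch (in practice some residual variance from board and agent cards remains). All numbers below are illustrative assumptions:

```python
import random
import statistics

# Toy outcome distribution: (payoff, probability). Illustrative only.
OUTCOMES = [(-3.0, 0.20), (0.5, 0.50), (2.0, 0.30)]
EV = sum(payoff * prob for payoff, prob in OUTCOMES)

rng = random.Random(42)
payoffs = [o[0] for o in OUTCOMES]
weights = [o[1] for o in OUTCOMES]

# Hole-card scoring: each hand contributes its realized payoff.
realized = [rng.choices(payoffs, weights=weights)[0] for _ in range(10_000)]

# Range-based scoring: each hand contributes the exact expectation.
range_scored = [EV] * 10_000

print(statistics.pvariance(realized))      # substantial per-hand variance
print(statistics.pvariance(range_scored))  # zero in this idealized model
```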