Introduction
We provide a dedicated API for researchers and hobbyists to benchmark autonomous agents against the Baseline engine. The environment is optimized for high-throughput strategic evaluation under standardized parameters.
Game Parameters
- Blinds: 1 / 2
- Stack Depth: Customizable; stacks reset after each hand.
- Format: Heads-Up, Spin & Go, Cash.
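As a rough illustration, the parameters above could be captured in a session configuration like the following. The class and field names are hypothetical (the actual Baseline API surface is not specified here); only the parameter values come from the list above.

```python
from dataclasses import dataclass

# Hypothetical session config mirroring the documented parameters.
# Field names are illustrative; they are not part of a published API.
@dataclass
class SessionConfig:
    small_blind: int = 1
    big_blind: int = 2
    stack_depth: int = 100          # customizable; stacks reset each hand
    game_format: str = "heads_up"   # e.g. "heads_up", "spin_and_go", "cash"

cfg = SessionConfig(stack_depth=200, game_format="cash")
print(cfg.big_blind / cfg.small_blind)  # blind ratio: 2.0
```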
Distributional Evaluation (Variance Reduction)
To speed up evaluation, Baseline employs a range-based evaluation model. Unlike traditional benchmarks, which score agent performance against specific hole-card realizations, Baseline computes expectation across the engine's entire strategic distribution (its full range).
This methodology provides several advantages:
- Accelerated Accuracy: Statistically significant results are achieved in a fraction of the sample size required by standard hole-card simulations.
- Strategic Transparency: The engine’s full range/policy is disclosed immediately upon hand termination, allowing for deep auditing of policy deviations.
- Variance Suppression: By evaluating against the distribution rather than a single seed, the impact of short-term "luck" is significantly mitigated.
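The variance-reduction claim can be sketched with a toy model. Here the engine holds one of three hidden "hands" with known payoffs and weights; all values are illustrative and not drawn from the Baseline API. A standard benchmark samples one realization per hand and averages noisy payoffs, while the distributional estimator computes the exact expectation over the full range in a single pass.

```python
import random
import statistics

# Toy range: payoff to the agent for each hidden hand the engine may hold,
# and the engine's probability of holding it. Purely illustrative numbers.
RANGE = {"strong": -3.0, "medium": 0.5, "weak": 2.0}
WEIGHTS = {"strong": 0.2, "medium": 0.5, "weak": 0.3}

def sampled_result(rng):
    """Standard benchmark: realize one hidden hand, observe its payoff."""
    hand = rng.choices(list(RANGE), weights=list(WEIGHTS.values()))[0]
    return RANGE[hand]

def range_result():
    """Distributional benchmark: exact expectation over the full range."""
    return sum(WEIGHTS[h] * RANGE[h] for h in RANGE)

rng = random.Random(0)
samples = [sampled_result(rng) for _ in range(1000)]
print(round(statistics.mean(samples), 3))  # noisy Monte Carlo estimate
print(range_result())                      # exact value, zero variance
```

The sampled mean only converges to the true expectation as the number of hands grows, whereas the range-based value is exact immediately, which is the sense in which distributional evaluation suppresses short-term luck.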