GTO Wizard AI Crushes GPT-5.3 and Grok 4 in New Poker Benchmark

Mrinal Gujare 15 Apr 2026
  • GTO Wizard AI posts a 19.4 bb/100 win rate against Slumbot.
  • Surpasses general AI models like GPT-5.3 in a new benchmark.
  • Uses deep reinforcement learning to play and AIVAT to cut the number of hands needed for evaluation.
GTO Wizard AI has set a new poker benchmark, outperforming general-purpose models such as GPT-5.3 and Grok 4. Trained with deep reinforcement learning, the specialized agent also posted a 19.4 bb/100 win rate against Slumbot, and the benchmark uses AIVAT variance reduction to rank models on fewer hands. The results suggest that specialized poker AI remains ahead of frontier LLMs.

In the rapidly evolving world of artificial intelligence, a common question in the poker industry has emerged: when will AI be good enough to consistently beat its human counterparts?

AI has been tested against human players for years, and a landmark came in 2019, when Pluribus became the first AI to beat elite human professionals at six-player No-Limit Hold'em.

Then, just last year, nine AI models battled it out over almost 4,000 hands to find out which was best. While Meta's Llama 4 went broke, OpenAI's o3 emerged victorious. However, the frontier of poker and artificial intelligence has a new top model: GTO Wizard AI.

The Origins of GTO Wizard AI

The new GTO Wizard AI model is a state-of-the-art poker agent that powers all the site's custom solutions. 

Rather than being built on a general-purpose model, GTO Wizard AI was originally developed as Ruse AI by Canadian programmers Marc-Antoine Provost and Philippe Beardsell. GTO Wizard acquired the technology in 2023.

Unlike earlier bots such as Slumbot, the 2018 Annual Computer Poker Competition (ACPC) champion, which relied on massive, pre-computed strategies, the GTO Wizard AI model does not store a complete poker strategy before play.

Instead, it was trained against itself over hundreds of millions of hands, gradually learning which plays led to the highest expected value.
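The self-play idea can be illustrated with a deliberately tiny stand-in. The sketch below is not GTO Wizard's method and involves no deep learning: it uses regret matching, a classical tabular technique, on rock-paper-scissors, and every name in it is hypothetical. Two copies of the same learner repeatedly adjust toward the actions that would have earned more against the other's current strategy, and their average strategies drift toward the balanced one-third mix that cannot be exploited.

```python
# Toy self-play with regret matching on rock-paper-scissors.
# Illustration only: a tabular stand-in, not deep reinforcement learning.
ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[a][b] is the payoff to a player choosing action a against action b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def current_strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def action_values(opponent_strategy):
    """Expected payoff of each action against the opponent's mixed strategy."""
    n = len(ACTIONS)
    return [sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(n)) for a in range(n)]


def self_play(iterations=50_000):
    n = len(ACTIONS)
    # Player 0 starts biased toward rock so the learners have something to correct.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [current_strategy(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(v * p for v, p in zip(values, strategies[player]))
            for a in range(n):
                # Regret: how much better action a would have done than the mix played.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy over all iterations is what approaches equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_strategies = self_play()
for player, strategy in enumerate(average_strategies):
    print(player, [round(p, 3) for p in strategy])
```

Real poker has vastly more situations than this three-action game, which is why a learned model has to generalize across them rather than keep a table of regrets.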

"Through deep reinforcement learning," says GTO Wizard, "GTO Wizard AI considers each particular situation as it arises during play and solves it in real-time, in a matter of seconds."

Dominating the Competition

This approach was vindicated when GTO Wizard AI took on Slumbot in a controlled 150,000-hand match. The outcome was as dramatic as it was surprising: GTO Wizard AI achieved a win rate of 19.4 bb/100 over the course of the match.

For context, a world-class human professional typically aims for a win rate of 5 bb/100. At $50/$100 stakes, where one big blind is $100, a win rate of 19.4 bb/100 works out to $19.40 per hand. At 200 heads-up hands per hour, that is an hourly win rate of $3,880.
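The conversion from bb/100 to dollars is simple enough to check by hand, and a short sketch makes the steps explicit. The function names here are illustrative only; the inputs are the figures quoted above.

```python
def winnings_per_hand(win_rate_bb_per_100, big_blind):
    """Convert a win rate in big blinds per 100 hands to currency per hand."""
    return win_rate_bb_per_100 * big_blind / 100


def hourly_win_rate(win_rate_bb_per_100, big_blind, hands_per_hour):
    """Currency won per hour at a given pace of play."""
    return winnings_per_hand(win_rate_bb_per_100, big_blind) * hands_per_hour


# Figures from the article: 19.4 bb/100, $100 big blind, 200 hands per hour.
per_hand = winnings_per_hand(19.4, 100)
per_hour = hourly_win_rate(19.4, 100, 200)

print(f"${per_hand:,.2f} per hand")   # $19.40 per hand
print(f"${per_hour:,.2f} per hour")   # $3,880.00 per hour

# The same pace at the 5 bb/100 a top human professional aims for.
print(f"${hourly_win_rate(5, 100, 200):,.2f} per hour")  # $1,000.00 per hour
```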

Frontier LLM Benchmark Results

But this isn't the only model that GTO Wizard AI has taken on and beaten. New benchmark results provide the first standardized comparison between "frontier" Large Language Models (LLMs) and specialized poker agents. 

The data reveals that, while general AI has made massive leaps in reasoning, it still lacks the specific strategic depth required to beat the world’s leading poker solver.

OpenAI GPT-5.3 is the current leader among general models, but it still trails the specialized poker agent significantly. 

Claude Opus 4.6 and Gemini 3.1 Pro show that even high-level general reasoning struggles with No-Limit Hold'em, while Elon Musk's xAI model, Grok 4, currently sits significantly lower on the leaderboard.

Reducing Variance with AIVAT

How does GTO Wizard know these rankings are accurate and not just a run of hot cards? The benchmark uses AIVAT, a variance-reduction technique.

Because poker is naturally high-variance, it usually takes hundreds of thousands of hands to reach a statistically significant conclusion. AIVAT cuts the number of hands required by a factor of about 10, enabling researchers to assess an agent's "luck-adjusted" performance much more efficiently.
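The general idea behind luck adjustment can be shown with a simplified control-variate sketch. This is not the AIVAT algorithm itself, which builds its corrections from value estimates of chance events and of the agent's own decisions. Here the data is synthetic, every parameter is an arbitrary assumption, and the "luck estimate" is simply a zero-mean term that tracks 90% of the simulated card luck. Subtracting a zero-mean estimate leaves the average result unchanged but shrinks the spread, and the number of hands needed for a given confidence scales with that variance.

```python
import random
import statistics


def simulate(num_hands, true_edge=0.194, luck_sd=10.0, skill_sd=3.0, seed=1):
    """Return (raw, adjusted) per-hand results in big blinds.

    All parameters are made-up illustration values: true_edge is
    19.4 bb/100 expressed per hand, and luck_sd and skill_sd are arbitrary.
    """
    rng = random.Random(seed)
    raw, adjusted = [], []
    for _ in range(num_hands):
        luck = rng.gauss(0.0, luck_sd)         # zero-mean swing from the cards
        skill = rng.gauss(true_edge, skill_sd)  # part of the result driven by play
        outcome = skill + luck
        luck_estimate = 0.9 * luck              # imperfect, but still zero-mean
        raw.append(outcome)
        adjusted.append(outcome - luck_estimate)
    return raw, adjusted


raw, adjusted = simulate(20_000)
raw_var = statistics.variance(raw)
adjusted_var = statistics.variance(adjusted)

print(f"mean raw: {statistics.mean(raw):.3f}  variance: {raw_var:.1f}")
print(f"mean adjusted: {statistics.mean(adjusted):.3f}  variance: {adjusted_var:.1f}")
# Hands needed for a fixed confidence interval scale with the variance.
print(f"hands needed shrink by a factor of about {raw_var / adjusted_var:.1f}")
```

The size of the reduction in this toy depends entirely on how well the estimate tracks the luck term, which is why the quality of the underlying value estimates matters so much in practice.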

Public API and Research Access

GTO Wizard is now providing API access to allow independent developers and researchers to submit their own models for evaluation. This move aims to foster more transparent competition in the AI space. Developers can integrate their agents directly into the evaluation platform to compete in real-time. 

The API allows for hand simulation and result retrieval without exposing the solver’s internal capabilities.

To take on GTO Wizard AI, participants must follow a specific set of rules:
  • Minimum Hands: 2,500 hands of Heads-Up No-Limit Hold'em.
  • Stack Size: 200bb stacks that reset every hand.
  • Monthly Limit: The API will limit usage to 100,000 hands per month.
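A participant might check a planned run against these three rules before submitting anything. The sketch below is purely hypothetical: it does not call or describe the actual GTO Wizard API, and the class, field, and function names are invented for illustration. Only the three thresholds come from the rules listed above.

```python
from dataclasses import dataclass

# Thresholds taken from the published rules.
MIN_HANDS = 2_500
REQUIRED_STACK_BB = 200
MONTHLY_HAND_LIMIT = 100_000


@dataclass
class PlannedMatch:
    """Hypothetical description of a benchmark run (not a real API object)."""
    hands: int
    stack_bb: int
    hands_used_this_month: int = 0


def rule_violations(match):
    """Return a list of human-readable problems; an empty list means the plan fits."""
    problems = []
    if match.hands < MIN_HANDS:
        problems.append(f"needs at least {MIN_HANDS} hands, got {match.hands}")
    if match.stack_bb != REQUIRED_STACK_BB:
        problems.append(f"stacks must be {REQUIRED_STACK_BB}bb, got {match.stack_bb}bb")
    if match.hands_used_this_month + match.hands > MONTHLY_HAND_LIMIT:
        problems.append(f"would exceed the monthly limit of {MONTHLY_HAND_LIMIT} hands")
    return problems


ok_plan = PlannedMatch(hands=2_500, stack_bb=200)
bad_plan = PlannedMatch(hands=1_000, stack_bb=100, hands_used_this_month=99_500)

print(rule_violations(ok_plan))
for problem in rule_violations(bad_plan):
    print("-", problem)
```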

As the industry moves toward Heads-Up Pot-Limit Omaha (PLO) benchmarks in the near future, the message from GTO Wizard is clear: the era of "claiming" to be the best is over. Now, you have to prove it on the leaderboard.
