CookingBench

Substitutions

Ingredient swaps with correct ratios, including allergen-aware alternatives.

Ranking

  1. GPT-5.5: 100.0
  2. Claude Opus 4.8: 100.0
  3. Claude Fable 5: 100.0
  4. DeepSeek V4 Pro: 100.0
  5. Qwen 3.5 Plus: 96.7
  6. Grok 4.3: 96.7
  7. Mistral Large 3: 96.7
  8. Gemini 3.1 Pro Preview: 93.3
  9. Kimi K2.6: 87.8
  10. Llama 4 Maverick: 86.7
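
The list above gives ten ordinal positions, but several models share a score. As a minimal sketch, the snippet below regroups the listed scores into tie-aware ranks. It assumes standard competition ranking (tied entries share the lowest position, and the next rank skips ahead), which the page itself does not state. The function name `competition_ranks` is hypothetical, and the scores are copied from the list.

```python
# Scores copied from the ranking list; the tie-handling rule is an assumption.
scores = [
    ("GPT-5.5", 100.0),
    ("Claude Opus 4.8", 100.0),
    ("Claude Fable 5", 100.0),
    ("DeepSeek V4 Pro", 100.0),
    ("Qwen 3.5 Plus", 96.7),
    ("Grok 4.3", 96.7),
    ("Mistral Large 3", 96.7),
    ("Gemini 3.1 Pro Preview", 93.3),
    ("Kimi K2.6", 87.8),
    ("Llama 4 Maverick", 86.7),
]


def competition_ranks(rows):
    """Return (rank, name, score) tuples; tied scores share the lowest position."""
    # sorted() is stable, so tied models keep the order in which they were listed.
    ordered = sorted(rows, key=lambda row: -row[1])
    ranked = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # same score as the previous entry: reuse its rank
        else:
            rank = position
        ranked.append((rank, name, score))
    return ranked


for rank, name, score in competition_ranks(scores):
    print(f"{rank:>2}  {name:<24} {score:5.1f}")
```

Under that assumption, the four models at 100.0 all hold rank 1 and the three at 96.7 all hold rank 5, so the ordering within each tied group carries no information about relative performance.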

Question heatmap (public questions only)

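[Heatmap: one row per model, in ranking order (GPT-5.5, Claude Opus 4.8, Claude Fable 5, DeepSeek V4 Pro, Qwen 3.5 Plus, Grok 4.3, Mistral Large 3, Gemini 3.1 Pro Preview, Kimi K2.6, Llama 4 Maverick); one column per public question (001, 002, 003, 004, 005, 007, 008, 009, 010, 011, 013, 014, 015).]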

Each cell is one model's score on one question; deeper colour means a higher score.