CookingBench

Conversions

Volume, weight and temperature conversions across kitchen units and locales.

Ranking

  1. GPT-5.5: 100.0
  2. Claude Opus 4.8: 100.0
  3. Qwen 3.5 Plus: 100.0
  4. Grok 4.3: 100.0
  5. Claude Fable 5: 100.0
  6. Gemini 3.1 Pro Preview: 100.0
  7. DeepSeek V4 Pro: 100.0
  8. Mistral Large 3: 100.0
  9. Llama 4 Maverick: 100.0
  10. Kimi K2.6: 100.0

Question heatmap (public questions only)

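[Heatmap: one row per model (GPT-5.5, Claude Opus 4.8, Qwen 3.5 Plus, Grok 4.3, Claude Fable 5, Gemini 3.1 Pro Preview, DeepSeek V4 Pro, Mistral Large 3, Llama 4 Maverick, Kimi K2.6); one column per public question (001, 002, 003, 004, 005, 007, 008, 009, 010, 011, 012, 013, 015, 016, 017, 018).]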

Each cell is one question; deeper colour = higher score.