CookingBench

Flavour Pairing

Pairing logic, cuisine coherence and dish balancing.

Ranking

  1. GPT-5.5: 100.0
  2. Claude Opus 4.8: 100.0
  3. Qwen 3.5 Plus: 100.0
  4. Grok 4.3: 100.0
  5. Claude Fable 5: 100.0
  6. Gemini 3.1 Pro Preview: 100.0
  7. DeepSeek V4 Pro: 100.0
  8. Mistral Large 3: 100.0
  9. Llama 4 Maverick: 93.3
  10. Kimi K2.6: 57.8

Question heatmap (public questions only)

Rows: the ten models ranked above. Columns: public questions 001, 002, 003, 004, 005, 007, 008, 009, 010 and 011. Each cell is one question; a deeper colour indicates a higher score.