Which AI model is the best chef?
CookingBench scores models on the things that actually go wrong in a kitchen: scaling quantities, converting units, food safety, substitutions, technique, flavour logic and nutrition math.
Leaderboard · run 2026-06-v1
2026-06-11

| # | Model | Overall | Hard set | Run cost |
|---|---|---|---|---|
| 1 | GPT-5.5 (OpenAI) | 99.3 | 98.7 | $1.42 |
| 2 | Claude Opus 4.8 (Anthropic) | 99.2 | 100.0 | $1.16 |
| 3 | Qwen 3.5 Plus (Alibaba) | 98.7 | 98.6 | $0.32 |
| 4 | Grok 4.3 (xAI) | 98.6 | 97.0 | $0.18 |
| 5 | Claude Fable 5 (Anthropic) | 98.2 | 96.0 | $2.92 |
| 6 | Gemini 3.1 Pro Preview (Google) | 97.3 | 93.7 | $1.25 |
| 7 | DeepSeek V4 Pro (DeepSeek) | 96.6 | 93.1 | $0.19 |
| 8 | Mistral Large 3 (Mistral) | 95.6 | 97.0 | $0.04 |
| 9 | Llama 4 Maverick (Meta) | 90.8 | 94.1 | $0.02 |
| 10 | Kimi K2.6 (Moonshot AI) | 84.5 | 77.8 | $0.52 |
Categories
Quantities & Scaling
Scaling recipes up and down, yields, pan-size math and baker’s percentages.
Conversions
Volume, weight and temperature conversions across kitchen units and locales.
Food Safety
Safe internal temperatures, the danger zone, storage times and cross-contamination.
Substitutions
Ingredient swaps with correct ratios, including allergen-aware alternatives.
Technique
Troubleshooting failures (split sauces, dense bread) and method advice.
Flavour Pairing
Pairing logic, cuisine coherence and balancing dishes.
Nutrition
Calorie and macro math, per-serving calculations and label reasoning.
Recipe Generation
Generating complete recipes under constraints: servings, allergens, time, equipment.
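
Several of the categories above come down to short arithmetic. The sketch below illustrates that arithmetic for scaling, baker's percentages, temperature conversion and per-serving calories. It is not CookingBench's task or grading code: the function names and the dough weights are assumptions made up for this example.

```python
def scale_recipe(ingredients, from_servings, to_servings):
    """Scale every quantity by the ratio of target to original servings."""
    factor = to_servings / from_servings
    return {name: qty * factor for name, qty in ingredients.items()}


def bakers_percentages(ingredients, flour_key="flour"):
    """Express each ingredient weight as a percentage of the flour weight."""
    flour = ingredients[flour_key]
    return {name: 100 * qty / flour for name, qty in ingredients.items()}


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def calories_per_serving(total_calories, servings):
    """Split a recipe's total calories evenly across its servings."""
    return total_calories / servings


# Hypothetical dough, weights in grams (illustrative values only).
dough = {"flour": 500, "water": 350, "salt": 10}

print(scale_recipe(dough, from_servings=4, to_servings=6))
print(bakers_percentages(dough))
print(fahrenheit_to_celsius(212))
print(calories_per_serving(1800, 6))
```

Scaling from 4 to 6 servings multiplies each weight by 1.5, while the baker's percentages stay the same at any batch size, which is why bakers use them to describe a formula independently of yield.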