Flavour Pairing
Pairing logic, cuisine coherence and balancing dishes.
Ranking
- 1: GPT-5.5 (100.0)
- 2: Claude Opus 4.8 (100.0)
- 3: Qwen 3.5 Plus (100.0)
- 4: Grok 4.3 (100.0)
- 5: Claude Fable 5 (100.0)
- 6: Gemini 3.1 Pro Preview (100.0)
- 7: DeepSeek V4 Pro (100.0)
- 8: Mistral Large 3 (100.0)
- 9: Llama 4 Maverick (93.3)
- 10: Kimi K2.6 (57.8)
Question heatmap (public questions only)
[Heatmap not reproduced. Rows: the ten models in ranking order, from GPT-5.5 to Kimi K2.6. Columns: public questions 001, 002, 003, 004, 005, 007, 008, 009, 010 and 011. Each cell is one question; deeper colour = higher score.]