Nutrition
Calorie and macronutrient math, per-serving calculations, and nutrition-label reasoning.
Ranking
- #1 GPT-5.5: 100.0
- #2 Claude Opus 4.8: 100.0
- #3 Qwen 3.5 Plus: 100.0
- #4 Grok 4.3: 100.0
- #5 Claude Fable 5: 96.3
- #6 Gemini 3.1 Pro Preview: 96.3
- #7 DeepSeek V4 Pro: 96.3
- #8 Mistral Large 3: 96.3
- #9 Llama 4 Maverick: 96.3
- #10 Kimi K2.6: 96.3
Question heatmap (public questions only)
[Heatmap: rows are the ten models ranked above; columns are public questions 001, 002, 003, 004, 005, 007, 008, 009, 010, 011, 013, 014, 015, 020, 021, 022, 023, 025, 027, 028, 029.]
Each cell shows one model's score on one question; a deeper colour means a higher score.