Conversions
Volume, weight and temperature conversions across kitchen units and locales.
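To illustrate the arithmetic this category covers, here is a minimal sketch that converts volume, weight and temperature. It assumes standard unit definitions: the US customary cup is 236.5882365 mL, the metric cup 250 mL, the Australian tablespoon 20 mL, and the avoirdupois ounce 28.349523125 g. The unit keys and function names are hypothetical, and none of this is taken from the benchmark's actual questions. The locale split matters because "1 cup" or "1 tablespoon" means a different volume depending on where a recipe was written.

```python
# Hypothetical helpers for kitchen-unit conversions; the constants are standard
# unit definitions, not values taken from the benchmark's questions.

# Volume units expressed in millilitres.
ML_PER_UNIT = {
    "us_cup": 236.5882365,       # US customary cup (8 US fl oz)
    "metric_cup": 250.0,         # metric cup, used e.g. in Australia
    "us_tbsp": 14.78676478125,   # US tablespoon (1/16 US cup)
    "au_tbsp": 20.0,             # Australian tablespoon
    "metric_tsp": 5.0,           # metric teaspoon
    "ml": 1.0,
}

GRAMS_PER_OUNCE = 28.349523125   # avoirdupois ounce


def convert_volume(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert a volume between two units by going through millilitres."""
    return amount * ML_PER_UNIT[from_unit] / ML_PER_UNIT[to_unit]


def ounces_to_grams(oz: float) -> float:
    """Convert avoirdupois ounces to grams."""
    return oz * GRAMS_PER_OUNCE


def fahrenheit_to_celsius(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


if __name__ == "__main__":
    print(round(convert_volume(1, "us_cup", "ml"), 1))   # 236.6
    print(round(ounces_to_grams(8), 1))                  # 226.8
    print(round(fahrenheit_to_celsius(350), 1))          # 176.7
```

Routing every volume through a single base unit (millilitres) keeps the table linear in the number of units instead of needing a factor for every pair.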
Ranking
- Rank 1: GPT-5.5 (100.0)
- Rank 2: Claude Opus 4.8 (100.0)
- Rank 3: Qwen 3.5 Plus (100.0)
- Rank 4: Grok 4.3 (100.0)
- Rank 5: Claude Fable 5 (100.0)
- Rank 6: Gemini 3.1 Pro Preview (100.0)
- Rank 7: DeepSeek V4 Pro (100.0)
- Rank 8: Mistral Large 3 (100.0)
- Rank 9: Llama 4 Maverick (100.0)
- Rank 10: Kimi K2.6 (100.0)
Question heatmap (public questions only)
[Heatmap (colour-coded cell scores not recoverable): rows are the ten ranked models; columns are public questions 001 to 005, 007 to 013 and 015 to 018.]
Each cell is one question; deeper colour = higher score.