AI Model Catalog
200+ models benchmarked on MMLU, HumanEval, MATH, MT-Bench, and reasoning evals. Updated monthly.
| Model | Company | Params | Context | MMLU | HumanEval | MATH | Type |
|---|---|---|---|---|---|---|---|
| GPT-4o | OpenAI | ~1.8T (MoE) | 128K | 88.7 | 90.2 | 76.6 | Private |
| Claude 3.5 Sonnet | Anthropic | Unknown | 200K | 88.3 | 92.0 | 71.1 | Private |
| Gemini 1.5 Ultra | Google | Unknown | 1M | 90.0 | 84.1 | 58.5 | Private |
| Apple Intelligence (Server) | Apple | Unknown | Unknown | N/A | N/A | N/A | Private |
| Apple Intelligence (On-Device) | Apple | ~3B | 4K | 60.9 | N/A | N/A | Private |
| Llama 3.1 405B | Meta | 405B | 128K | 87.3 | 89.0 | 73.8 | Open |
| Llama 3.1 70B | Meta | 70B | 128K | 82.6 | 80.5 | 68.0 | Open |
| Mistral Large 2 | Mistral AI | 123B | 128K | 84.0 | 92.1 | 69.9 | Open |
| Grok-2 | xAI | Unknown | 128K | 87.5 | 88.4 | 76.1 | Private |
| Command R+ | Cohere | 104B | 128K | 75.7 | 69.3 | N/A | Open |
| Qwen2.5 72B | Alibaba | 72B | 128K | 86.1 | 86.2 | 83.1 | Open |
| DeepSeek-V3 | DeepSeek | 671B (MoE) | 128K | 88.5 | 89.9 | 90.2 | Open |