AI Model Catalog

200+ models benchmarked on MMLU, HumanEval, MATH, MT-Bench, and reasoning evals. Updated monthly.

Model | Company | Params | Context | MMLU | HumanEval | MATH | Type
GPT-4o | OpenAI | ~1.8T (MoE) | 128K | 88.7 | 90.2 | 76.6 | Private
Claude 3.5 Sonnet | Anthropic | Unknown | 200K | 88.3 | 92.0 | 71.1 | Private
Gemini 1.5 Ultra | Google | Unknown | 1M | 90.0 | 84.1 | 58.5 | Private
Apple Intelligence (Server) | Apple | Unknown | Unknown | N/A | N/A | N/A | Private
Apple Intelligence (On-Device) | Apple | ~3B | 4K | 60.9 | N/A | N/A | Private
Llama 3.1 405B | Meta | 405B | 128K | 87.3 | 89.0 | 73.8 | Open
Llama 3.1 70B | Meta | 70B | 128K | 82.6 | 80.5 | 68.0 | Open
Mistral Large 2 | Mistral AI | 123B | 128K | 84.0 | 92.1 | 69.9 | Open
Grok-2 | xAI | Unknown | 128K | 87.5 | 88.4 | 76.1 | Private
Command R+ | Cohere | 104B | 128K | 75.7 | 69.3 | N/A | Open
Qwen2.5 72B | Alibaba | 72B | 128K | 86.1 | 86.2 | 83.1 | Open
DeepSeek-V3 | DeepSeek | 671B (MoE) | 128K | 88.5 | 89.9 | 90.2 | Open