◆ ◇ ◆

LARGE LANGUAGE MODELS

Architecture deep-dives · Benchmark analysis · Fine-tuning guides · Frontier model comparisons · 482 articles and counting.

◆ ◇ ◆

FRONTIER MODEL LEADERBOARD

#   Model              Developer · Release             Overall  Reasoning  Coding  Context  Cost/1M tokens
01  GPT-4o             OpenAI · May 2024               94.2     96.1       92.8    128K     $5.00
02  Claude 3.7 Sonnet  Anthropic · Feb 2025            93.8     95.4       96.2    200K     $3.00
03  Gemini 2.0 Ultra   Google · Jan 2025               92.1     93.7       90.4    1M       $7.00
04  Llama 3.3 70B      Meta · Dec 2024 · Open Source   88.6     87.9       91.1    128K     Free
05  Mistral Large 2    Mistral AI · Jul 2024           85.3     84.8       88.2    128K     $2.00
◆ Last updated: June 2025 · Composite score across MMLU, HumanEval, MATH, MT-Bench
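
The page does not say how the four benchmarks are combined into one number. As a rough illustration only, one common approach is to rescale each benchmark to a shared 0-100 range and take a weighted mean. The sketch below assumes equal weights, assumes MMLU, HumanEval and MATH are reported as percentages, and assumes MT-Bench uses its usual 1-10 judge scale. The names `BENCHMARK_RANGES` and `composite_score` are hypothetical, and nothing here should be read as the leaderboard's actual method.

```python
from typing import Mapping, Optional

# Assumed score ranges (hypothetical normalization bounds, not from the source).
# MT-Bench is rated on a 1-10 scale; the other three are treated as percentages.
BENCHMARK_RANGES = {
    "MMLU": (0.0, 100.0),
    "HumanEval": (0.0, 100.0),
    "MATH": (0.0, 100.0),
    "MT-Bench": (1.0, 10.0),
}


def composite_score(
    raw: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Rescale each benchmark to 0-100 and return the weighted mean."""
    if weights is None:
        # Assumption: every benchmark counts equally.
        weights = {name: 1.0 for name in BENCHMARK_RANGES}

    total_weight = sum(weights[name] for name in BENCHMARK_RANGES)
    total = 0.0
    for name, (low, high) in BENCHMARK_RANGES.items():
        # Min-max rescaling onto a common 0-100 range.
        normalized = 100.0 * (raw[name] - low) / (high - low)
        total += weights[name] * normalized
    return total / total_weight


if __name__ == "__main__":
    # Made-up inputs for illustration; these are not scores of any real model.
    example = {"MMLU": 80.0, "HumanEval": 60.0, "MATH": 40.0, "MT-Bench": 5.5}
    print(composite_score(example))
```

Passing a different `weights` mapping changes how much each benchmark contributes, which is one reason composite rankings from different sites are hard to compare directly.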