Large Language Models — AI Nexus



Frontier Model Leaderboard

Rank  Model              Developer · Release            Overall  Reasoning  Coding  Context  $/1M input tokens
01    GPT-4o             OpenAI · May 2024              94.2     96.1       92.8    128K     $5.00
02    Claude 3.7 Sonnet  Anthropic · Feb 2025           93.8     95.4       96.2    200K     $3.00
03    Gemini 2.0 Ultra   Google · Jan 2025              92.1     93.7       90.4    1M       $7.00
04    Llama 3.3 70B      Meta · Dec 2024 · Open Source  88.6     87.9       91.1    128K     Free (open weights)
05    Mistral Large 2    Mistral AI · Jul 2024          85.3     84.8       88.2    128K     $2.00

Last updated: June 2025. Overall is a composite score across MMLU, HumanEval, MATH, and MT-Bench. Context is the maximum context window, in tokens.
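The page says only that Overall is a composite of MMLU, HumanEval, MATH, and MT-Bench. It gives no weights and no normalization. The sketch below shows one plausible way to build such a composite, under stated assumptions: rescale each benchmark to 0-100 (MT-Bench is reported on a 1-10 scale), then take a weighted mean. The equal default weights, the `BENCHMARK_SCALES` table, and the `composite_score` function are all hypothetical. The sketch will not reproduce the Overall figures above.

```python
from typing import Dict, Optional

# Assumed native scale of each benchmark's headline number:
# MMLU, HumanEval (pass@1) and MATH are reported as percentages,
# while MT-Bench is an LLM-judge score on a 1-10 scale.
BENCHMARK_SCALES: Dict[str, float] = {
    "MMLU": 100.0,
    "HumanEval": 100.0,
    "MATH": 100.0,
    "MT-Bench": 10.0,
}


def composite_score(
    results: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of the four benchmarks after rescaling each to 0-100.

    Equal weights are a placeholder assumption; the leaderboard does not
    publish how it combines the benchmarks.
    """
    missing = sorted(set(BENCHMARK_SCALES) - set(results))
    if missing:
        raise ValueError(f"missing benchmark results: {missing}")

    if weights is None:
        weights = {name: 1.0 for name in BENCHMARK_SCALES}

    total_weight = sum(weights[name] for name in BENCHMARK_SCALES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Put every benchmark on a common 0-100 scale, then average with weights.
    weighted_sum = sum(
        weights[name] * 100.0 * results[name] / BENCHMARK_SCALES[name]
        for name in BENCHMARK_SCALES
    )
    return weighted_sum / total_weight


# Hypothetical results for an imaginary model (not taken from the table).
example = {"MMLU": 88.0, "HumanEval": 90.0, "MATH": 76.0, "MT-Bench": 9.0}
print(composite_score(example))  # 86.0
```

You can pass a custom `weights` dict (for example, doubling MMLU) to check how sensitive a ranking is to the weighting. That matters here because the published weighting is unknown.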