Llama 4 Review: Meta's Free Model Just Closed the Gap with GPT-5

Meta released Llama 4 as open weights. Benchmarks suggest it rivals GPT-5 on most tasks. We tested it for a week. Here's the truth.

Diana Park

June 3, 2026 · 8 min read


Llama 4 is Meta's most ambitious open-weights release ever. The flagship 405B-parameter model is freely downloadable, and Meta claims it matches GPT-5 on most benchmarks. Is the hype real?

Llama 4 ships in four sizes: Scout (8B), Medium (70B), Maverick (250B Mixture-of-Experts), and Behemoth (405B dense). All are released under a permissive license that allows commercial use for companies under 700M monthly active users.


Benchmarks vs GPT-5

Llama 4 Behemoth matches GPT-5 on MMLU and on the HumanEval coding benchmark, beats it on GSM8K math, and loses on multilingual reasoning. On the other coding benchmarks the two are roughly tied.

Real-world testing

We ran Behemoth via together.ai for a week on our typical prompts. Quality is genuinely close to GPT-5. The biggest gap is in following complex multi-step instructions, where GPT-5 still wins.

Why open weights matter

Any company can self-host Llama 4 with full data control, zero vendor lock-in, and no per-token pricing. For regulated industries (healthcare, finance, government), this is a game-changer.

Inference cost

Behemoth requires serious GPU infrastructure — 8x H100s for production serving. But hosted providers (Groq, Together, Fireworks) offer it at roughly half the price of GPT-5.
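To see where a figure like 8x H100s comes from, here is a back-of-envelope sketch of the weight memory for a 405B dense model at a few precisions. The assumptions are ours, not Meta's: 80 GB of memory per H100, weights only (no KV cache, activations, or runtime overhead), and illustrative function names. Real deployments need headroom on top of this, so treat it as a lower bound.

```python
# Back-of-envelope check: do the weights of a dense model fit on one GPU node?
# Assumptions (ours, not from Meta): 80 GB per H100, weights only,
# no KV cache, activations, or framework overhead.

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    # billions of params * bytes per param = billions of bytes = GB
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_node(params_billions: float, precision: str,
                 gpus: int = 8, gb_per_gpu: float = 80.0) -> bool:
    """True if the weights alone fit in the node's combined GPU memory."""
    return weight_memory_gb(params_billions, precision) <= gpus * gb_per_gpu


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        need = weight_memory_gb(405, precision)
        verdict = "fits" if fits_on_node(405, precision) else "does not fit"
        print(f"{precision}: {need:.1f} GB of weights, {verdict} in 8 x 80 GB")
```

Under these assumptions, 16-bit weights come to about 810 GB, more than the 640 GB available across eight 80 GB cards, while 8-bit weights come to about 405 GB and fit. That suggests a single 8x H100 node implies serving at reduced precision, though we have not confirmed how any particular provider runs the model.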

Smaller siblings

Llama 4 Scout (8B) is the new default local-AI model. It runs on a laptop and replaces GPT-4o-mini for most use cases.

The strategic picture

Meta is commoditizing AI so that its own platforms never have to pay a Google or OpenAI tax. The winners are everyone else: developers now have a free, near-frontier model to build on.

Llama 4 isn't going to dethrone ChatGPT for consumers. But for businesses and developers, it changes everything.
