Llama 4 Review: Meta's Free Model Just Closed the Gap with GPT-5
Meta released Llama 4 as open weights. Benchmarks suggest it rivals GPT-5 on most tasks. We tested it for a week. Here's the truth.

Llama 4 is Meta's most ambitious open-source release ever. The flagship 405B parameter model is freely downloadable, and Meta claims it matches GPT-5 on most benchmarks. Is the hype real?
Llama 4 ships in four sizes: Scout (8B), Medium (70B), Maverick (250B Mixture-of-Experts), and Behemoth (405B dense). All are released under a permissive license that allows commercial use for companies under 700M monthly active users.
Benchmarks vs GPT-5
Llama 4 Behemoth matches GPT-5 on MMLU and HumanEval, beats it on GSM8K math, loses on multilingual reasoning, and is roughly tied on coding benchmarks.
Real-world testing
we ran Behemoth via together.ai on a week of our typical prompts. Quality is genuinely close to GPT-5. The biggest gap is in following complex multi-step instructions, where GPT-5 still wins.
Why open weights matter
any company can self-host Llama 4 with full data control, zero vendor lock-in, no per-token pricing. For regulated industries (healthcare, finance, government), this is a game-changer.
Inference cost
Behemoth requires serious GPU infrastructure — 8x H100s for production serving. But hosted providers (Groq, Together, Fireworks) offer it at roughly half the price of GPT-5.
Smaller siblings
Llama 4 Scout (8B) is the new default local-AI model. It runs on a laptop and replaces GPT-4o-mini for most use cases.
The strategic picture
Meta is commoditizing AI to protect its own platforms from Google and OpenAI tax. The winners are everyone else — developers now have a free, near-frontier model to build on.
Llama 4 isn't going to dethrone ChatGPT for consumers. But for businesses and developers, it changes everything.
Related Stories
View all in Artificial Intelligence →
GPT-5 Is Here: Everything You Need to Know About OpenAI's Most Powerful Model Yet
OpenAI just unveiled GPT-5 with breakthrough reasoning, vision, and agentic capabilities. Here's how it changes the AI landscape forever.

Will AI Coding Agents Replace Developers? We Asked 100 Engineers
Devin, Cursor, GitHub Copilot Workspace, and Claude Code are reshaping software engineering. Here's what's actually happening on the ground.

The 27 Best AI Tools in 2026 (Tested for 90 Days)
We spent 90 days testing every major AI tool released in 2026. Here are the 27 winners across writing, coding, image, video, voice, and productivity — and the ones to skip.

ChatGPT vs Claude 4: Which AI Should You Actually Pay For in 2026?
We ran 50 head-to-head prompts on ChatGPT (GPT-5) and Claude 4 Opus across coding, writing, math, and reasoning. Here's the honest verdict.

Google Gemini 3 Ultra Review: Has Google Finally Caught Up?
Google's Gemini 3 Ultra promises GPT-5-level performance with native 2M context. After two weeks of daily use, here's what's real and what's hype.

Midjourney vs DALL-E 4 vs Flux 1.1: The Definitive AI Image Generator Comparison
We generated the same 30 prompts across Midjourney v7, DALL-E 4, Flux 1.1 Pro, and Stable Diffusion 3.5. The results surprised us.
The Daily Pulse
Get the 5 biggest tech stories in your inbox every morning. Free, no spam, unsubscribe anytime.
Join 50,000+ tech professionals reading every day.