Claude Sonnet 5: Anthropic's Agentic Leap at Half the Opus Price

Claude Sonnet 5 launch by Anthropic — June 30, 2026
Claude Sonnet 5 launched June 30, 2026. Source: Anthropic

Every few months, an AI release changes what you can realistically build at a given budget. Claude Sonnet 5, launched by Anthropic on June 30, 2026, is one of those releases. For the first time, Sonnet-tier pricing buys you Opus-tier agentic depth: a model that does not just answer questions but plans tasks, runs tools, checks its own work, and finishes complex workflows without needing you to hold its hand at every step. For a startup in Bangalore, a product team in Lagos, or an engineering firm in São Paulo running AI on tight margins, that changes the calculation entirely.

Jun 30 Launch date, 2026
$2 Per million input tokens (intro, until Aug 31)
0% Exploit success on Firefox 147 (safety by design)

The Gap It Closes

Sonnet 4.6 was a capable model for single-turn tasks: drafting, answering, summarizing, writing code for a function you described clearly. Where it struggled was in the kind of work that defines modern AI pipelines: tasks that require planning, using tools, reading results, and adapting. Ask it to investigate a bug, write a fix, test it, and verify the result, and it would often stall partway through, asking for clarification or producing output that needed significant human review before the next step.

Sonnet 5 was built specifically to close that gap. The model plans before it acts. It uses tools natively, including browsers and terminals, without requiring elaborate prompting to trigger that behavior. And it checks its own output without being told to: a step called self-verification that reduces the number of review rounds in agentic pipelines in practice, not just on paper.

Sonnet-tier cost. Opus-tier task completion. That combination changes what teams with real budget constraints can actually build.

What Is Actually New

The changes are not cosmetic. Here is what Sonnet 5 does differently from Sonnet 4.6, based on Anthropic's published capabilities and early tester feedback:

Early testers reported real pull requests carried through to a tested, verified result autonomously, bug investigations where Sonnet 5 wrote tests, implemented fixes, and verified them in a single pass, and CRM workflows completed end-to-end including data updates and outbound communications in one run.

Benchmarks: Where It Stands

Anthropic published results on two evaluations designed for agentic capability. Static knowledge tests like MMLU are less relevant when the question is "can this model do my work." These two are closer to that question.

On BrowseComp, which tests agentic web research across multiple pages, Sonnet 5 offers a wider cost-to-performance range than Sonnet 4.6. You can get more done per dollar. On OSWorld-Verified, which tests the model's ability to interact with real desktop interfaces including applications and operating systems, Sonnet 5 matches Opus 4.8 scores at specific effort settings. That is the headline number: Sonnet 5 at certain configurations performs like Opus 4.8 on computer use, at a fraction of the price.

BrowseComp scatter chart showing Claude Sonnet 5 agentic search performance by effort level versus Opus 4.8 and Sonnet 4.6, plotted by cost per task and pass rate
BrowseComp: Sonnet 5 delivers higher pass rates at lower cost per task compared to Sonnet 4.6. Source: Anthropic
Benchmark comparison table: Claude Sonnet 5 scores 63.2% on SWE-bench Pro, 80.4% on Terminal-Bench 2.1, 57.4% on Humanity's Last Exam with tools, 81.2% on OSWorld-Verified, versus Sonnet 4.6 and Opus 4.8
Full benchmark results: Sonnet 5 vs Sonnet 4.6 vs Opus 4.8 across coding, reasoning, computer use, and knowledge work. Source: Anthropic
Model OSWorld-Verified Agentic Depth Cost (Input / Output per MTok)
Claude Sonnet 4.6 78.5% Limited multi-step Lower than Sonnet 5
Claude Sonnet 5 Matches Opus 4.8 at some settings Full agentic loop $2 / $10 (intro) · $3 / $15 (standard)
Claude Opus 4.8 Highest Strongest Significantly higher

Building an agentic product on Claude and want help picking the right model tier for each part of your pipeline? Naraway designs LLM architectures for startups and product teams.

See Our AI Services

Pricing: The Introductory Window

Anthropic is offering introductory pricing through August 31, 2026: $2 per million input tokens and $10 per million output tokens. After that, standard pricing applies at $3 input and $15 output. The introductory window is not just a discount: it is a signal to migrate and build now, before your cost models change. Teams already on Sonnet 4.6 who upgrade before September lock in a month of lower-cost exploration at significantly higher capability.

Period Input per million tokens Output per million tokens
Until August 31, 2026 $2 $10
September 1, 2026 onwards $3 $15

The model is the default on both Free and Pro plans on Claude.ai. API access uses the model identifier claude-sonnet-5. Enterprise, Team, Max, and Claude Code plans all have access.

What This Means Across Different Use Cases

The same underlying capability plays out differently depending on what you are building. Here is how Sonnet 5 changes the picture across a few common contexts:

For Software Teams

The self-verification loop and multi-step task completion are the two changes that matter most. An engineer in a startup running lean can now delegate entire debugging sessions to the model: write the test, find the root cause, implement the fix, run verification. The output does not just need review for correctness; it comes back already reviewed by the model. That removes one or two human-in-the-loop steps per task, compounding over a week of work.

For Operations and Business Teams

Teams automating CRM workflows, document processing, or multi-step client communications can now run those pipelines with far less scaffolding. Sonnet 5 does not stop when it hits a decision point that requires reading something: it reads it, decides, and continues. For operations teams in industries like insurance, legal services, or logistics, that end-to-end execution was previously only possible with Opus 4.8 and its associated cost.

For Founders Building AI Products

The cost-to-capability shift is the biggest unlock. If you were previously building on Sonnet 4.6 because Opus 4.8 was too expensive to serve at scale, Sonnet 5 moves the ceiling significantly. You get agentic depth at a price that scales with your product, not against it.

If you are a founder or product team evaluating Claude Sonnet 5 for production, Naraway can help you architect the right system from the start.

Talk to Us on WhatsApp

Safety: Designed In, Not Added On

Anthropic's safety approach with Sonnet 5 deserves a clear reading. The model has a lower rate of undesirable behaviors than Sonnet 4.6, improves on refusing malicious requests, and is harder to manipulate through prompt injection from external content in the environment. On Firefox 147 exploit development, it achieved 0% success. That number is intentional: Anthropic designed Sonnet 5 to have substantially reduced cybersecurity offensive capability compared to Opus 4.8.

Firefox 147 exploit development bar chart comparing Claude Sonnet 4.6 at 0%, Opus 4.8 at 8.8%, Mythos 5 at 88.4%, and Sonnet 5 at 0% for working exploits — showing Sonnet 5's deliberate cybersecurity restrictions
Firefox 147 exploit development: Sonnet 5 achieves 0% on working exploits — the same as Sonnet 4.6, and far below Mythos 5. Safety by design. Source: Anthropic
Misaligned behavior score chart: Sonnet 4.6 scores 2.89, Mythos Preview 1.95, Opus 4.8 2.10, Sonnet 5 2.53 — lower scores indicate fewer misaligned responses
Misaligned behavior scores (lower is better): Sonnet 5 at 2.53 is a clear improvement over Sonnet 4.6 at 2.89. Source: Anthropic

For teams deploying agentic systems in professional environments: finance, healthcare, legal, compliance, this is a feature, not a limitation. A model that completes complex multi-step tasks but is deliberately constrained from being weaponized is exactly what enterprise deployment requires. The safety profile is documented, testable, and consistent across API access.

The Tokenizer Change: Do Not Skip This

Sonnet 5 ships with a new tokenizer. The same content that worked within Sonnet 4.6's context window will require between 1.0 and 1.35 times more tokens in Sonnet 5. This is not a large number in isolation, but it matters in two places: prompts running near context limits, and cost projections for high-volume API usage.

Before migrating production applications from Sonnet 4.6, test your longest prompts against the new tokenizer. Recalculate your cost estimates using the new per-token pricing, and factor in the 15 to 35 percent token increase. Do this before August 31 so your numbers are accurate before the introductory pricing window closes.

Upgrade Checklist

Need Help Migrating to Claude Sonnet 5?

Naraway builds and migrates AI-powered systems for product teams and enterprises. If you are moving from Sonnet 4.6 or evaluating Claude for the first time, we can scope the architecture, handle the integration, and ensure the tokenizer and cost changes do not catch you off guard.

Explore AI Integration