Anthropic launches Claude Sonnet 4.6 with automatic prompt caching and free code execution
Claude · Anthropic · release, feature, API, deprecation, model, pricing, performance · platform.claude.com ↗

Claude Sonnet 4.6 Launch

Anthropic announced Claude Sonnet 4.6 as its latest balanced model, designed to deliver improved agentic search performance while consuming fewer tokens than previous versions. Sonnet 4.6 supports extended thinking capabilities and a 1M token context window (currently in beta), matching the context capabilities of the flagship Opus model while maintaining faster inference speeds.

Automatic Prompt Caching

The Messages API now features automatic caching, eliminating the need for manual breakpoint management. Developers can add a single cache_control field to their request body, and the system automatically caches the last cacheable block, moving the cache point forward as conversations grow. This works alongside existing block-level cache control for fine-grained optimization and is available on the Claude API and Azure AI Foundry (preview).
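The flow described above can be sketched as a request payload. This is a minimal illustration, not the live API: the model ID and the `cache_control` value are assumptions, since the announcement names only the field itself.

```python
def build_request(system_prompt: str, messages: list) -> dict:
    """Build a Messages API payload with automatic caching enabled.

    With a single top-level cache_control field, the system caches the
    last cacheable block and moves the cache point forward as the
    conversation grows, so no per-block breakpoints are needed.
    """
    return {
        "model": "claude-sonnet-4-6",        # model ID string is an assumption
        "max_tokens": 1024,
        "cache_control": {"type": "auto"},   # hypothetical value; only the field name is documented here
        "system": system_prompt,
        "messages": messages,
    }

payload = build_request(
    "You are a helpful assistant.",
    [{"role": "user", "content": "Hello"}],
)
```

Block-level `cache_control` entries can still be mixed in for fine-grained optimization, per the announcement.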

Pricing & Tool Updates

Key pricing and feature changes include:

  • Free code execution when used with web search or web fetch tools
  • Web search tool and programmatic tool calling now generally available (exiting beta)
  • Dynamic filtering for web search and fetch, using code execution to filter results before they reach the context window
  • Fine-grained tool streaming now generally available across all models and platforms
  • Multiple tools graduating from beta: code execution tool, web fetch tool, tool search tool, tool use examples, and memory tool
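As a sketch of the free-code-execution combination above, the payload below pairs web search with code execution so results can be filtered server-side before entering the context window. The tool `type` strings are assumptions for illustration; consult the API reference for the exact identifiers.

```python
def build_search_request(query: str) -> dict:
    """Payload pairing the web search tool with code execution.

    Per the release notes, code execution is free when used alongside
    web search or web fetch, and dynamic filtering runs code over the
    raw results before they reach the model's context window.
    """
    return {
        "model": "claude-sonnet-4-6",  # model ID string is an assumption
        "max_tokens": 2048,
        "tools": [
            {"type": "web_search"},     # tool type strings are
            {"type": "code_execution"}, # assumptions, not verbatim IDs
        ],
        "messages": [{"role": "user", "content": query}],
    }

request = build_search_request("latest TypeScript release notes")
```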

Model Deprecations

Anthropic retired Claude Sonnet 3.7 and Claude Haiku 3.5; all requests to those models now return errors. Developers should migrate to Claude Sonnet 4.6 and Claude Haiku 4.5, respectively. Anthropic also announced the deprecation of Claude Haiku 3, with retirement scheduled for April 19, 2026. Researchers can request ongoing access through the External Researcher Access Program.
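The migration path above can be captured in a small helper. The model ID strings here are hypothetical placeholders chosen for illustration, not verbatim API identifiers.

```python
# Hypothetical mapping from retired model IDs to their recommended
# replacements (Sonnet 3.7 -> Sonnet 4.6, Haiku 3.5 -> Haiku 4.5).
RETIRED_MODELS = {
    "claude-3-7-sonnet": "claude-sonnet-4-6",
    "claude-3-5-haiku": "claude-haiku-4-5",
}

def migrate_model(model_id: str) -> str:
    """Return the replacement for a retired model ID.

    IDs that are not retired pass through unchanged, so this can be
    applied unconditionally before building a request.
    """
    return RETIRED_MODELS.get(model_id, model_id)
```

Applying the helper at request-build time avoids hard errors once the old models begin rejecting traffic.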

Additional Capabilities

  • Opus 4.6 fast mode (research preview) delivers output generation up to 2.5x faster via the speed parameter
  • Compaction API (beta) provides server-side context summarization for effectively infinite conversations on Opus 4.6
  • Data residency controls allow developers to specify inference location via the inference_geo parameter, with US-only inference available at 1.1x pricing for models released after February 1, 2026
  • Structured outputs are now generally available on Claude Sonnet 4.5, Opus 4.5, and Haiku 4.5 with expanded schema support and simplified integration
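The data-residency pricing rule above lends itself to a small cost estimator: the 1.1x multiplier applies only when US-only inference is requested for a model released after February 1, 2026. The function and its parameters are illustrative assumptions, not part of any published SDK.

```python
from datetime import date

US_ONLY_MULTIPLIER = 1.1          # surcharge for US-only inference
CUTOFF = date(2026, 2, 1)         # applies to models released after this date

def token_cost(base_price_per_mtok: float, tokens: int,
               us_only: bool, model_release: date) -> float:
    """Estimate token cost under the US-only pricing rule.

    The 1.1x multiplier is charged only when both conditions hold:
    US-only inference is requested AND the model was released after
    February 1, 2026. Otherwise base pricing applies.
    """
    multiplier = US_ONLY_MULTIPLIER if (us_only and model_release > CUTOFF) else 1.0
    return base_price_per_mtok * (tokens / 1_000_000) * multiplier
```

For example, one million input tokens at a hypothetical $3 per million would cost $3.30 with US-only inference on a post-cutoff model, and $3.00 otherwise.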