Extended Context Now Generally Available
The 1M token context window has graduated from beta to general availability for Claude Opus 4.6 and Sonnet 4.6. Requests over 200k tokens now work automatically at standard pricing with no beta header required. The feature remains in beta for earlier model versions (Sonnet 4.5 and Sonnet 4). Anthropic has also removed dedicated 1M rate limits, consolidating all requests under standard account limits regardless of context length.
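With general availability, a long-context request is just an ordinary Messages API call. The sketch below shows a payload for a request well over 200k tokens with only the standard auth headers; the model ID and the former beta flag name are assumptions, so confirm them against the API reference.

```python
# Sketch: a >200k-token request no longer needs an anthropic-beta header.
# Model ID "claude-opus-4-6" is an assumption for illustration.

def build_request(model: str, messages: list, max_tokens: int = 1024) -> dict:
    """Build a standard Messages API payload; context length no longer
    changes the request shape or the headers."""
    return {"model": model, "max_tokens": max_tokens, "messages": messages}

long_doc = "word " * 300_000  # roughly 300k tokens of input
payload = build_request(
    "claude-opus-4-6",
    [{"role": "user", "content": f"Summarize this document:\n{long_doc}"}],
)

# Only the usual headers are required -- no beta flag for long context:
headers = {"x-api-key": "<key>", "anthropic-version": "2023-06-01"}
assert "anthropic-beta" not in headers
```

Since dedicated 1M rate limits were also removed, this request counts against the same account limits as any other call.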
Automatic Prompt Caching Streamlines Conversations
Anthropic has launched automatic caching for the Messages API, eliminating the need for manual cache breakpoint management. Developers can now add a single cache_control field to requests, and the system automatically caches the last cacheable block, moving the cache point forward as conversations grow. This feature is available on the Claude API and Azure AI Foundry (preview) and works alongside existing block-level cache control for fine-grained optimization.
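The two caching styles can be contrasted side by side. The request-level field below is a sketch that assumes it reuses the existing {"type": "ephemeral"} shape from block-level cache_control; check the API reference for the exact name and placement.

```python
# Before: manual caching with an explicit breakpoint on a content block
# (this block-level form is the existing, documented mechanism).
manual = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{
        "role": "user",
        "content": [{
            "type": "text",
            "text": "<long shared context>",
            "cache_control": {"type": "ephemeral"},  # explicit breakpoint
        }],
    }],
}

# After: one request-level field; the API caches the last cacheable block
# and moves the cache point forward as the conversation grows.
# (Request-level placement shown here is an assumption.)
automatic = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "cache_control": {"type": "ephemeral"},  # hypothetical request-level field
    "messages": [{"role": "user", "content": "<long shared context>"}],
}
```

Because the two mechanisms coexist, the block-level form remains useful when you need a cache breakpoint at a specific position rather than at the end of the cacheable prefix.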
New Platform Capabilities and Pricing Changes
Several tools have transitioned from beta to general availability, including web search, web fetch, code execution, and fine-grained tool streaming. Notably, code execution is now free when used with web search or web fetch, reducing costs for agentic workflows. The web search and web fetch tools now support dynamic filtering, using code execution to filter results before they reach the context window for better token efficiency.
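A request that pairs web search with code execution might look like the sketch below. The tool version strings are assumptions (tool identifiers are versioned and change over time), so verify them against the tool documentation before use.

```python
# Sketch: web search plus (now free) code execution, so search results can
# be filtered by code before they reach the context window.
# Tool "type" version strings below are assumptions.
payload = {
    "model": "claude-opus-4-6",
    "max_tokens": 2048,
    "tools": [
        {"type": "web_search_20250305", "name": "web_search"},          # assumed version
        {"type": "code_execution_20250522", "name": "code_execution"},  # assumed version
    ],
    "messages": [{
        "role": "user",
        "content": "Find recent Rust release notes and keep only breaking changes.",
    }],
}
# Code execution invoked alongside web search or web fetch is not billed
# separately, so the filtering step adds no extra tool cost.
```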
Anthropic introduced data residency controls via the inference_geo parameter, allowing US-only inference at 1.1x pricing for models released after February 1, 2026. A new compaction API (beta) provides server-side context summarization for effectively infinite conversations on Opus 4.6.
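A minimal sketch of the data residency control, with a helper for the 1.1x pricing premium. The parameter name inference_geo comes from the announcement; the "us" value format is an assumption.

```python
# Sketch: requesting US-only inference (value format "us" is an assumption).
payload = {
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "inference_geo": "us",  # hypothetical value format
    "messages": [{"role": "user", "content": "..."}],
}

def us_only_cost(base_cost_usd: float) -> float:
    """US-only inference is billed at 1.1x standard pricing."""
    return round(base_cost_usd * 1.1, 6)

assert us_only_cost(10.0) == 11.0  # $10 of standard usage becomes $11
```

Note the restriction: per the announcement, the premium pricing applies only to models released after February 1, 2026.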
Model Updates and Deprecations
Claude Sonnet 3.7 (claude-3-7-sonnet-20250219) and Claude Haiku 3.5 (claude-3-5-haiku-20241022) have been retired and now return errors. Developers should upgrade to Sonnet 4.6 and Haiku 4.5 respectively. Claude Haiku 3 (claude-3-haiku-20240307) has been deprecated with retirement scheduled for April 19, 2026.
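A small lookup table makes the migration mechanical. The retired IDs are from the announcement; the replacement IDs shown are assumptions, so confirm them against the models list.

```python
# Sketch: map retired/deprecated model IDs to suggested replacements.
# Replacement IDs ("claude-sonnet-4-6", "claude-haiku-4-5") are assumptions.
RETIRED = {
    "claude-3-7-sonnet-20250219": "claude-sonnet-4-6",
    "claude-3-5-haiku-20241022": "claude-haiku-4-5",
}
DEPRECATED = {
    "claude-3-haiku-20240307": "claude-haiku-4-5",  # retirement: 2026-04-19
}

def migrate(model_id: str) -> str:
    """Return a current-generation model ID; pass through anything current."""
    return RETIRED.get(model_id) or DEPRECATED.get(model_id, model_id)

assert migrate("claude-3-5-haiku-20241022") == "claude-haiku-4-5"
assert migrate("claude-opus-4-6") == "claude-opus-4-6"
```

Retired IDs now return errors, so this mapping belongs at request-build time rather than in error handling.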
The media limit for 1M token context requests has been raised from 100 to 600 images or PDF pages per request. Opus 4.6 introduces adaptive thinking as the recommended approach for controlling reasoning depth, deprecating manual thinking mode with budget_tokens.
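The thinking-mode migration can be sketched as two payloads. The manual budget_tokens form is the existing, now-deprecated control; the adaptive shape shown is an assumption based on the announcement.

```python
# Deprecated: manual thinking mode with an explicit token budget.
legacy = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 10_000},  # deprecated
    "messages": [{"role": "user", "content": "Plan the refactor step by step."}],
}

# Recommended: adaptive thinking lets the model choose reasoning depth.
# (The {"type": "adaptive"} shape is a hypothetical illustration.)
adaptive = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "thinking": {"type": "adaptive"},  # hypothetical shape
    "messages": [{"role": "user", "content": "Plan the refactor step by step."}],
}
```

The practical difference is that reasoning depth becomes per-request and model-chosen instead of a fixed budget you must tune by hand.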
Developer Action Items
- Migrate from retired models (Sonnet 3.7, Haiku 3.5) to their current generation equivalents
- Update code to use output_config.format instead of output_format for structured outputs
- Take advantage of automatic caching by adding cache_control fields to existing requests
- Consider using free code execution with web search to optimize agentic workflow costs
- For sensitive workloads, explore data residency controls with the inference_geo parameter
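The structured-outputs rename in the list above can be handled with a small migration helper. The nesting shown (format under output_config) follows the field names in this changelog, but the surrounding schema is an assumption; check the structured outputs reference for the exact shape.

```python
# Sketch: move a top-level output_format into output_config.format.
def migrate_output_field(payload: dict) -> dict:
    """Return a copy with output_format relocated to output_config.format."""
    migrated = dict(payload)
    if "output_format" in migrated:
        migrated.setdefault("output_config", {})["format"] = migrated.pop("output_format")
    return migrated

old = {"model": "claude-opus-4-6",
       "output_format": {"type": "json_schema", "schema": {}}}  # assumed inner shape
new = migrate_output_field(old)
assert "output_format" not in new
assert new["output_config"]["format"]["type"] == "json_schema"
```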