Extended Context Now Generally Available
The 1M token context window has graduated from beta to general availability for Claude Opus 4.6 and Sonnet 4.6. Requests over 200k tokens now work automatically at standard pricing with no beta header required. The feature remains in beta for earlier model versions (Sonnet 4.5 and Sonnet 4). Anthropic has also removed dedicated 1M rate limits, consolidating all requests under standard account limits regardless of context length.
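With general availability, a long-context request is just an ordinary Messages API call. The sketch below shows a payload for a request well over 200k tokens with only the standard auth headers; the model ID and the former beta flag name are assumptions, so confirm them against the API reference.

```python
# Sketch: a >200k-token request no longer needs an anthropic-beta header.
# Model ID "claude-opus-4-6" is an assumption for illustration.

def build_request(model: str, messages: list, max_tokens: int = 1024) -> dict:
    """Build a standard Messages API payload; context length no longer
    changes the request shape or the headers."""
    return {"model": model, "max_tokens": max_tokens, "messages": messages}

long_doc = "word " * 300_000  # roughly 300k tokens of input
payload = build_request(
    "claude-opus-4-6",
    [{"role": "user", "content": f"Summarize this document:\n{long_doc}"}],
)

# Only the usual headers are required -- no beta flag for long context:
headers = {"x-api-key": "<key>", "anthropic-version": "2023-06-01"}
assert "anthropic-beta" not in headers
```

Since dedicated 1M rate limits were also removed, this request counts against the same account limits as any other call.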
Automatic Prompt Caching Streamlines Conversations
Anthropic has launched automatic caching for the Messages API, eliminating the need for manual cache breakpoint management. Developers can now add a single cache_control field to requests, and the system automatically caches the last cacheable block, moving the cache point forward as conversations grow. This feature is available on the Claude API and Azure AI Foundry (preview) and works alongside existing block-level cache control for fine-grained optimization.
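The two caching styles can be contrasted side by side. The request-level field below is a sketch that assumes it reuses the existing {"type": "ephemeral"} shape from block-level cache_control; check the API reference for the exact name and placement.

```python
# Before: manual caching with an explicit breakpoint on a content block
# (this block-level form is the existing, documented mechanism).
manual = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{
        "role": "user",
        "content": [{
            "type": "text",
            "text": "<long shared context>",
            "cache_control": {"type": "ephemeral"},  # explicit breakpoint
        }],
    }],
}

# After: one request-level field; the API caches the last cacheable block
# and moves the cache point forward as the conversation grows.
# (Request-level placement shown here is an assumption.)
automatic = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "cache_control": {"type": "ephemeral"},  # hypothetical request-level field
    "messages": [{"role": "user", "content": "<long shared context>"}],
}
```

Because the two mechanisms coexist, the block-level form remains useful when you need a cache breakpoint at a specific position rather than at the end of the cacheable prefix.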
New Platform Capabilities and Pricing Changes
Several tools have transitioned from beta to general availability, including web search, web fetch, code execution, and fine-grained tool streaming. Notably, code execution is now free when used with web search or web fetch, reducing costs for agentic workflows. The web search and web fetch tools now support dynamic filtering, using code execution to filter results before they reach the context window for better token efficiency.
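A request that pairs web search with code execution might look like the sketch below. The tool version strings are assumptions (tool identifiers are versioned and change over time), so verify them against the tool documentation before use.

```python
# Sketch: web search plus (now free) code execution, so search results can
# be filtered by code before they reach the context window.
# Tool "type" version strings below are assumptions.
payload = {
    "model": "claude-opus-4-6",
    "max_tokens": 2048,
    "tools": [
        {"type": "web_search_20250305", "name": "web_search"},          # assumed version
        {"type": "code_execution_20250522", "name": "code_execution"},  # assumed version
    ],
    "messages": [{
        "role": "user",
        "content": "Find recent Rust release notes and keep only breaking changes.",
    }],
}
# Code execution invoked alongside web search or web fetch is not billed
# separately, so the filtering step adds no extra tool cost.
```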
Anthropic introduced data residency controls via the inference_geo parameter, allowing US-only inference at 1.1x pricing for models released after February 1, 2026. A new compaction API (beta) provides server-side context summarization for effectively infinite conversations on Opus 4.6.
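A minimal sketch of the data residency control, with a helper for the 1.1x pricing premium. The parameter name inference_geo comes from the announcement; the "us" value format is an assumption.

```python
# Sketch: requesting US-only inference (value format "us" is an assumption).
payload = {
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "inference_geo": "us",  # hypothetical value format
    "messages": [{"role": "user", "content": "..."}],
}

def us_only_cost(base_cost_usd: float) -> float:
    """US-only inference is billed at 1.1x standard pricing."""
    return round(base_cost_usd * 1.1, 6)

assert us_only_cost(10.0) == 11.0  # $10 of standard usage becomes $11
```

Note the restriction: per the announcement, the premium pricing applies only to models released after February 1, 2026.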
Model Updates and Deprecations
Claude Sonnet 3.7 (claude-3-7-sonnet-20250219) and Claude Haiku 3.5 (claude-3-5-haiku-20241022) have been retired and now return errors. Developers should upgrade to Sonnet 4.6 and Haiku 4.5 respectively. Claude Haiku 3 (claude-3-haiku-20240307) has been deprecated with retirement scheduled for April 19, 2026.
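A small lookup table makes the migration mechanical. The retired IDs are from the announcement; the replacement IDs shown are assumptions, so confirm them against the models list.

```python
# Sketch: map retired/deprecated model IDs to suggested replacements.
# Replacement IDs ("claude-sonnet-4-6", "claude-haiku-4-5") are assumptions.
RETIRED = {
    "claude-3-7-sonnet-20250219": "claude-sonnet-4-6",
    "claude-3-5-haiku-20241022": "claude-haiku-4-5",
}
DEPRECATED = {
    "claude-3-haiku-20240307": "claude-haiku-4-5",  # retirement: 2026-04-19
}

def migrate(model_id: str) -> str:
    """Return a current-generation model ID; pass through anything current."""
    return RETIRED.get(model_id) or DEPRECATED.get(model_id, model_id)

assert migrate("claude-3-5-haiku-20241022") == "claude-haiku-4-5"
assert migrate("claude-opus-4-6") == "claude-opus-4-6"
```

Retired IDs now return errors, so this mapping belongs at request-build time rather than in error handling.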
The media limit for 1M token context requests has been raised from 100 to 600 images or PDF pages per request. Opus 4.6 introduces adaptive thinking as the recommended approach for controlling reasoning depth, deprecating manual thinking mode with budget_tokens.
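The thinking-mode migration can be sketched as two payloads. The manual budget_tokens form is the existing, now-deprecated control; the adaptive shape shown is an assumption based on the announcement.

```python
# Deprecated: manual thinking mode with an explicit token budget.
legacy = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 10_000},  # deprecated
    "messages": [{"role": "user", "content": "Plan the refactor step by step."}],
}

# Recommended: adaptive thinking lets the model choose reasoning depth.
# (The {"type": "adaptive"} shape is a hypothetical illustration.)
adaptive = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "thinking": {"type": "adaptive"},  # hypothetical shape
    "messages": [{"role": "user", "content": "Plan the refactor step by step."}],
}
```

The practical difference is that reasoning depth becomes per-request and model-chosen instead of a fixed budget you must tune by hand.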
Developer Action Items
- Migrate from retired models (Sonnet 3.7, Haiku 3.5) to their current generation equivalents
- Update code to use output_config.format instead of output_format for structured outputs
- Take advantage of automatic caching by adding cache_control fields to existing requests
- Consider using free code execution with web search to optimize agentic workflow costs
- For sensitive workloads, explore data residency controls with the inference_geo parameter
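The structured-outputs rename in the list above can be handled with a small migration helper. The nesting shown (format under output_config) follows the field names in this changelog, but the surrounding schema is an assumption; check the structured outputs reference for the exact shape.

```python
# Sketch: move a top-level output_format into output_config.format.
def migrate_output_field(payload: dict) -> dict:
    """Return a copy with output_format relocated to output_config.format."""
    migrated = dict(payload)
    if "output_format" in migrated:
        migrated.setdefault("output_config", {})["format"] = migrated.pop("output_format")
    return migrated

old = {"model": "claude-opus-4-6",
       "output_format": {"type": "json_schema", "schema": {}}}  # assumed inner shape
new = migrate_output_field(old)
assert "output_format" not in new
assert new["output_config"]["format"]["type"] == "json_schema"
```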