Model Capability Discovery
The Models API now returns detailed capability metadata via GET /v1/models and GET /v1/models/{model_id} endpoints. Developers can query max_input_tokens, max_tokens, and a capabilities object to programmatically discover which features each model supports.
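A client can use this metadata to route work to a suitable model instead of hardcoding limits. The sketch below filters a models list by the fields named above (max_input_tokens, capabilities); the response shape beyond those fields, the model ids, and the capability key are illustrative assumptions, not confirmed API output.

```python
# Sketch: choose models whose advertised limits cover a workload, using the
# capability metadata from GET /v1/models. Field names max_input_tokens and
# capabilities come from the release notes; everything else is illustrative.

def pick_models(models, needed_input_tokens, needed_capability):
    """Return ids of models whose metadata covers the workload."""
    return [
        m["id"]
        for m in models
        if m["max_input_tokens"] >= needed_input_tokens
        and m["capabilities"].get(needed_capability, False)
    ]

# Example payload mimicking a /v1/models response (values illustrative):
models = [
    {"id": "claude-sonnet-4-6", "max_input_tokens": 1_000_000,
     "max_tokens": 64_000, "capabilities": {"extended_thinking": True}},
    {"id": "claude-haiku-3", "max_input_tokens": 200_000,
     "max_tokens": 4_096, "capabilities": {"extended_thinking": False}},
]

print(pick_models(models, 500_000, "extended_thinking"))  # → ['claude-sonnet-4-6']
```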
Extended Thinking and Display Controls
Anthropic introduced the display field for extended thinking, allowing developers to omit thinking content from responses for faster streaming. Setting thinking.display: "omitted" returns thinking blocks with empty content while preserving the signature field for multi-turn continuity, with no changes to billing.
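Since only the display field and its "omitted" value are named above, a minimal sketch of opting a request into this behavior might look like the helper below; the surrounding thinking-config shape is an assumption.

```python
def with_omitted_thinking(payload):
    """Return a copy of a Messages API payload with thinking content omitted
    from the response. The thinking.display field and "omitted" value come
    from the release notes; the rest of the thinking config shape is assumed."""
    payload = dict(payload)
    thinking = dict(payload.get("thinking", {"type": "enabled"}))
    thinking["display"] = "omitted"
    payload["thinking"] = thinking
    return payload

req = with_omitted_thinking({
    "model": "claude-opus-4-6",  # illustrative model id
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Plan a migration."}],
})
```

Responses still carry thinking blocks with their signature fields, so the same conversation can be continued across turns without re-enabling display.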
1M Token Context General Availability
The 1M token context window is now generally available (no beta header required) for Claude Opus 4.6 and Sonnet 4.6 at standard pricing. Requests exceeding 200k tokens work automatically. Dedicated 1M rate limits have been removed in favor of standard account limits across all context lengths. The media limit has been raised from 100 to 600 images or PDF pages per request.
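Because long-context requests no longer need a beta header, assembling one is just a matter of packing more input into an ordinary Messages API payload. A minimal sketch (model id and payload shape assumed):

```python
def build_long_context_request(model, documents, question):
    """Assemble a Messages API payload from many documents. Under GA, no
    beta header is attached even when the combined input exceeds 200k tokens."""
    content = [{"type": "text", "text": d} for d in documents]
    content.append({"type": "text", "text": question})
    return {
        "model": model,
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": content}],
    }

req = build_long_context_request(
    "claude-sonnet-4-6",            # illustrative model id
    ["chapter one ..."] * 3,        # imagine many large documents here
    "Summarize the recurring themes.",
)
```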
Automatic Prompt Caching
Automatic caching now works with the Messages API through a single cache_control field in the request body. The system automatically caches the last cacheable block and moves the cache point forward as conversations grow, eliminating manual breakpoint management. This works alongside existing block-level cache control for fine-grained optimization.
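Since the text names only the request-level cache_control field, a sketch of enabling automatic caching might look like the helper below; the accepted value ({"type": "ephemeral"}) is an assumption carried over from block-level cache control.

```python
def with_automatic_caching(payload):
    """Return a copy of a Messages API payload with automatic prompt caching
    enabled via a single request-level cache_control field. The field name is
    from the release notes; the accepted value shape is assumed."""
    payload = dict(payload)
    payload["cache_control"] = {"type": "ephemeral"}
    return payload

req = with_automatic_caching({
    "model": "claude-sonnet-4-6",  # illustrative model id
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Continue the analysis."}],
})
```

With this set once, the server caches the last cacheable block and advances the cache point on each turn, so the client never edits per-block breakpoints as the conversation grows.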
New Model Releases and Deprecations
Claude Sonnet 4.6 launched as a balanced model combining speed and intelligence, with improved agentic search performance and reduced token consumption. It supports extended thinking and the 1M token context window.
Retired models:
- Claude Sonnet 3.7 (claude-3-7-sonnet-20250219)
- Claude Haiku 3.5 (claude-3-5-haiku-20241022)
Claude Haiku 3 (claude-3-haiku-20240307) is deprecated with retirement scheduled for April 19, 2026.
Tool Updates
- API code execution is now free when used with web search or web fetch
- Web search tool and programmatic tool calling are generally available (no beta header required)
- Web search and web fetch support dynamic filtering using code execution to reduce token costs
- Code execution tool, web fetch tool, tool search tool, tool use examples, and memory tool are all now generally available
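The first billing rule above (code execution is free when paired with web search or web fetch) is simple enough to capture in a cost helper; the per-session rate used here is a placeholder, not a published price.

```python
def code_execution_cost(has_web_search_or_fetch, base_cost_usd):
    """Per the tool updates, code execution is billed at zero when the same
    request also uses web search or web fetch; otherwise the normal rate
    applies. base_cost_usd is a placeholder, not a published price."""
    return 0.0 if has_web_search_or_fetch else base_cost_usd

# Paired with web search → free; standalone → billed at the normal rate.
print(code_execution_cost(True, 0.05))   # → 0.0
print(code_execution_cost(False, 0.05))  # → 0.05
```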
Additional Features
- Fast mode (research preview) for Opus 4.6 provides up to 2.5x faster output token generation via the speed parameter at premium pricing
- Effort parameter is now generally available with support for Claude Opus 4.6, replacing budget_tokens for controlling thinking depth
- Compaction API (beta) provides server-side context summarization for effectively infinite conversations on Opus 4.6
- Data residency controls allow specifying inference geo with the inference_geo parameter; US-only inference available at 1.1x pricing
- Fine-grained tool streaming is generally available on all models
- Structured outputs parameter moved to output_config.format
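As one concrete case from the list above, the relocated structured-outputs parameter now lives under output_config.format. The sketch below builds such a request; the wrapper shape inside format (a json_schema type plus schema) is an assumption, as the text names only the parameter path.

```python
def build_structured_request(schema):
    """Sketch of a Messages API request using the relocated structured-outputs
    parameter. The output_config.format path is from the release notes; the
    json_schema wrapper shape and model id are assumptions."""
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "output_config": {
            "format": {"type": "json_schema", "schema": schema},
        },
        "messages": [{"role": "user", "content": "Extract the invoice fields."}],
    }

invoice_schema = {
    "type": "object",
    "properties": {"total": {"type": "number"}, "vendor": {"type": "string"}},
    "required": ["total", "vendor"],
}
req = build_structured_request(invoice_schema)
```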