Model Capability Discovery
The Models API now returns detailed capability metadata via GET /v1/models and GET /v1/models/{model_id} endpoints. Developers can query max_input_tokens, max_tokens, and a capabilities object to programmatically discover which features each model supports.
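A client can use this metadata to route work to a suitable model instead of hardcoding limits. The sketch below filters a models list by the fields named above (max_input_tokens, capabilities); the response shape beyond those fields, the model ids, and the capability key are illustrative assumptions, not confirmed API output.

```python
# Sketch: choose models whose advertised limits cover a workload, using the
# capability metadata from GET /v1/models. Field names max_input_tokens and
# capabilities come from the release notes; everything else is illustrative.

def pick_models(models, needed_input_tokens, needed_capability):
    """Return ids of models whose metadata covers the workload."""
    return [
        m["id"]
        for m in models
        if m["max_input_tokens"] >= needed_input_tokens
        and m["capabilities"].get(needed_capability, False)
    ]

# Example payload mimicking a /v1/models response (values illustrative):
models = [
    {"id": "claude-sonnet-4-6", "max_input_tokens": 1_000_000,
     "max_tokens": 64_000, "capabilities": {"extended_thinking": True}},
    {"id": "claude-haiku-3", "max_input_tokens": 200_000,
     "max_tokens": 4_096, "capabilities": {"extended_thinking": False}},
]

print(pick_models(models, 500_000, "extended_thinking"))  # → ['claude-sonnet-4-6']
```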
Extended Thinking and Display Controls
Anthropic introduced the display field for extended thinking, allowing developers to omit thinking content from responses for faster streaming. Setting thinking.display: "omitted" returns thinking blocks with empty content while preserving the signature field for multi-turn continuity, with no changes to billing.
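Since only the display field and its "omitted" value are named above, a minimal sketch of opting a request into this behavior might look like the helper below; the surrounding thinking-config shape is an assumption.

```python
def with_omitted_thinking(payload):
    """Return a copy of a Messages API payload with thinking content omitted
    from the response. The thinking.display field and "omitted" value come
    from the release notes; the rest of the thinking config shape is assumed."""
    payload = dict(payload)
    thinking = dict(payload.get("thinking", {"type": "enabled"}))
    thinking["display"] = "omitted"
    payload["thinking"] = thinking
    return payload

req = with_omitted_thinking({
    "model": "claude-opus-4-6",  # illustrative model id
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Plan a migration."}],
})
```

Responses still carry thinking blocks with their signature fields, so the same conversation can be continued across turns without re-enabling display.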
1M Token Context General Availability
The 1M token context window is now generally available (no beta header required) for Claude Opus 4.6 and Sonnet 4.6 at standard pricing. Requests exceeding 200k tokens work automatically. Dedicated 1M rate limits have been removed in favor of standard account limits across all context lengths. The media limit has been raised from 100 to 600 images or PDF pages per request.
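Because long-context requests no longer need a beta header, assembling one is just a matter of packing more input into an ordinary Messages API payload. A minimal sketch (model id and payload shape assumed):

```python
def build_long_context_request(model, documents, question):
    """Assemble a Messages API payload from many documents. Under GA, no
    beta header is attached even when the combined input exceeds 200k tokens."""
    content = [{"type": "text", "text": d} for d in documents]
    content.append({"type": "text", "text": question})
    return {
        "model": model,
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": content}],
    }

req = build_long_context_request(
    "claude-sonnet-4-6",            # illustrative model id
    ["chapter one ..."] * 3,        # imagine many large documents here
    "Summarize the recurring themes.",
)
```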
Automatic Prompt Caching
Automatic caching now works with the Messages API through a single cache_control field in the request body. The system automatically caches the last cacheable block and moves the cache point forward as conversations grow, eliminating manual breakpoint management. This works alongside existing block-level cache control for fine-grained optimization.
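Since the text names only the request-level cache_control field, a sketch of enabling automatic caching might look like the helper below; the accepted value ({"type": "ephemeral"}) is an assumption carried over from block-level cache control.

```python
def with_automatic_caching(payload):
    """Return a copy of a Messages API payload with automatic prompt caching
    enabled via a single request-level cache_control field. The field name is
    from the release notes; the accepted value shape is assumed."""
    payload = dict(payload)
    payload["cache_control"] = {"type": "ephemeral"}
    return payload

req = with_automatic_caching({
    "model": "claude-sonnet-4-6",  # illustrative model id
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Continue the analysis."}],
})
```

With this set once, the server caches the last cacheable block and advances the cache point on each turn, so the client never edits per-block breakpoints as the conversation grows.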
New Model Releases and Deprecations
Claude Sonnet 4.6 launched as a balanced model combining speed and intelligence, with improved agentic search performance and reduced token consumption. It supports extended thinking and the 1M token context window.
Retired models:
- Claude Sonnet 3.7 (claude-3-7-sonnet-20250219)
- Claude Haiku 3.5 (claude-3-5-haiku-20241022)
Claude Haiku 3 (claude-3-haiku-20240307) is deprecated with retirement scheduled for April 19, 2026.
Tool Updates
- API code execution is now free when used with web search or web fetch
- Web search tool and programmatic tool calling are generally available (no beta header required)
- Web search and web fetch support dynamic filtering using code execution to reduce token costs
- Code execution tool, web fetch tool, tool search tool, tool use examples, and memory tool are all now generally available
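The first billing rule above (code execution is free when paired with web search or web fetch) is simple enough to capture in a cost helper; the per-session rate used here is a placeholder, not a published price.

```python
def code_execution_cost(has_web_search_or_fetch, base_cost_usd):
    """Per the tool updates, code execution is billed at zero when the same
    request also uses web search or web fetch; otherwise the normal rate
    applies. base_cost_usd is a placeholder, not a published price."""
    return 0.0 if has_web_search_or_fetch else base_cost_usd

# Paired with web search → free; standalone → billed at the normal rate.
print(code_execution_cost(True, 0.05))   # → 0.0
print(code_execution_cost(False, 0.05))  # → 0.05
```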
Additional Features
- Fast mode (research preview) for Opus 4.6 provides up to 2.5x faster output token generation via the speed parameter at premium pricing
- Effort parameter is now generally available with support for Claude Opus 4.6, replacing budget_tokens for controlling thinking depth
- Compaction API (beta) provides server-side context summarization for effectively infinite conversations on Opus 4.6
- Data residency controls allow specifying inference geo with the inference_geo parameter; US-only inference available at 1.1x pricing
- Fine-grained tool streaming is generally available on all models
- Structured outputs parameter moved to output_config.format
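As one concrete case from the list above, the relocated structured-outputs parameter now lives under output_config.format. The sketch below builds such a request; the wrapper shape inside format (a json_schema type plus schema) is an assumption, as the text names only the parameter path.

```python
def build_structured_request(schema):
    """Sketch of a Messages API request using the relocated structured-outputs
    parameter. The output_config.format path is from the release notes; the
    json_schema wrapper shape and model id are assumptions."""
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 1024,
        "output_config": {
            "format": {"type": "json_schema", "schema": schema},
        },
        "messages": [{"role": "user", "content": "Extract the invoice fields."}],
    }

invoice_schema = {
    "type": "object",
    "properties": {"total": {"type": "number"}, "vendor": {"type": "string"}},
    "required": ["total", "vendor"],
}
req = build_structured_request(invoice_schema)
```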