What's New
LangChain has released an autonomous context compression tool for the Deep Agents SDK (Python) and CLI. This feature allows AI agents to automatically decide when to compress their context windows—replacing older messages with summaries—rather than relying on fixed token thresholds or manual user commands.
The Problem
Traditional agent harnesses use static, hand-tuned rules to compress context at fixed thresholds (Deep Agents compacts at 85% of a model's context limit). This approach is suboptimal because:
- Timing matters: Compressing mid-refactor disrupts workflows, while compressing at task boundaries is cleaner and more effective
- User awareness: Users must manually trigger compression with commands like /compact, adding cognitive overhead
- Rigid thresholds: Fixed token limits don't account for when context is actually becoming irrelevant
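To make the critique concrete, here is a minimal sketch of the kind of static rule being replaced: compression fires purely on a token count crossing a fixed fraction of the context window, with no notion of timing. The function name and numbers are illustrative, not the actual Deep Agents implementation.

```python
COMPACT_THRESHOLD = 0.85  # Deep Agents compacts at 85% of the context limit

def should_compact(used_tokens: int, context_limit: int) -> bool:
    """Fixed-threshold rule: fires on token usage alone, ignoring timing."""
    return used_tokens / context_limit >= COMPACT_THRESHOLD

# Fires mid-refactor at 86% usage, yet stays silent at 84% even if most
# of the current context is already stale.
should_compact(172_000, 200_000)  # 86% -> True
should_compact(168_000, 200_000)  # 84% -> False
```

A rule like this cannot distinguish "mid-refactor" from "clean task boundary", which is exactly the gap the autonomous tool addresses.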
When Agents Trigger Compression
The tool guides agents to compress at opportune moments:
- Task boundaries: Starting a new task where prior context is irrelevant
- After extraction: Completing research or summarization tasks that consumed significant context
- Before large inputs: Preparing to read or generate substantial new content
- Before complex processes: Starting refactors, migrations, or multi-step executions
- After decisions: When new requirements invalidate prior context or resolve tangents
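One plausible way to surface this guidance to the agent is in the compression tool's description, which the model reads when deciding whether to call it. The following is a hedged, illustrative sketch in a generic tool-schema style, not the actual Deep Agents prompt or schema.

```python
# Hypothetical tool definition: the trigger heuristics above are encoded
# in the description the model sees, so the decision to compress rests
# with the agent rather than a fixed threshold. Names are illustrative.
SUMMARIZE_TOOL = {
    "name": "summarize_conversation",
    "description": (
        "Compress older conversation history into a summary. Good moments: "
        "starting an unrelated task, finishing research that consumed "
        "significant context, before reading or generating large content, "
        "before a refactor or migration, or after a decision invalidates "
        "prior context."
    ),
    "parameters": {"type": "object", "properties": {}},
}
```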
How It Works
The feature is implemented as middleware that preserves 10% of available context (recent messages) and summarizes everything before it. Deep Agents retains full conversation history in its virtual filesystem, allowing context recovery post-summarization. Testing shows agents are conservative about triggering compression but choose strategically beneficial moments when they do.
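The mechanics described above can be sketched in a few lines, under stated assumptions: keep the most recent ~10% of messages verbatim, replace everything earlier with a single summary message, and archive the full history so it remains recoverable. The `compress` function and the string summary are stand-ins; the real middleware summarizes with a model and stores history in the virtual filesystem.

```python
def compress(messages: list[str], archive: dict, keep_fraction: float = 0.10):
    """Keep the newest ~10% of messages; summarize and archive the rest."""
    keep = max(1, int(len(messages) * keep_fraction))
    older, recent = messages[:-keep], messages[-keep:]
    archive["full_history"] = list(messages)  # recoverable post-summarization
    summary = f"[summary of {len(older)} earlier messages]"  # stand-in for an LLM call
    return [summary, *recent]

archive: dict = {}
history = [f"msg {i}" for i in range(20)]
compressed = compress(history, archive)
# compressed: 1 summary message + 2 recent messages; archive holds all 20
```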
Getting Started
In the SDK:
from deepagents import create_deep_agent
from deepagents.middleware.summarization import create_summarization_tool_middleware

model = "openai:gpt-5.4"
backend = ...  # filesystem backend that stores full history; see the Deep Agents docs

agent = create_deep_agent(
    model=model,
    middleware=[create_summarization_tool_middleware(model, backend)],
)
In the CLI: call /compact when you're ready to trim context or move to a new task.
This release reflects LangChain's broader philosophy of giving models more control over their working memory rather than relying on rigid, hand-tuned harness rules.