New Agent Execution Environment
OpenAI has expanded the Responses API to include a shell tool and a hosted container workspace, enabling developers to build agents that execute complex workflows. Unlike previous approaches that limited models to pure inference, agents can now propose and execute shell commands in an isolated compute environment, handling tasks such as running services, querying APIs, and generating artifacts like spreadsheets or reports.
How It Works
The shell tool allows models to interact with a computer through familiar Unix utilities (grep, curl, awk, etc.). The Responses API orchestrates a tight execution loop: the model proposes actions (like reading files or fetching data), the platform executes them in the container, and results feed back to the model for the next step. GPT-5.2 and later models are trained to propose shell commands natively.
Key features include:
- Concurrent execution: Models can propose multiple shell commands that execute in parallel, with the Responses API multiplexing streams back into structured context
- Bounded output: Large command outputs are capped with preserved beginning and end text to manage context window usage
- Isolated environment: Commands run in a restricted container with a filesystem, optional structured storage (SQLite), and controlled network access
- Streaming results: Shell output flows back to the model in near real-time for dynamic decision-making
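The bounded-output behavior (keep the beginning and end of large output, drop the middle) can be approximated as follows; the cap size and marker text are assumptions for illustration, not the platform's actual values.

```python
TRUNCATION_MARKER = "\n...[output truncated]...\n"

def cap_output(text: str, max_chars: int = 4096) -> str:
    # Return the output unchanged if it fits within the cap.
    if len(text) <= max_chars:
        return text
    # Otherwise preserve the head and tail, dropping the middle, so the
    # model still sees how the output starts and ends.
    keep = (max_chars - len(TRUNCATION_MARKER)) // 2
    return text[:keep] + TRUNCATION_MARKER + text[-keep:]

capped = cap_output("x" * 10_000)
print(len(capped) <= 4096)  # → True
```

Preserving both ends matters in practice: headers and error summaries tend to appear at the start of command output, while final status lines appear at the end.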
Context Management for Long-Running Tasks
The service includes context compaction to handle long-running agent workflows that would otherwise fill the limited context window. Rather than requiring developers to build custom summarization logic, the platform automatically manages context across multiple turns while preserving key details.
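Without a platform feature like this, developers would typically roll their own compaction. A rough sketch of that pattern, with a placeholder `summarize` function standing in for a real model-based summarization call:

```python
def summarize(turns: list[str]) -> str:
    # Placeholder: a real implementation would call a model to summarize.
    # Here we just join the first line of each turn (illustrative only).
    return "Summary: " + " | ".join(t.splitlines()[0] for t in turns)

def compact_context(turns: list[str], max_turns: int = 4) -> list[str]:
    # When the transcript grows past the budget, fold the oldest turns into
    # a single summary entry while keeping the most recent turns verbatim.
    if len(turns) <= max_turns:
        return turns
    old, recent = turns[:-max_turns + 1], turns[-max_turns + 1:]
    return [summarize(old)] + recent

history = [f"turn {i}: ran command {i}" for i in range(10)]
print(compact_context(history))  # 1 summary entry + 3 verbatim recent turns
```

The platform's built-in compaction spares developers from maintaining this kind of logic, including deciding what counts as a "key detail" worth preserving across turns.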
Practical Benefits
This removes infrastructure burden from developers: no more building custom execution environments, managing file storage between steps, pasting large tables into prompts, or handling timeouts and retries manually. The platform provides a complete agent execution harness within the Responses API.