OpenAI
OpenAI Equips Responses API with shell tool and container workspace for agentic workflows
OpenAI API · feature · api · platform · release · openai.com

Shell Tool for Broader Task Execution

OpenAI has expanded the Responses API with a shell tool that lets models propose and execute shell commands in an isolated environment. Unlike the existing code interpreter, which is limited to Python, the shell tool supports a wider range of use cases, including running Go or Java programs, executing Unix utilities such as grep and curl, and even starting servers. The model proposes commands, the platform executes them in a containerized environment, and the results feed back to the model for continued reasoning.
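The propose-execute-feed-back pattern can be sketched locally with Python's subprocess module. This is an illustration of the control flow only, not OpenAI's implementation; `run_proposed_command` is a hypothetical helper, and a real integration would return this dictionary to the model as tool output:

```python
import subprocess

def run_proposed_command(command: str, timeout: int = 30) -> dict:
    """Execute a model-proposed shell command and package the result
    in a form that could be fed back to the model as tool output."""
    completed = subprocess.run(
        command,
        shell=True,            # the platform would sandbox this instead
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }

# A simple command a model might propose:
result = run_proposed_command("echo hello")
print(result["stdout"].strip())
```

The structured result (exit code plus separated stdout/stderr) is what makes continued reasoning possible: the model can inspect failures and propose a corrected command on the next turn.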

Agent Loop Orchestration

The Responses API now orchestrates the complete agent loop automatically. When processing requests, the API:

  • Assembles model context including user prompts and tool instructions
  • Receives shell command proposals from GPT-5.2 and later models
  • Executes commands in a container runtime
  • Streams output back to the model in near real-time
  • Repeats until the model provides a final answer without additional commands
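The steps above can be sketched as a loop. Here `model_step` is a stand-in for a Responses API call and simply scripts two turns (one command proposal, then a final answer); the function names and message shapes are illustrative assumptions, not the actual API:

```python
import subprocess

def model_step(context: list) -> dict:
    """Stub for the model call. A real implementation would send the
    assembled context to the API; here we script two turns."""
    tool_turns = sum(1 for m in context if m["role"] == "tool")
    if tool_turns == 0:
        # First turn: propose a shell command.
        return {"type": "shell_command", "command": "echo 42"}
    # Second turn: answer using the command's output.
    return {"type": "final_answer", "text": context[-1]["output"].strip()}

def agent_loop(prompt: str) -> str:
    context = [{"role": "user", "content": prompt}]        # assemble context
    while True:
        step = model_step(context)                         # model proposes
        if step["type"] == "final_answer":                 # no more commands
            return step["text"]
        completed = subprocess.run(                        # execute command
            step["command"], shell=True, capture_output=True, text=True
        )
        context.append({"role": "tool", "output": completed.stdout})  # feed back

answer = agent_loop("What number does the command print?")
print(answer)
```

The loop terminates exactly as the article describes: when a model turn produces a final answer rather than another command proposal.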

The system can execute multiple commands in parallel, with independent streaming and result multiplexing for efficiency.
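Parallel execution with results multiplexed back in order can be approximated with a thread pool. This is a local sketch of the idea, not the platform's mechanism:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run(command: str) -> str:
    """Run one command and return its stripped stdout."""
    return subprocess.run(
        command, shell=True, capture_output=True, text=True
    ).stdout.strip()

# Independent commands executed concurrently; pool.map collects
# (multiplexes) the results back in submission order.
commands = ["echo alpha", "echo beta", "echo gamma"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run, commands))
print(results)
```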

Production-Ready Infrastructure

The hosted container workspace provides:

  • Filesystem isolation: Temporary storage for intermediate files and outputs
  • Structured storage: Optional SQLite for data persistence
  • Restricted network access: Controlled API connectivity without compromising security
  • Output bounding: Configurable limits on command output to prevent context window overflow, preserving both beginning and end of results
  • Context compaction: Automatic management of long-running tasks to preserve key details while removing extraneous information
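The output-bounding behavior described above, keeping both the beginning and the end of oversized command output, can be approximated as follows. The character limit and truncation marker are illustrative assumptions, not the platform's actual values:

```python
def bound_output(text: str, limit: int = 100) -> str:
    """Truncate command output to at most `limit` characters,
    preserving the head and tail and marking the omitted middle."""
    if len(text) <= limit:
        return text
    marker = "\n...[output truncated]...\n"
    keep = (limit - len(marker)) // 2   # budget for each preserved end
    return text[:keep] + marker + text[-keep:]

long_output = "line\n" * 200            # 1000 characters of output
bounded = bound_output(long_output)
print(len(bounded))
```

Preserving both ends matters for agent loops: the head usually carries the command banner or first error, while the tail carries the final status the model needs to decide its next step.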

Developer Impact

Developers no longer need to build custom execution environments, workflow orchestration, or state management systems. The platform handles infrastructure concerns while models focus on proposing logical steps and commands for accomplishing complex, multi-step tasks.