Apr 02, 2026

OpenAI Upgrades Responses API With Shell Tool and Agent Skills

OpenAI extends the Responses API with a full terminal shell, reusable agent skills via SKILL.md, and server-side compaction for multi-hour autonomous agents.

#OpenAI#Responses API#Shell Tool#Agent Skills#SKILL.md

From Model API to Agent Platform

OpenAI has announced a significant expansion of the Responses API, transforming it from a model inference endpoint into a full-featured agent development platform. The update introduces three key capabilities: a shell tool for terminal command execution, a standardized agent skills system, and server-side context compaction that enables agents to run continuously for hours or days. Together, these features represent OpenAI's most substantial move toward making autonomous AI agents a practical reality for developers.

The announcement, covered by VentureBeat and InfoQ, positions the Responses API as a direct competitor to Anthropic's Claude computer-use capabilities and other emerging agent frameworks. Notably, both OpenAI and Anthropic have converged on the same open standard for agent skills, signaling an unusual moment of interoperability in the competitive AI landscape.

Shell Tool: Beyond Python to Full Terminal Access

The shell tool is the most immediately impactful addition. Unlike OpenAI's existing code interpreter, which only executes Python, the shell tool provides access to a complete terminal environment. Developers can choose between two modes.

Hosted shells run on OpenAI-managed infrastructure using a Debian 12 container with preinstalled runtimes for Python 3.11, Node.js 22, Java 17, PHP 8.2, Ruby 3.1, and Go 1.23. Files persist in a /mnt/data working directory, and containers expire after 20 minutes of inactivity. Developers can provision containers automatically with container_auto for single-request tasks, or create persistent containers that maintain state across multiple API requests.
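A request enabling the hosted shell might look like the sketch below. The exact tool type name, container field, and model identifier are assumptions based on the announcement, not confirmed API fields; check the official reference before relying on them.

```python
# Hypothetical payload for a hosted shell session via the Responses API.
# Field names ("shell", "container") and the model name are assumptions.
def build_shell_request(prompt: str, container: str = "container_auto") -> dict:
    """Assemble a Responses API payload that enables the hosted shell tool."""
    return {
        "model": "gpt-5.1",  # placeholder model name
        "input": prompt,
        "tools": [
            {
                "type": "shell",        # assumed tool identifier
                "container": container,  # container_auto provisions per request
            }
        ],
    }

request = build_shell_request("List the files in /mnt/data and count them.")
```

For multi-request workflows, the same sketch would pass a persistent container ID instead of container_auto so state in /mnt/data survives between calls.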

Local shells allow developers to run the execution environment on their own infrastructure, providing full control over the runtime context. This option is designed for organizations with zero-data-retention (ZDR) compliance requirements or those needing access to internal systems.

Network access is disabled by default in hosted containers, addressing security concerns. Organizations can configure domain allowlists through their dashboard, and request-level policies can further restrict access. A domain_secrets feature safely injects authorization headers without exposing raw credentials to the model, using placeholder substitution during execution.
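The placeholder-substitution idea behind domain_secrets can be illustrated with a toy resolver: the model only ever authors commands containing a placeholder, and the execution layer swaps in the real header value just before the request leaves the container. All names here are illustrative, not the actual OpenAI implementation.

```python
# Toy illustration of placeholder-based credential injection, in the
# spirit of the domain_secrets feature. The secret store and placeholder
# token below are invented for this sketch.
SECRETS = {"api.example.com": "Bearer sk-real-credential"}
PLACEHOLDER = "{{DOMAIN_SECRET}}"

def resolve_headers(domain: str, headers: dict) -> dict:
    """Swap the placeholder for the real secret at execution time."""
    secret = SECRETS.get(domain)
    resolved = {}
    for key, value in headers.items():
        if value == PLACEHOLDER:
            if secret is None:
                raise KeyError(f"no secret configured for {domain}")
            value = secret
        resolved[key] = value
    return resolved

# The model-authored command references only the placeholder:
model_headers = {"Authorization": PLACEHOLDER}
sent = resolve_headers("api.example.com", model_headers)
```

The key property is that the raw credential never appears in the model's context, so it cannot be leaked through generated output or prompt injection.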

Agent Skills: A Standardized Package Format

The skills system introduces a reusable packaging format for agent capabilities. Each skill is a versioned folder bundle anchored by a SKILL.md manifest file containing YAML frontmatter and natural-language instructions, along with supporting resources such as API specifications, code scripts, and UI assets.
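A SKILL.md manifest of this shape can be parsed with a few lines of string handling. The frontmatter field names below (name, version, description) are illustrative; consult the published SKILL.md specification for the authoritative schema.

```python
# Minimal parser for a SKILL.md-style manifest: YAML frontmatter between
# "---" fences, followed by natural-language instructions. The example
# skill and its fields are invented for illustration.
EXAMPLE = """---
name: invoice-extractor
version: 1.0.0
description: Pull line items out of PDF invoices.
---
Read the attached invoice and emit line items as CSV.
"""

def parse_skill(text: str) -> tuple[dict, str]:
    """Split a manifest into (frontmatter fields, instruction body)."""
    _, frontmatter, body = text.split("---", 2)
    fields = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, body.strip()

meta, instructions = parse_skill(EXAMPLE)
```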

When attached to a container, skills are copied into the execution environment where the model can read instructions and run associated code. This creates a modular architecture where complex agent behaviors can be composed from pre-built skill packages rather than constructed from scratch in every session.

The most significant aspect of the skills announcement is the convergence with Anthropic. Both companies now support the same SKILL.md format, meaning skills developed for one platform can be used on the other. This interoperability is rare in the AI industry and could accelerate the development of a shared ecosystem for agent tooling.

Server-Side Compaction: The Key to Long-Running Agents

Context compaction addresses the most fundamental limitation of current AI agents: finite context windows. When agents run complex multi-step tasks, they rapidly consume their context window with accumulated tool outputs, intermediate results, and conversation history. Previously, this forced developers to implement manual truncation strategies that often lost critical information.

Server-side compaction compresses previous steps into shorter representations while preserving essential information. Unlike simple truncation, compaction uses the model itself to identify and retain the most important context from earlier in the session.
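The idea can be sketched with a toy compactor: once the running history exceeds a token budget, older turns are collapsed into a single summary entry while recent turns are kept verbatim. Real compaction uses the model itself to write the summary; summarize() below is a crude stand-in.

```python
# Toy sketch of compaction. Token counts are estimated by word count,
# and the summary is a placeholder rather than a model-generated digest.
def summarize(turns: list[str]) -> str:
    # Stand-in for a model-generated summary of earlier context.
    return f"[summary of {len(turns)} earlier steps]"

def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Compress old turns when estimated tokens exceed the budget."""
    est_tokens = sum(len(t.split()) for t in history)
    if est_tokens <= budget or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = [f"tool call {i} produced a long output " * 5 for i in range(10)]
compacted = compact(history, budget=100)
```

Unlike hard truncation, the compacted history still carries a representation of every earlier step, which is what preserves accuracy over very long sessions.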

Early results are promising. E-commerce platform Triple Whale reported that their agent, Moby, successfully navigated a session involving 5 million tokens and 150 tool calls without any drop in accuracy. This scale of continuous operation was previously impractical with standard context window management.

The combination of compaction with persistent containers means agents can now maintain state and context across sessions that span hours or days, a requirement for real-world automation tasks like code reviews, data pipeline management, and multi-step research workflows.

Agent Execution Loop

Underlying all three features is a new agent execution loop built into the Responses API. Rather than producing immediate answers, the model proposes actions (running shell commands, querying data, fetching web content), which execute in a controlled environment. Results feed back iteratively until the task is complete.

This loop transforms the Responses API from a request-response interface into an agent runtime. Developers define the tools, skills, and environment; the API handles the orchestration of multi-step execution, including error recovery and context management.
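The loop's shape can be sketched as follows; propose_action is a scripted stub standing in for a real model call, and execute stands in for the controlled environment.

```python
# Minimal agent loop: the model proposes an action, the runtime executes
# it, and the observation is fed back until the model signals completion.
# Both functions are stubs invented for this sketch.
def propose_action(observations: list[str]) -> dict:
    # Scripted stand-in for the model: run two commands, then finish.
    if len(observations) < 2:
        return {"type": "shell", "command": f"step-{len(observations)}"}
    return {"type": "done", "answer": "task complete"}

def execute(command: str) -> str:
    # Stand-in for the controlled execution environment.
    return f"output of {command}"

def run_agent(max_steps: int = 10) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        action = propose_action(observations)
        if action["type"] == "done":
            return action["answer"]
        observations.append(execute(action["command"]))
    raise RuntimeError("step budget exhausted")

result = run_agent()
```

In the hosted version, the orchestration, error recovery, and context management of this loop happen server-side rather than in developer code.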

Security Considerations

OpenAI's documentation explicitly acknowledges the security risks introduced by these capabilities. Enabling network access in containers introduces "meaningful security and data-governance risk," according to the official documentation. Prompt injection through externally fetched content is identified as a particular concern when network access is enabled.

The recommended mitigations include limiting domain allowlists to trusted destinations, implementing command review workflows for sensitive operations, validating third-party data retention policies, and auditing session logs regularly. The local shell option provides an additional security layer for organizations that require full control over their execution environment.
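A request-level allowlist check, one of the mitigations listed above, can be sketched in a few lines; the allowlist entries here are arbitrary examples.

```python
# Sketch of a domain allowlist gate: only exact hostnames on the
# allowlist are permitted, and the check runs before any network call.
from urllib.parse import urlparse

ALLOWLIST = {"api.github.com", "pypi.org"}

def is_allowed(url: str) -> bool:
    """Return True if the URL's hostname is on the allowlist."""
    host = urlparse(url).hostname or ""
    return host in ALLOWLIST
```

An exact-match check like this is deliberately strict; matching subdomains would require an explicit suffix rule and widens the attack surface.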

Competitive Positioning

The Responses API upgrade positions OpenAI directly against Anthropic's computer-use capabilities, Google's Gemini agent features, and a growing ecosystem of open-source agent frameworks. The convergence on SKILL.md with Anthropic is strategically interesting, as it suggests both companies see more value in a shared standard than in proprietary lock-in at the skill layer.

For developers, the practical implication is that building agent capabilities is becoming a platform-level feature rather than a custom engineering challenge. The shell tool eliminates the need for external code execution infrastructure, skills reduce boilerplate, and compaction removes the primary scaling limitation.

Conclusion

The Responses API update represents OpenAI's transition from selling model access to selling agent infrastructure. The shell tool, skills system, and compaction capabilities collectively lower the barrier to building agents that can perform sustained, complex work in real computing environments. For developers already building on the OpenAI platform, these features provide immediate productivity gains. For the broader AI industry, the convergence on shared standards like SKILL.md signals that the agent platform layer is maturing faster than many expected.

Pros

  • Six preinstalled language runtimes eliminate the need for external code execution infrastructure
  • Interoperable SKILL.md standard prevents vendor lock-in and enables cross-platform agent skill sharing
  • Server-side compaction solves the context window limitation that has constrained long-running agents
  • Local shell option provides full data sovereignty for security-conscious organizations
  • Built-in credential injection via domain_secrets reduces secret exposure risk in agent workflows

Cons

  • 20-minute container timeout limits use cases requiring persistent background processes
  • Hosted containers lack sudo access, restricting certain system-level operations
  • Network access introduces prompt injection risks that require careful domain allowlist management
  • Pricing for container sessions adds cost beyond standard API token usage


Key Features

1. Shell tool provides full terminal access with Debian 12 containers running Python, Node.js, Java, PHP, Ruby, and Go
2. SKILL.md agent skills standard, interoperable with Anthropic's Claude, enables reusable packaged agent capabilities
3. Server-side compaction allows continuous agent sessions handling 5M+ tokens and 150+ tool calls without accuracy loss
4. Hosted containers with 20-minute persistence, domain allowlists, and credential injection via domain_secrets
5. Agent execution loop transforms the API from request-response into an iterative autonomous runtime

Key Insights

  • The shell tool expands OpenAI's execution capabilities from Python-only to six languages, enabling a far wider range of agent use cases
  • Convergence on SKILL.md between OpenAI and Anthropic is a rare moment of interoperability that could accelerate shared agent tooling ecosystems
  • Triple Whale's Moby agent navigating 5 million tokens across 150 tool calls demonstrates compaction's practical viability for enterprise workloads
  • Local shell option addresses ZDR compliance, making the agent platform viable for regulated industries including finance and healthcare
  • The transition from selling model access to selling agent infrastructure marks a fundamental shift in OpenAI's business model
  • Default-off network access and domain allowlists reflect lessons learned from computer-use security incidents across the industry
  • Persistent containers with state management across sessions enable multi-day automation workflows previously impossible with stateless APIs
