OpenAI introduces Lockdown Mode and Elevated Risk labels to mitigate prompt injection attacks
OpenAI API · ChatGPT · feature · security · api · openai.com ↗

New Security Protections Against Prompt Injection Attacks

As AI systems take on increasingly complex tasks—particularly those involving web access and connected applications—the security landscape has shifted. One emerging threat has become especially critical: prompt injection attacks, where malicious third parties attempt to manipulate AI systems into following unauthorized instructions or exposing sensitive information.

OpenAI is introducing two complementary protections designed to help users and organizations mitigate these risks:

Lockdown Mode: Advanced Protection for High-Risk Users

Lockdown Mode is an optional, advanced security setting designed for high-security users such as executives and security teams at prominent organizations. This mode tightly constrains how ChatGPT can interact with external systems to dramatically reduce prompt injection-based data exfiltration risks.

Key features of Lockdown Mode include:

  • Restricted web browsing: Limited to cached content only—no live network requests leave OpenAI's controlled infrastructure, preventing sensitive data exfiltration through browsing attacks
  • Deterministic feature disablement: Tools and capabilities that could be exploited by attackers are disabled entirely when strong data safety guarantees cannot be provided
  • Admin-controlled granularity: Workspace admins can selectively enable or disable specific apps and actions within those apps for users in Lockdown Mode
  • Availability: Currently available for ChatGPT Enterprise, ChatGPT Edu, ChatGPT for Healthcare, and ChatGPT for Teachers; consumer availability is planned for the coming months

Admins can enable Lockdown Mode through Workspace Settings by creating a new role with these restrictions layered on top of existing security controls.
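To make the constraints listed above concrete, here is a minimal, illustrative Python sketch of a lockdown-style policy gate that serves only cached pages and refuses tools without data-safety guarantees. This is not OpenAI's implementation; every name in it is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LockdownPolicy:
    """Illustrative policy object; names and fields are hypothetical."""
    cached_pages: dict[str, str] = field(default_factory=dict)  # URL -> cached content
    safe_tools: set[str] = field(default_factory=set)           # tools with data-safety guarantees

    def browse(self, url: str) -> str:
        # Serve cached content only; never issue a live network request.
        if url in self.cached_pages:
            return self.cached_pages[url]
        raise PermissionError(f"Lockdown Mode: no live fetch allowed for {url}")

    def call_tool(self, tool_name: str, payload: dict) -> dict:
        # Deterministically disable tools that cannot guarantee data safety.
        if tool_name not in self.safe_tools:
            raise PermissionError(f"Lockdown Mode: tool '{tool_name}' is disabled")
        return {"tool": tool_name, "status": "allowed", "payload": payload}

# Example: only a pre-approved calculator tool and one cached page are usable.
policy = LockdownPolicy(
    cached_pages={"https://example.com/handbook": "<cached snapshot>"},
    safe_tools={"calculator"},
)
print(policy.browse("https://example.com/handbook"))    # served from cache
print(policy.call_tool("calculator", {"expr": "2+2"}))  # allowed
# policy.browse("https://attacker.test") would raise PermissionError
```

The point of the pattern is that the restrictions are enforced deterministically in code, rather than relying on the model itself to decline unsafe actions.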

Elevated Risk Labels: Transparent Risk Communication

To help users make informed decisions about feature usage, OpenAI is standardizing how it labels capabilities that introduce additional security risks. These "Elevated Risk" labels will appear consistently across ChatGPT, ChatGPT Atlas, and Codex, ensuring users receive the same clear guidance regardless of where they encounter these features.

Each label explains what changes, which risks are introduced, and when granting that access is appropriate. For example, in Codex (OpenAI's coding assistant), developers can grant network access for tasks like looking up documentation, with the "Elevated Risk" label clearly displayed alongside the configuration options.
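As a rough sketch of the information such a label carries, the following Python snippet models those three elements for the Codex network-access case. The field names and wording are assumptions for illustration, not OpenAI's actual schema or UI text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElevatedRiskLabel:
    """Hypothetical representation of an 'Elevated Risk' label's contents."""
    feature: str
    what_changes: str
    risks_introduced: str
    when_appropriate: str

    def render(self) -> str:
        return (
            f"[Elevated Risk] {self.feature}\n"
            f"  What changes: {self.what_changes}\n"
            f"  Risks: {self.risks_introduced}\n"
            f"  Appropriate when: {self.when_appropriate}"
        )

# Illustrative label for Codex network access; the wording is an assumption.
codex_network = ElevatedRiskLabel(
    feature="Codex: network access",
    what_changes="The agent may issue outbound HTTP requests during a task.",
    risks_introduced="Fetched pages could carry prompt-injection payloads or exfiltrate data.",
    when_appropriate="Trusted tasks that need to look up documentation or packages.",
)
print(codex_network.render())
```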

Broader Security Strategy

These protections build on OpenAI's existing multi-layered approach, including sandboxing, protections against URL-based data exfiltration, monitoring and enforcement, and enterprise-grade controls like role-based access and audit logs. The Compliance API Logs Platform provides detailed visibility into app usage, shared data, and connected sources to maintain admin oversight.
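To illustrate the kind of oversight workflow this enables, here is a minimal admin-side sketch that polls a compliance-logs endpoint and prints app-usage entries. The URL, query parameters, and response shape are placeholders, not the documented Compliance API; consult OpenAI's documentation for the real interface.

```python
import os
import requests

# Placeholder endpoint and response shape; the real Compliance API paths,
# authentication, and pagination scheme are defined in OpenAI's docs.
BASE_URL = "https://api.example.com/v1/compliance/logs"
API_KEY = os.environ["COMPLIANCE_API_KEY"]

def fetch_app_usage_logs(since: str) -> list[dict]:
    """Return log entries describing app usage, shared data, and connected sources."""
    resp = requests.get(
        BASE_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"since": since, "category": "connected_apps"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

if __name__ == "__main__":
    for entry in fetch_app_usage_logs(since="2025-01-01T00:00:00Z"):
        print(entry.get("timestamp"), entry.get("user"), entry.get("app"), entry.get("action"))
```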

As OpenAI strengthens safeguards for these features, the "Elevated Risk" label will be removed from a feature once those safeguards sufficiently mitigate its risks for general use.