Teen Safety Policies for Open-Weight Models
OpenAI is releasing a collection of prompt-based safety policies to help developers operationalize teen-specific protections in AI systems. These policies are designed to work with OpenAI's open-weight safety model, gpt-oss-safeguard, and can be directly integrated into content filtering and moderation workflows.
The initial policy release covers six critical risk areas:
- Graphic violent content
- Graphic sexual content
- Harmful body ideals and behaviors
- Dangerous activities and challenges
- Romantic or violent roleplay
- Age-restricted goods and services
Addressing Developer Challenges
One of the biggest obstacles developers face is translating high-level safety requirements into precise, operational rules. Even experienced teams struggle to define policies that accurately capture teen-specific risks while avoiding inconsistent enforcement or overly broad filtering. These prompt-based policies address that gap by providing clear, tested foundations that developers can adapt to their specific use cases.
Developed with External Expertise
The policies were developed in collaboration with organizations including Common Sense Media and everyone.ai, incorporating research on teens' developmental differences and unique vulnerabilities. This external input helped shape the scope of coverage and strengthen the policy structure.
Open Source and Iterative
Released as open source through the ROOST Model Community on GitHub, these policies are positioned as a starting point rather than a comprehensive solution. Developers are encouraged to adapt, extend, and contribute improvements based on their specific product contexts and user needs. OpenAI emphasizes that these policies should be combined with additional safeguards including product design choices, user controls, and transparent communications.
Part of Broader Youth Safety Efforts
This release builds on OpenAI's existing teen protection work, including updates to the Model Spec with Under-18 principles, introduction of parental controls in ChatGPT, age prediction features, and the Teen Safety Blueprint. The policies represent a commitment to democratizing safety tools across the open-weights ecosystem.