Rethinking Access Control for High-Demand Products
As adoption of Codex and Sora accelerated over the past year, OpenAI ran into a common scaling challenge: the traditional access models, strict rate limits and immediate usage-based billing, both fell short. Rate limits left users with a frustrating "come back later" experience. Pure usage-based billing introduced lag and reconciliation issues, and it required charging from the first token, which left no room for early exploration.
The Hybrid Access Model
OpenAI built a unified system that combines both approaches. Users operate within a rate-limit waterfall that flows through multiple layers: enforced rate limits, free tiers, promotional access, enterprise entitlements, and finally credits. When a user exceeds their rate limit, the system seamlessly transitions to credit consumption in the same request, making the credit system feel invisible to end users.
The key architectural shift was modeling access as a decision waterfall rather than a binary gate. Instead of asking "is this allowed?", the system asks "how much is allowed, and from where?" This matches how users actually experience the product: there is no context switch between systems, and they simply keep using it.
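The waterfall idea can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Layer` and `Decision` types, the layer names, and the all-or-nothing policy are assumptions made for illustration, not OpenAI's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One source of allowance in the waterfall (names are illustrative)."""
    name: str
    remaining: int  # units still available from this layer


@dataclass
class Decision:
    allowed: bool
    granted: int
    # (layer name, units drawn) pairs, in the order the layers were consumed
    sources: list = field(default_factory=list)


def decide(layers: list[Layer], requested: int) -> Decision:
    """Walk the layers in order, drawing from each until the request is covered."""
    # Plan the draw first without mutating anything, so that a request which
    # cannot be fully covered leaves every layer untouched (assumed policy).
    plan = []
    needed = requested
    for layer in layers:
        if needed == 0:
            break
        take = min(layer.remaining, needed)
        if take > 0:
            plan.append((layer, take))
            needed -= take
    if needed > 0:
        return Decision(allowed=False, granted=0)
    # The request is fully covered: apply the planned draws.
    for layer, take in plan:
        layer.remaining -= take
    return Decision(True, requested, [(layer.name, take) for layer, take in plan])
```

With layers `rate_limit` (3 units left), `free_tier` (0), and `credits` (10), a request for 5 units would draw 3 from the rate limit and 2 from credits in a single decision, which is the "same request" transition described above.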
Real-Time Correctness and Auditability
Rather than adopting third-party billing platforms, OpenAI built an in-house distributed system to meet two critical requirements:
- Real-time decisions: When a user hits a rate limit, the system must know immediately whether credits are available, which prevents surprise blocks and balance inconsistencies.
- Full transparency: Every request carries a detailed explanation of why it was allowed or blocked, how much usage it consumed, and which limits or balances applied.
The system maintains three separate, independently auditable datasets: product usage events (what users actually did), monetization events (what gets charged), and balance updates (credit adjustments). This separation enables offline reconciliation, and idempotent transactions with stable keys prevent double-charging.
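As a rough sketch of how a stable key can make a charge idempotent across the three datasets, consider the in-memory `Ledger` below. The class, its field names, and the record shapes are assumptions made for illustration; a production system would persist these datasets separately.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    balance: int = 0
    usage_events: list = field(default_factory=list)         # what the user did
    monetization_events: list = field(default_factory=list)  # what gets charged
    balance_updates: list = field(default_factory=list)      # credit adjustments
    applied_keys: set = field(default_factory=set)           # keys already processed

    def charge(self, key: str, usage_units: int, credits: int) -> bool:
        """Apply a charge once; a replay with the same key is a no-op."""
        if key in self.applied_keys:
            return False
        self.applied_keys.add(key)
        self.usage_events.append({"key": key, "units": usage_units})
        self.monetization_events.append({"key": key, "credits": credits})
        self.balance -= credits
        self.balance_updates.append(
            {"key": key, "delta": -credits, "balance": self.balance}
        )
        return True

    def reconciles(self) -> bool:
        """Offline check: charged credits must match the balance deltas."""
        charged = sum(e["credits"] for e in self.monetization_events)
        debited = -sum(u["delta"] for u in self.balance_updates)
        return charged == debited
```

Because the key is derived from the request rather than from the delivery attempt, a retried or duplicated event maps to the same key and is dropped instead of being charged twice.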
Design for Trust
Notably, OpenAI prioritized provable correctness over strict enforcement. Credit balance updates are applied asynchronously, though in near real time, so that they create an audit trail that lets the company demonstrate the system is functioning correctly. When processing delays let a user slightly exceed their balance, the system automatically refunds the overage rather than blocking requests. The choice reflects the view that user trust matters more than perfectly strict enforcement.
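One possible reading of the overage refund is sketched below. The function name, the record shapes, and the choice to floor the balance at zero are all assumptions; the source does not describe the actual mechanics.

```python
def settle_charge(balance: int, charge: int) -> tuple[int, list[dict]]:
    """Apply a delayed charge and refund any part that exceeds the balance.

    Returns the new balance and the audit records produced. The debit and
    the refund are recorded separately so the trail shows both steps.
    """
    records = [{"type": "debit", "delta": -charge}]
    new_balance = balance - charge
    if new_balance < 0:
        # Assumed policy: the user is never left owing credits.
        overage = -new_balance
        records.append({"type": "overage_refund", "delta": overage})
        new_balance = 0
    return new_balance, records
```

Under this assumption, a charge of 8 credits against a balance of 5 produces a debit of 8 and a refund of 3, leaving the balance at zero while the request itself has already been served.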
Each balance decrement and its update record are committed atomically, and concurrent requests against the same account are serialized, which prevents race conditions on shared balances. This foundation now enables OpenAI to offer additional access tiers and flexible monetization for both products.
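The per-account serialization can be approximated in a single process with one lock per account, as in the sketch below. The `BalanceStore` class is hypothetical, and an in-process lock stands in for whatever transactional mechanism the real distributed system uses.

```python
import threading


class BalanceStore:
    """In-memory stand-in for a store with per-account atomic commits."""

    def __init__(self):
        self._balances = {}
        self._records = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects account creation

    def open_account(self, account: str, initial: int) -> None:
        with self._guard:
            self._balances[account] = initial
            self._records[account] = []
            self._locks[account] = threading.Lock()

    def debit(self, account: str, amount: int, key: str) -> None:
        # Holding the account's lock serializes concurrent debits, and the
        # decrement and its record are written together inside it.
        with self._locks[account]:
            self._balances[account] -= amount
            self._records[account].append((key, -amount, self._balances[account]))

    def balance(self, account: str) -> int:
        with self._locks[account]:
            return self._balances[account]

    def records(self, account: str) -> list:
        with self._locks[account]:
            return list(self._records[account])
```

Because every record stores the running balance written under the same lock as the decrement, the record log alone is enough to replay and verify the final balance.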