StandardDB - Track changes in AI

OpenAI 3 days ago

OpenAI and Amazon launch Stateful Runtime Environment for production-grade agents in Bedrock//OpenAI and Amazon have jointly built a Stateful Runtime Environment that runs natively in Amazon Bedrock, enabling developers to deploy multi-step AI agents with persistent state, context, and governance controls. The runtime eliminates the need for manual orchestration by handling state management, tool invocations, error handling, and long-running task resumption automatically.

OpenAI 3 days ago

OpenAI and Amazon launch Stateful Runtime Environment on AWS Bedrock; $50B investment announced//OpenAI and Amazon have announced a multi-year strategic partnership that includes a $50 billion investment and several new product launches. Key deliverables include a Stateful Runtime Environment for AI agents on Amazon Bedrock, exclusive AWS distribution of OpenAI Frontier for enterprise AI deployment, and a $100 billion compute agreement expanding on their existing partnership.

vLLM v0.16.0 brings async scheduling with pipeline parallelism, 30.8% throughput gains//vLLM's latest release introduces async scheduling combined with pipeline parallelism, delivering significant performance improvements and new WebSocket-based realtime audio streaming capabilities. The update adds support for 12+ new model architectures and major enhancements to speculative decoding, RLHF workflows, and Intel XPU platforms.

GitHub 4 days ago

GitHub's Enterprise AI Controls reaches general availability with agent governance tools//GitHub has released Enterprise AI Controls and the agent control plane as generally available features, giving enterprise administrators deeper oversight and auditability of AI agent usage across their organizations. The release includes new capabilities for discovering agent activity, configuring enterprise agent policies via API, and managing custom agent standards through fine-grained permissions.

Intercom launches 12 major Procedures and Simulations updates for complex AI customer support workflows//Intercom announced significant upgrades to Fin's Procedures and Simulations capabilities, enabling AI agents to handle complex, multi-step customer queries with greater control and reliability. The updates include AI-assisted Procedure drafting, deterministic controls, improved agentic behavior, and enhanced testing tools to help teams deploy confidently at scale.

GitHub 4 days ago

GitHub Copilot CLI reaches general availability with autonomous agentic capabilities//GitHub Copilot CLI, which debuted in public preview last September, is now generally available to all paid Copilot subscribers. The tool has evolved from a terminal assistant into a full autonomous coding agent with plan mode, autopilot capabilities, and cross-session memory, supporting multiple models from Anthropic, OpenAI, and Google.

Devin 2.2 Launch: 3x Faster Startup, Desktop Testing, v3 API Exits Beta//Devin 2.2 ships with significant performance improvements, new desktop testing capabilities, and the official launch of the v3 API. The update includes a redesigned UI, faster integrations with Slack and Linear, and support for end-to-end testing across Linux desktop applications.

Sourcegraph 7.0 debuts MCP integration for AI agents; adds semantic search across enterprise codebases//Sourcegraph 7.0 repositions the platform as a shared intelligence layer for both human developers and AI coding agents, introducing deep semantic search capabilities via the Model Context Protocol (MCP). The release enables AI agents to reason about cross-repository dependencies, architectural patterns, and code history with the same accuracy developers rely on.

Cloudflare 5 days ago

Cloudflare releases Vinext, Next.js-compatible framework built on Vite that builds 4.4x faster//Cloudflare unveiled Vinext, a drop-in replacement for Next.js built on Vite that deploys directly to Cloudflare Workers. Early benchmarks show production builds 4.4x faster and client bundles 57% smaller than Next.js 16. The framework was developed in one week with AI assistance.

GitHub 6 days ago

GitHub Enterprise Server 3.20 RC adds immutable releases, enhanced secret scanning, and backup service//GitHub Enterprise Server 3.20 release candidate introduces several major features including immutable releases for supply chain protection, enhanced secret scanning with validity checks and enterprise-level bypass controls, and a new backup service that replaces the need for separate backup utilities. The release also adds enterprise team management capabilities and new security roles for simplified governance.

Vercel 6 days ago

Vercel opens Chat SDK in public beta, unifies bot development across Slack, Teams, Discord, and other platforms//Vercel has open-sourced the Chat SDK, a TypeScript library that allows developers to write chatbot logic once and deploy across Slack, Microsoft Teams, Google Chat, Discord, GitHub, and Linear. The SDK features event-driven architecture with type-safe handlers, JSX-based UI components that render natively on each platform, and pluggable state management adapters.

Cloudflare 1 week ago

Cloudflare One becomes first SASE platform with post-quantum encryption across all components//Cloudflare One now offers post-quantum hybrid ML-KEM encryption across its entire Secure Access Service Edge platform, including Secure Web Gateway, Zero Trust, and Wide Area Network services. The expansion covers Cloudflare IPsec (in closed beta) and Cloudflare One Appliance (generally available), enabling organizations to secure their enterprise network traffic against future quantum threats ahead of NIST's 2030 cryptographic transition deadline.

Cloudflare 1 week ago

Cloudflare launches MCP server using Code Mode, reducing API context requirements by 99.9%//Cloudflare introduced a new Model Context Protocol (MCP) server that provides access to the entire Cloudflare API using just two tools and consuming only ~1,000 tokens. The approach, called Code Mode, allows AI agents to write JavaScript code against a typed SDK to explore and execute API operations, reducing token usage from 1.17M to 1K compared to traditional MCP implementations.
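
The 99.9% figure in the headline follows from the two token counts quoted in the summary. A quick check, assuming the rounded values 1.17M and 1K are taken at face value:

```python
# Token counts as quoted in the entry above; both are rounded figures.
traditional_tokens = 1_170_000  # traditional MCP implementation
code_mode_tokens = 1_000        # Code Mode server with two tools

# Fraction of context no longer needed when moving to Code Mode.
reduction = (traditional_tokens - code_mode_tokens) / traditional_tokens
reduction_pct = round(reduction * 100, 1)

print(f"Context reduction: {reduction_pct}%")
```

With exact token counts the percentage may differ slightly in the second decimal place.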

Anthropic 1 week ago

Anthropic launches Claude Code Security in limited preview; found 500+ zero-day vulnerabilities in open-source code//Claude Code Security, a new AI-powered capability within Claude Code, automatically scans codebases for complex vulnerabilities and suggests patches—finding security issues that traditional static analysis tools miss. Available in limited research preview for Enterprise and Team customers, the tool leverages recent improvements in Claude's cybersecurity abilities demonstrated through over 500 zero-day discoveries in production open-source repositories.

Google 1 week ago

Google releases Gemini 3.1 Pro with 77.1% ARC-AGI-2 score, doubling reasoning performance//Google has released Gemini 3.1 Pro, an upgraded AI model with significantly improved reasoning capabilities now available across consumer and developer platforms. The model achieves a verified 77.1% score on the ARC-AGI-2 benchmark, more than double the performance of its predecessor, and is designed for complex problem-solving tasks requiring advanced reasoning.

Anthropic 1 week ago

Anthropic launches automatic prompt caching and Claude Sonnet 4.6; retires earlier models//Anthropic has released Claude Sonnet 4.6 alongside automatic prompt caching for the Messages API, eliminating manual cache management. The company is retiring older models (Sonnet 3.7, Haiku 3.5) and deprecating Haiku 3 in April 2026.

NVIDIA and SGLang optimize DeepSeek for GB300 NVL72, achieving 226 tokens per second in 128K-token inference//NVIDIA and the SGLang team have published optimizations for running DeepSeek R1 on the GB300 NVL72 system, leveraging prefill-decode disaggregation, pipeline parallelism, and expert parallelism to achieve 226 tokens per second per GPU on long-context workloads. The optimization demonstrates a 1.53x throughput advantage over GB200 under identical conditions, with further gains possible through multi-token prediction.

Anthropic 1 week ago

Anthropic releases Claude Sonnet 4.6 with 70% preference over predecessor in coding tasks//Claude Sonnet 4.6 is now available as the default model on Claude.ai and Claude Cowork, bringing major improvements in coding, computer use, and long-context reasoning. The model features a 1M token context window in beta and performs on par with or better than the frontier Opus 4.5 model at Sonnet pricing ($3/$15 per million tokens).

Cursor 2.5 launches plugin marketplace and async subagents for multi-file workflows//Cursor's latest release introduces a plugin marketplace with pre-built integrations from partners like AWS, Figma, and Stripe, plus asynchronous subagents that allow the parent agent to continue working while background tasks execute. The update also adds fine-grained sandbox network access controls for enterprise security policies.
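
The asynchronous subagent pattern described above, where a parent keeps working while background tasks run, can be sketched with Python's `asyncio`. This is only an illustration of the general pattern, not Cursor's implementation; the `subagent` and `parent_agent` functions and the task names are hypothetical.

```python
import asyncio


async def subagent(name: str, seconds: float) -> str:
    """Hypothetical background task standing in for a subagent."""
    await asyncio.sleep(seconds)  # simulate slow work
    return f"{name} done"


async def parent_agent() -> list[str]:
    log: list[str] = []
    # Start the subagents without awaiting them, so they run in the background.
    tasks = [
        asyncio.create_task(subagent("lint", 0.02)),
        asyncio.create_task(subagent("tests", 0.03)),
    ]
    # The parent continues its own work while the subagents are still running.
    for step in ("plan", "edit"):
        await asyncio.sleep(0.001)
        log.append(f"parent {step}")
    # Collect the background results only at the point they are needed.
    log.extend(await asyncio.gather(*tasks))
    return log


result = asyncio.run(parent_agent())
print(result)
```

The parent's entries appear in the log before the subagent results because the parent only waits on the background tasks at the final `gather` call.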

Alibaba releases Qwen3.5 model family; 397B variant matches Gemini 3 Pro and Claude Opus 4.5//Alibaba's Qwen3.5 is a new multimodal model family ranging from 27B to 397B parameters with mixture-of-experts architecture. The flagship 397B-A17B variant supports 256K context (expandable to 1M), hybrid thinking/non-thinking modes, and excels in coding, vision, and long-context reasoning, with performance comparable to leading proprietary models.

ElevenLabs expands ElevenAgents with versioning, RAG tools, and content guardrails//ElevenLabs released significant updates to its ElevenAgents API, introducing agent versioning, a new documentation search tool for RAG, MCP tool support, and configurable content moderation guardrails. The update also includes new endpoints for tracking conversation users and expanded SDK support across Python, JavaScript, and widget packages.

Cloudflare 2 weeks ago

Cloudflare Python SDK v5.0.0-beta.1 adds 40+ new API resources with breaking changes//Cloudflare released Python SDK v5.0.0-beta.1 featuring 40+ new API resources including AI Gateway, R2 Data Catalog, and Real-time Kit integrations. The beta includes significant breaking changes across 15+ existing resources due to OpenAPI schema improvements, requiring developers to review the migration guide before upgrading.

Railway releases AI agent for canvas, Postgres metrics, and network flow visualization//Railway is shipping three major features: a conversational AI agent for infrastructure management, dedicated Postgres database metrics with query statistics, and network flow visualization showing real-time traffic between services. The AI agent debuts in Priority Boarding, while Postgres metrics and network flows graduate to general availability.

Cloudflare 2 weeks ago

Cloudflare Python SDK v5.0.0-beta.1 introduces major breaking changes and 40+ new API resources//Cloudflare released the first beta version of Python SDK v5.0.0, featuring significant breaking changes driven by OpenAPI schema improvements and code generation updates. The release adds over 40 new API resources including AI-powered features, brand protection tools, D1 database management, and Real-time Kit integrations, alongside general fixes for type inference, request handling, and response parsing.

GitHub 2 weeks ago

GitHub launches Agentic Workflows in technical preview, enabling AI-driven repository automation in Markdown//GitHub Agentic Workflows let developers automate repository tasks using AI agents within GitHub Actions by writing workflows in plain Markdown instead of YAML. The feature, available via the `gh aw` CLI extension, supports natural language automation for issue triage, PR reviews, CI failure analysis, and repository maintenance with security-first defaults including read-only permissions and sandboxed execution.

Google 2 weeks ago

Google upgrades Gemini 3 Deep Think; achieves 84.6% on ARC-AGI-2 and gold-medal math performance//Google has released a major upgrade to Gemini 3 Deep Think, a specialized reasoning mode designed for science, research, and engineering challenges. The updated model is now available to Google AI Ultra subscribers in the Gemini app and, through early access, via API for select researchers, engineers, and enterprises.

OpenAI 2 weeks ago

OpenAI releases GPT-5.3-Codex-Spark, optimized for real-time coding at 1000+ tokens per second//OpenAI has released GPT-5.3-Codex-Spark, a smaller, faster variant of GPT-5.3-Codex designed for real-time interactive coding tasks. Running on Cerebras' specialized hardware, the model delivers over 1000 tokens per second while maintaining strong performance on coding benchmarks, now available as a research preview to ChatGPT Pro users.

Allen AI releases AutoDiscovery, automated hypothesis generation tool now available in AstaLabs platform//AutoDiscovery is an AI-powered system that autonomously generates and tests scientific hypotheses on structured datasets, surfacing novel research directions without requiring researchers to specify questions in advance. The tool uses Bayesian surprise and Monte Carlo Tree Search to prioritize high-value experiments, and is now available as an experimental feature in the Asta platform.
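
Bayesian surprise is commonly defined as the KL divergence between beliefs after and before seeing new evidence. The sketch below uses that common definition over a discrete set of hypotheses; it is an assumption for illustration and may not match how AutoDiscovery computes or applies the quantity. The function name and example distributions are hypothetical.

```python
import math


def bayesian_surprise(prior: list[float], posterior: list[float]) -> float:
    """KL(posterior || prior) in nats over a discrete set of hypotheses."""
    return sum(
        q * math.log(q / p)
        for p, q in zip(prior, posterior)
        if q > 0  # terms with zero posterior mass contribute nothing
    )


prior = [0.5, 0.5]          # belief before running an experiment
unchanged = [0.5, 0.5]      # experiment that does not shift belief
shifted = [0.9, 0.1]        # experiment that shifts belief strongly

low = bayesian_surprise(prior, unchanged)
high = bayesian_surprise(prior, shifted)
print(low, high)
```

An experiment that leaves beliefs unchanged scores zero, while one that moves them scores higher, which is the intuition behind using surprise to rank candidate experiments.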

AI2 launches MolmoSpaces, an open simulation platform for embodied AI with 230,000 scenes and 42 million grasps//MolmoSpaces is a large-scale, open-source ecosystem for training and evaluating embodied AI systems, unifying over 230,000 indoor scenes, 130,000+ object models, and 42 million annotated robotic grasps. The platform features physics-grounded simulation, a systematic benchmark for measuring generalization across multiple axes, and compatibility with major simulators like MuJoCo, ManiSkill, and NVIDIA Isaac.

Supabase 2 weeks ago

Supabase acquires Hydra team to build open data warehouse for Postgres//Supabase is welcoming Joe Sciarrino, co-creator of Hydra, to lead the development of Supabase Warehouse and an Open Warehouse Architecture initiative. The team will leverage pg_duckdb, an open-source Postgres extension that accelerates analytics queries by 600x, to enable serverless analytics workflows on Postgres with object storage integration.

OpenAI 2 weeks ago

OpenAI updates GPT-5.2 Instant, launches GPT-5.3-Codex with 25% faster performance//OpenAI has released multiple model updates including an improved GPT-5.2 Instant with more measured responses and clearer output on advice-seeking questions. The company also introduced GPT-5.3-Codex, a unified coding model combining code generation with general-purpose reasoning, delivering 25% faster performance and new benchmark highs.

Unsloth ships MoE training kernels with 12x speedup and 35% lower VRAM usage//Unsloth introduced custom Triton kernels and optimizations for training Mixture of Experts (MoE) language models, delivering 12x faster training speeds with over 35% reduction in VRAM consumption and support for 6x longer context windows. The update supports popular MoE models including Qwen3, DeepSeek R1/V3, and GPT-OSS, working across data-center and consumer GPUs.

OpenAI 2 weeks ago

OpenAI deploys ChatGPT to Pentagon's GenAI.mil platform for 3 million personnel//OpenAI is bringing a custom version of ChatGPT to GenAI.mil, the Department of Defense's secure enterprise AI platform used by military and civilian personnel. The deployment includes built-in safety controls and data isolation safeguards to protect sensitive government information while enabling service members to access AI capabilities for operational and administrative tasks.

ElevenLabs reorganizes into platform structure, launches custom guardrails and WhatsApp outbound API//ElevenLabs restructured its product portfolio into three distinct platforms—ElevenAgents, ElevenCreative, and the new ElevenAPI—while graduating global server routing out of beta. The Agents Platform now supports custom guardrails, WhatsApp outbound messaging, improved TTS models, and enhanced error handling for webhook tools.

Cloudflare 3 weeks ago

Cloudflare Agents SDK v0.4.0 adds readonly connections, MCP OAuth customization, and x402 v2 support//The Agents SDK v0.4.0 introduces readonly WebSocket connections for dashboards and spectator views, custom OAuth providers for MCP server authentication, and upgrades the MCP SDK to prevent cross-client response leakage. The release also completes migration to x402 v2 with new network identifiers and lazy server initialization.

Transformers.js v4 Preview Debuts on NPM with New WebGPU Runtime and 10x Build Speed Gains//Transformers.js v4 preview is now available on NPM under the `@next` tag, bringing a complete rewrite with a new WebGPU runtime and major performance improvements. The release includes support for ~200 model architectures, cross-runtime compatibility (browsers, Node, Bun, Deno), and architectural optimizations that deliver 4x speedups for embedding models and 10x faster builds.

Vercel's v0 Moves from Prototyping to Production with Real Codebases and Git Workflows//Vercel has rebuilt v0 into a platform for shipping production applications rather than quick prototypes, adding support for real codebases, Git workflows, security controls, and integration with live databases. The update positions v0 as infrastructure for both traditional apps and future AI agent workflows.

OpenAI 3 weeks ago

OpenAI Launches Trusted Access for Cyber, Commits $10M in API Credits for Defensive Security Work//OpenAI is introducing Trusted Access for Cyber, an identity-based framework that grants priority access to GPT-5.3-Codex and other frontier models for cybersecurity professionals and researchers. The initiative includes $10 million in API credits through the scaled Cybersecurity Grant Program to accelerate vulnerability discovery and remediation in open source and critical infrastructure.

OpenAI 3 weeks ago

OpenAI Launches Frontier Platform for Enterprise AI Agent Deployment and Management//OpenAI has introduced Frontier, a comprehensive platform designed to help enterprises build, deploy, and manage AI agents at scale. The platform provides AI coworkers with shared business context, integrated execution environments, and governance capabilities—enabling organizations to move beyond isolated agent pilots to production-grade AI systems that work across multiple applications and data sources.

Supabase 3 weeks ago

Supabase adds PrivateLink, Ethereum integration, and Claude connector//Supabase released multiple major features in February 2026 including PrivateLink for private AWS connectivity, direct Ethereum blockchain querying via SQL, and official Claude integration. The update also includes a breaking change disabling pg_graphql by default on new projects for improved security posture.

Anthropic 3 weeks ago

Anthropic releases Claude Opus 4.6 with 1M token context, outperforms GPT-5.2 by 144 Elo points//Anthropic introduced Claude Opus 4.6, its most capable flagship model to date, featuring a 1M token context window in beta and state-of-the-art performance on coding, reasoning, and knowledge work benchmarks. The model is now available on claude.ai, the API, and major cloud platforms with pricing unchanged at $5/$25 per million tokens.

OpenAI 3 weeks ago

OpenAI releases GPT-5.3-Codex with agentic coding capabilities; achieves new SWE-Bench Pro high and 25% faster performance//GPT-5.3-Codex is OpenAI's most capable coding model to date, combining frontier coding performance with advanced reasoning and agentic capabilities. The model sets new benchmarks on SWE-Bench Pro and Terminal-Bench 2.0, while operating 25% faster than its predecessor and enabling developers to delegate complex, long-running tasks without losing context during iteration.

GitHub 3 weeks ago

GitHub Copilot v1.109 adds Claude agent support and multi-agent orchestration in VS Code//GitHub Copilot's January release introduces public preview support for Anthropic's Claude agent SDK, enabling developers to delegate tasks directly within VS Code. The update also enhances agent session management with multi-agent orchestration, improved reasoning capabilities, and new security features like terminal command sandboxing.

NVIDIA releases Nemotron ColEmbed V2 multimodal models; 8B variant ranks #1 on ViDoRe V3 benchmark//NVIDIA has released the Nemotron ColEmbed V2 family, a set of late-interaction multimodal embedding models in 3B, 4B, and 8B sizes designed for visual document retrieval. The flagship 8B model achieves 63.42 NDCG@10 on the ViDoRe V3 benchmark, ranking first in its class for enterprise document retrieval tasks.

AI2 publishes Nature paper on retrieval-augmented model for scientific literature synthesis with verifiable citations//AI2 and University of Washington researchers published a paper in Nature describing OpenScholar, an open-source retrieval-augmented language model designed to synthesize scientific literature with verifiable citations. The system searches a corpus of 45 million open-access papers and includes tools like ScholarQABench for evaluating citation quality.

Vercel 3 weeks ago

Vercel Workflow 4.1 Beta introduces event-sourced architecture with self-healing capabilities//Workflow 4.1 Beta fundamentally changes how the system tracks state by moving to an event-sourced architecture where state changes are stored as an immutable log of events rather than updating records in place. The update brings automatic recovery from queue failures, complete audit trails for debugging, and improved throughput supporting thousands of steps per second.
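
The event-sourced approach described above can be sketched in a few lines: state changes are appended to an immutable log, and current state is derived by replaying it. This is a minimal illustration of the general technique, not Vercel's implementation; the `Event` and `EventLog` names and the status values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot be modified once recorded
class Event:
    step: str
    status: str  # e.g. "started", "failed", "completed"


class EventLog:
    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        # State changes are only ever appended, never updated in place.
        self._events.append(event)

    def replay(self) -> dict[str, str]:
        # Current state is derived by folding over the full event history.
        state: dict[str, str] = {}
        for event in self._events:
            state[event.step] = event.status
        return state

    def history(self, step: str) -> list[str]:
        # The same log doubles as an audit trail for a single step.
        return [e.status for e in self._events if e.step == step]


log = EventLog()
log.append(Event("fetch", "started"))
log.append(Event("fetch", "failed"))
log.append(Event("fetch", "started"))   # retry after the failure
log.append(Event("fetch", "completed"))

state = log.replay()
fetch_history = log.history("fetch")
print(state, fetch_history)
```

Because the failed attempt stays in the log, a recovery process can rebuild state after a crash by replaying events, and the retry remains visible when debugging.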

Deno launches Sandbox for running untrusted code with network isolation and secret protection//Deno Sandbox is a new feature that lets developers safely execute untrusted or LLM-generated code in lightweight Linux microVMs with configurable network access and secret management. The service boots sandboxes in under a second and allows direct deployment to Deno Deploy without rebuilding.

Deno Deploy reaches general availability with zero-config deployment and integrated databases//Deno Deploy is now generally available, offering streamlined JavaScript and TypeScript deployment across any framework without vendor-specific configuration. The platform includes automatic framework detection, continuous deployment from GitHub, integrated Postgres and Deno KV databases, and a new Deno Sandbox service for secure microVM execution.

OpenAI 4 weeks ago

OpenAI launches Codex app, a multi-agent development environment with native code review and automations//OpenAI has released the Codex app for macOS, a desktop application that serves as a command center for managing and supervising AI agents working on software development tasks. The app enables developers to run multiple agents in parallel, review diffs, leave inline feedback, and schedule automated workflows—all without juggling terminal windows.

ElevenLabs ships Eleven v3 GA, WAV support, and Agents Platform enhancements//ElevenLabs released Eleven v3 out of alpha with improved stability, accuracy, and lower latency. The update includes WAV output format support for Text-to-Dialogue, expanded Agents Platform capabilities with branch renaming and guardrails, and multiple SDK updates across Python, JavaScript, React, and widget packages.

OpenAI 4 weeks ago

OpenAI launches Codex app for macOS with multi-agent orchestration and skill framework//OpenAI has released the Codex app for macOS, a dedicated interface for managing and running multiple coding agents in parallel on long-running tasks. The release includes expanded access to Codex through ChatGPT Free and Go plans, doubled rate limits across paid tiers, and a new skills framework that extends Codex beyond code generation to handle complex workflows and integrations.

Harvey scales legal knowledge coverage to 60+ jurisdictions with autonomous agent pipeline//Harvey has built "The Data Factory," an automated system using AI agents to discover, validate, and integrate legal data sources at scale. Since August 2025, the pipeline has expanded knowledge source coverage from 6 to 60+ jurisdictions and integrated over 400 legal data sources, enabling agents to handle complex queries across global legal databases without manual setup.
