Hugging Face launches Storage Buckets for ML artifacts with deduplication via Xet backend
What's New: Storage Buckets
Hugging Face has launched Storage Buckets, a mutable object storage service purpose-built for ML production workloads. Unlike traditional Git-based versioning, Buckets are designed for the constant stream of intermediate files generated during training and data processing—checkpoints, optimizer states, processed shards, logs, and traces.
Key Features
Non-versioned Storage with Hub Integration
- Create and manage buckets via the `hf` CLI or programmatically
- Buckets live under user or organization namespaces with standard Hugging Face permissions
- Can be marked private or public, with browser-accessible pages
- Addressable with handles like `hf://buckets/username/my-bucket`
Xet-Powered Deduplication
- Built on Hugging Face's chunk-based Xet storage backend, which breaks files into chunks and deduplicates across them
- When uploading similar datasets or successive model checkpoints with frozen layers, Buckets skip already-stored content
- Results in lower bandwidth usage, faster transfers, and reduced storage footprint
- For Enterprise customers, billing is based on deduplicated storage, directly reducing costs
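The deduplication idea above can be sketched in a few lines of Python. This is an illustrative model of chunk-based, content-addressed storage, not Xet's actual implementation (Xet uses content-defined chunk boundaries rather than the fixed-size chunks shown here): each chunk is keyed by its hash, and a second upload only stores chunks the backend has not already seen.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # illustrative fixed size; real systems vary chunk boundaries

def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class ChunkStore:
    """Toy content-addressed store: chunks keyed by their SHA-256 hash."""
    def __init__(self):
        self.chunks = {}

    def upload(self, data: bytes) -> int:
        """Store only unseen chunks; return how many new chunks were written."""
        new = 0
        for c in chunk(data):
            key = hashlib.sha256(c).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = c
                new += 1
        return new

store = ChunkStore()
checkpoint_v1 = os.urandom(200 * 1024)                            # first upload: everything is new
checkpoint_v2 = checkpoint_v1[:128 * 1024] + os.urandom(72 * 1024)  # mostly-unchanged successor

print(store.upload(checkpoint_v1))  # 4 chunks stored (200 KiB in 64 KiB chunks)
print(store.upload(checkpoint_v2))  # 2 chunks stored: only the changed tail
```

Uploading the second checkpoint transfers only the chunks that changed, which is why successive checkpoints with frozen layers are cheap to store.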
Pre-warming for Regional Performance
- Brings frequently accessed data closer to compute resources in specific cloud regions
- Eliminates repeated cross-region data transfers for distributed training and large-scale pipelines
- Available initially through partnerships with AWS and GCP, with more cloud providers to come
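The access pattern pre-warming optimizes can be sketched as a region-local cache consulted before any cross-region transfer. This is a minimal conceptual sketch, not Hugging Face's implementation; `REGIONAL_CACHE`, `prewarm`, and `cross_region_fetch` are hypothetical names introduced here for illustration.

```python
REGIONAL_CACHE = {}  # hypothetical region-local cache, keyed by object path

def cross_region_fetch(path: str) -> bytes:
    # stand-in for a slow remote download; in practice this transfer
    # dominates latency for distributed training pipelines
    return b"shard-bytes-for:" + path.encode()

def prewarm(paths):
    """Eagerly populate the regional cache before a training run starts."""
    for p in paths:
        REGIONAL_CACHE[p] = cross_region_fetch(p)

def fetch(path: str) -> bytes:
    """Return object bytes, preferring the pre-warmed regional copy."""
    if path in REGIONAL_CACHE:            # pre-warmed: served from the local region
        return REGIONAL_CACHE[path]
    data = cross_region_fetch(path)       # slow path: cross-region transfer
    REGIONAL_CACHE[path] = data           # cache for subsequent readers
    return data

prewarm(["hf://buckets/username/my-bucket/shard-0"])
print(fetch("hf://buckets/username/my-bucket/shard-0"))  # hits the warm cache
```

Every reader after the pre-warm pass hits the regional copy, so a shard is moved across regions at most once per region rather than once per worker.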
Getting Started
Create and use a bucket in minutes:
curl -LsSf https://hf.co/cli/install.sh | bash
hf auth login
hf buckets create my-training-bucket --private
The bucket can then be accessed programmatically for training pipelines, data processing workflows, and agent systems that generate frequent writes and overwrites.
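A training loop that overwrites a rolling checkpoint generates exactly the write pattern Buckets target. The sketch below is hypothetical: it assumes the bucket is reachable as a mounted or local path (a temp directory stands in here so the code runs as-is), and `save_checkpoint` is an illustrative helper, not part of any Hugging Face API.

```python
import os
import tempfile

# Hypothetical stand-in for a bucket mount such as hf://buckets/username/my-training-bucket;
# a local temp directory keeps this sketch self-contained and runnable.
bucket_root = tempfile.mkdtemp()

def save_checkpoint(step: int, state: bytes) -> str:
    """Overwrite a rolling 'latest' checkpoint and keep a per-step copy."""
    latest = os.path.join(bucket_root, "checkpoint-latest.bin")
    with open(latest, "wb") as f:      # mutable storage: in-place overwrites are fine
        f.write(state)
    tagged = os.path.join(bucket_root, f"checkpoint-{step}.bin")
    with open(tagged, "wb") as f:      # successive copies dedupe well under Xet
        f.write(state)
    return latest

for step in range(3):
    save_checkpoint(step, b"weights-at-step-%d" % step)

print(sorted(os.listdir(bucket_root)))
```

Because the per-step copies share most of their content, a chunk-deduplicating backend stores each run of checkpoints far more compactly than their nominal total size.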