Hugging Face launches Storage Buckets for ML artifacts with deduplication via Xet backend
What's New: Storage Buckets
Hugging Face has launched Storage Buckets, a mutable object storage service purpose-built for ML production workloads. Unlike traditional Git-based versioning, Buckets are designed for the constant stream of intermediate files generated during training and data processing—checkpoints, optimizer states, processed shards, logs, and traces.
Key Features
Non-versioned Storage with Hub Integration
- Create and manage buckets via the `hf` CLI or programmatically
- Buckets live under user or organization namespaces with standard Hugging Face permissions
- Can be marked private or public, with browser-accessible pages
- Addressable with handles like `hf://buckets/username/my-bucket`
Xet-Powered Deduplication
- Built on Hugging Face's chunk-based Xet storage backend, which breaks files into chunks and deduplicates across them
- When uploading similar datasets or successive model checkpoints with frozen layers, Buckets skip already-stored content
- Results in lower bandwidth usage, faster transfers, and reduced storage footprint
- For Enterprise customers, billing is based on deduplicated storage, directly reducing costs
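The deduplication idea above can be sketched in a few lines of Python. This is an illustrative model of chunk-based, content-addressed storage, not Xet's actual implementation (Xet uses content-defined chunk boundaries rather than the fixed-size chunks shown here): each chunk is keyed by its hash, and a second upload only stores chunks the backend has not already seen.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # illustrative fixed size; real systems vary chunk boundaries

def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

class ChunkStore:
    """Toy content-addressed store: chunks keyed by their SHA-256 hash."""
    def __init__(self):
        self.chunks = {}

    def upload(self, data: bytes) -> int:
        """Store only unseen chunks; return how many new chunks were written."""
        new = 0
        for c in chunk(data):
            key = hashlib.sha256(c).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = c
                new += 1
        return new

store = ChunkStore()
checkpoint_v1 = os.urandom(200 * 1024)                            # first upload: everything is new
checkpoint_v2 = checkpoint_v1[:128 * 1024] + os.urandom(72 * 1024)  # mostly-unchanged successor

print(store.upload(checkpoint_v1))  # 4 chunks stored (200 KiB in 64 KiB chunks)
print(store.upload(checkpoint_v2))  # 2 chunks stored: only the changed tail
```

Uploading the second checkpoint transfers only the chunks that changed, which is why successive checkpoints with frozen layers are cheap to store.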
Pre-warming for Regional Performance
- Brings frequently accessed data closer to compute resources in specific cloud regions
- Eliminates repeated cross-region data transfers for distributed training and large-scale pipelines
- Available initially through partnerships with AWS and GCP, with more cloud providers to come
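The access pattern pre-warming optimizes can be sketched as a region-local cache consulted before any cross-region transfer. This is a minimal conceptual sketch, not Hugging Face's implementation; `REGIONAL_CACHE`, `prewarm`, and `cross_region_fetch` are hypothetical names introduced here for illustration.

```python
REGIONAL_CACHE = {}  # hypothetical region-local cache, keyed by object path

def cross_region_fetch(path: str) -> bytes:
    # stand-in for a slow remote download; in practice this transfer
    # dominates latency for distributed training pipelines
    return b"shard-bytes-for:" + path.encode()

def prewarm(paths):
    """Eagerly populate the regional cache before a training run starts."""
    for p in paths:
        REGIONAL_CACHE[p] = cross_region_fetch(p)

def fetch(path: str) -> bytes:
    """Return object bytes, preferring the pre-warmed regional copy."""
    if path in REGIONAL_CACHE:            # pre-warmed: served from the local region
        return REGIONAL_CACHE[path]
    data = cross_region_fetch(path)       # slow path: cross-region transfer
    REGIONAL_CACHE[path] = data           # cache for subsequent readers
    return data

prewarm(["hf://buckets/username/my-bucket/shard-0"])
print(fetch("hf://buckets/username/my-bucket/shard-0"))  # hits the warm cache
```

Every reader after the pre-warm pass hits the regional copy, so a shard is moved across regions at most once per region rather than once per worker.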
Getting Started
Create and use a bucket in minutes:
curl -LsSf https://hf.co/cli/install.sh | bash
hf auth login
hf buckets create my-training-bucket --private
The bucket can then be accessed programmatically for training pipelines, data processing workflows, and agent systems that generate frequent writes and overwrites.
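A training loop that overwrites a rolling checkpoint generates exactly the write pattern Buckets target. The sketch below is hypothetical: it assumes the bucket is reachable as a mounted or local path (a temp directory stands in here so the code runs as-is), and `save_checkpoint` is an illustrative helper, not part of any Hugging Face API.

```python
import os
import tempfile

# Hypothetical stand-in for a bucket mount such as hf://buckets/username/my-training-bucket;
# a local temp directory keeps this sketch self-contained and runnable.
bucket_root = tempfile.mkdtemp()

def save_checkpoint(step: int, state: bytes) -> str:
    """Overwrite a rolling 'latest' checkpoint and keep a per-step copy."""
    latest = os.path.join(bucket_root, "checkpoint-latest.bin")
    with open(latest, "wb") as f:      # mutable storage: in-place overwrites are fine
        f.write(state)
    tagged = os.path.join(bucket_root, f"checkpoint-{step}.bin")
    with open(tagged, "wb") as f:      # successive copies dedupe well under Xet
        f.write(state)
    return latest

for step in range(3):
    save_checkpoint(step, b"weights-at-step-%d" % step)

print(sorted(os.listdir(bucket_root)))
```

Because the per-step copies share most of their content, a chunk-deduplicating backend stores each run of checkpoints far more compactly than their nominal total size.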