What are Storage Buckets?
Storage Buckets are non-versioned, mutable storage containers on the Hugging Face Hub designed specifically for ML production artifacts. Unlike Git-based repositories which prioritize version control, Buckets provide fast writes, efficient overwrites, directory synchronization, and file removal capabilities—making them ideal for the constant stream of intermediate files generated during ML workflows.
Each Bucket:
- Lives under a user or organization namespace with standard Hugging Face permissions
- Can be private or public with a browsable Hub interface
- Is addressable programmatically via handles like
hf://buckets/username/my-training-bucket - Supports scripting from Python and management via the
hfCLI
Built-in Deduplication with Xet
The key advantage of Storage Buckets is integration with Xet, Hugging Face's chunk-based storage backend. Instead of treating files as monolithic blobs, Xet breaks content into chunks and deduplicates across the entire Bucket.
This approach is particularly beneficial for ML workloads where:
- Successive training checkpoints share frozen model weights
- Processed datasets largely overlap with raw source data
- Multiple related artifacts are stored together
For enterprise customers, billing is based on deduplicated storage, directly reducing costs when related files share common content. This translates to less bandwidth usage, faster transfers, and more efficient storage footprints.
Pre-warming for Distributed Workloads
Storage Buckets support pre-warming, which brings frequently accessed data closer to the cloud provider and region where compute runs. This is critical for:
- Training clusters requiring fast checkpoint and dataset access
- Multi-region pipelines running components in different clouds
- Large-scale distributed training that cannot afford cross-region data transfers
Hugging Face is partnering with AWS and GCP initially, with support for additional cloud providers coming in the future.
Getting Started
Users can create and manage Buckets using the hf CLI:
curl -LsSf https://hf.co/cli/install.sh | bash
hf auth login
hf buckets create my-training-bucket --private
The service is available now on the Hugging Face Hub and integrates with existing Hub authentication and permission systems.