Hugging Face
Hugging Face launches Storage Buckets, mutable S3-like object storage with content deduplication
feature · platform · api · huggingface.co

What are Storage Buckets?

Storage Buckets are mutable, non-versioned object storage containers on the Hugging Face Hub, designed for the constant stream of intermediate files that ML development generates. Unlike traditional versioned repositories, Buckets are optimized for artifacts that change frequently, arrive from multiple jobs simultaneously, and rarely need git-style version control.

Each Bucket:

  • Lives under a user or organization namespace with standard Hugging Face permissions
  • Can be private or public
  • Is addressable programmatically via handles like hf://buckets/username/my-training-bucket
  • Has a browsable web interface on the Hub
  • Is accessible through Python APIs and the hf CLI
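
As a minimal sketch of how such a handle could be taken apart in client code: the `hf://buckets/<namespace>/<bucket>` layout comes from the example above, while the helper `parse_bucket_handle`, the `BucketHandle` type, and the optional trailing object path are hypothetical names and assumptions for illustration, not part of any Hugging Face library.

```python
from typing import NamedTuple

PREFIX = "hf://buckets/"


class BucketHandle(NamedTuple):
    namespace: str  # user or organization that owns the bucket
    name: str       # bucket name
    path: str       # assumed: object path inside the bucket, "" if absent


def parse_bucket_handle(handle: str) -> BucketHandle:
    """Split a handle like hf://buckets/username/my-training-bucket (hypothetical helper)."""
    if not handle.startswith(PREFIX):
        raise ValueError(f"not a bucket handle: {handle!r}")
    # Split into at most three parts: namespace, bucket name, remaining path.
    parts = handle[len(PREFIX):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"handle must include a namespace and a bucket name: {handle!r}")
    path = parts[2] if len(parts) == 3 else ""
    return BucketHandle(parts[0], parts[1], path)


handle = parse_bucket_handle("hf://buckets/username/my-training-bucket")
```

Validating the prefix up front keeps a mistyped handle from silently being treated as a local path.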

Key Innovation: Content Deduplication with Xet

Buckets use Xet, Hugging Face's chunk-based storage backend, which splits content into chunks and stores identical chunks only once, even when they appear in different files. This is a natural fit for ML workloads, where related artifacts share significant content overlap:

  • Processed vs. raw datasets: Many chunks of a processed dataset already exist in the raw one, so they are not uploaded again
  • Successive model checkpoints: Large frozen portions are deduplicated across checkpoints
  • Traces and derivatives: Related files reference common chunks
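
The idea behind the cases above can be sketched with a toy content-addressed store. This is an illustration only, not Xet's actual algorithm: it assumes fixed-size chunks and SHA-256 digests for simplicity, and the `ChunkStore` class and its methods are hypothetical.

```python
import hashlib


class ChunkStore:
    """Toy chunk-deduplicating store (fixed-size chunks; an assumption, not Xet)."""

    def __init__(self, chunk_size: int = 1024):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes, each unique chunk stored once
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name: str, data: bytes) -> int:
        """Store a file and return how many bytes were actually new."""
        new_bytes = 0
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen chunks cost storage
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)
        self.files[name] = digests
        return new_bytes

    def get(self, name: str) -> bytes:
        """Reassemble a file from its chunk list."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        """Total size of the unique chunks held by the store."""
        return sum(len(c) for c in self.chunks.values())


# Two "checkpoints" sharing a 4 KiB frozen body and differing in a 1 KiB head.
frozen = b"".join(i.to_bytes(4, "big") for i in range(1024))
store = ChunkStore()
uploaded_first = store.put("ckpt-1", frozen + b"\x01" * 1024)
uploaded_second = store.put("ckpt-2", frozen + b"\x02" * 1024)
```

In this sketch the second checkpoint only adds the chunk that differs, so the store holds less than the combined size of both files. A fixed-size scheme loses this benefit when bytes are inserted mid-file and later chunks shift, which is one reason production systems tend to use more sophisticated chunking.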

For Enterprise customers, billing is based on deduplicated storage, directly reducing costs. For all users, this approach means faster transfers and more efficient bandwidth usage.

Pre-warming: Data Locality Optimization

Buckets include pre-warming capabilities that bring frequently accessed data closer to compute. Instead of data traveling across regions on every read, users can declare where they need data, and Buckets ensure it is cached in that region before jobs start. This is especially valuable for:

  • Training clusters requiring fast access to large datasets or checkpoints
  • Multi-region pipelines running across different cloud providers
  • Distributed training that benefits from local storage proximity
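
To make the difference concrete, here is a toy model of reads with and without pre-warming. Everything in it is hypothetical: the `PrewarmableStore` class, its `prewarm` and `read` methods, and the region names are invented for illustration and do not reflect the real Buckets interface, which the source does not describe.

```python
class PrewarmableStore:
    """Toy model: objects live in a home region; other regions read remotely
    unless the object was pre-warmed into a regional cache."""

    def __init__(self, home_region: str, objects: dict):
        self.home_region = home_region
        self.objects = dict(objects)
        self.regional_cache = {}     # region -> {key: bytes}
        self.cross_region_reads = 0  # counts reads that had to leave the region

    def prewarm(self, region: str, keys: list) -> None:
        """Copy the named objects into the region's cache ahead of time."""
        cache = self.regional_cache.setdefault(region, {})
        for key in keys:
            cache[key] = self.objects[key]

    def read(self, key: str, region: str) -> bytes:
        """Serve from the local region when possible, else go cross-region."""
        if region == self.home_region:
            return self.objects[key]
        cache = self.regional_cache.get(region, {})
        if key in cache:
            return cache[key]
        self.cross_region_reads += 1
        return self.objects[key]


bucket = PrewarmableStore("us-east", {"ckpt": b"weights", "data": b"samples"})

# Without pre-warming, every read from another region crosses regions.
for _ in range(3):
    bucket.read("ckpt", region="eu-west")
reads_before = bucket.cross_region_reads

# After declaring where the data is needed, the same reads stay local.
bucket.prewarm("eu-west", ["ckpt", "data"])
for _ in range(3):
    bucket.read("ckpt", region="eu-west")
reads_after = bucket.cross_region_reads
```

The counter stops growing once the region is pre-warmed, which is the behavior a training job benefits from when it reads the same checkpoint or dataset repeatedly.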

Hugging Face is starting with AWS and GCP partnerships, with additional cloud providers planned.

Getting Started

Creating and using a Bucket takes minutes:

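# Install the hf CLI
curl -LsSf https://hf.co/cli/install.sh | bash
# Log in with your Hugging Face account
hf auth login
# Create a private Bucket named my-training-bucket
hf buckets create my-training-bucket --private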

Buckets can then be managed and accessed programmatically through the hf CLI, the Python SDK, and S3-like APIs.