Hugging Face launches Storage Buckets for ML artifact management
· feature · platform · release · huggingface.co

What are Storage Buckets?

Storage Buckets are mutable, non-versioned storage containers on the Hugging Face Hub designed for the constant stream of intermediate files generated by production ML workflows. Unlike Model and Dataset repos, which track final artifacts with version control, Buckets are purpose-built for files that change frequently, arrive from multiple jobs simultaneously, and rarely need git-style versioning.

Each Bucket lives under a user or organization namespace, respects standard Hugging Face permissions, can be private or public, and is addressable programmatically as hf://buckets/username/bucket-name. You can browse Buckets directly on the Hub, script interactions with Python, or manage them via the hf CLI.
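The hf://buckets/username/bucket-name address format above can be illustrated with a small parser. This is a sketch for intuition only: the function name and return shape are ours, not part of huggingface_hub or any official client.

```python
# Illustrative only: a tiny parser for the hf://buckets/... address format
# described above. parse_bucket_uri is a hypothetical helper, not a real API.
from urllib.parse import urlparse

def parse_bucket_uri(uri):
    """Split hf://buckets/<namespace>/<bucket-name>[/<path>] into its parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "hf":
        raise ValueError(f"expected hf:// scheme, got {parsed.scheme!r}")
    if parsed.netloc != "buckets":
        raise ValueError("not a bucket address")
    namespace, _, rest = parsed.path.lstrip("/").partition("/")
    bucket, _, path = rest.partition("/")
    return {"namespace": namespace, "bucket": bucket, "path": path}

print(parse_bucket_uri("hf://buckets/acme/my-training-bucket/ckpt/step-100.pt"))
```

The same namespace/bucket split is what the Hub's permission checks operate on: the namespace is a user or organization, and the bucket inherits its access rules.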

Key Technical Features

Xet-backed deduplication: Buckets leverage Hugging Face's Xet chunk-based storage backend, which breaks files into chunks and stores identical chunks only once, across files and across uploads. This is particularly valuable for ML workloads where related artifacts share significant overlap—successive training checkpoints with frozen model layers, raw and processed datasets, or agent traces with shared summaries. For Enterprise customers, billing is based on deduplicated storage, directly reducing costs.
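A back-of-the-envelope model of why this helps: split each file into chunks, hash each chunk, and keep only one copy per hash. This toy uses fixed-size chunks for simplicity, whereas Xet uses content-defined chunk boundaries; the classes here are our own, not the Xet API.

```python
# Toy chunk-level deduplication, NOT the real Xet implementation:
# identical chunks across files are stored once, keyed by their hash.
import hashlib

def chunk(data, size=4):
    return [data[i:i + size] for i in range(0, len(data), size)]

class DedupStore:
    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes (each stored once)

    def put(self, data, size=4):
        # Store the file as a "recipe": an ordered list of chunk hashes.
        recipe = []
        for c in chunk(data, size):
            h = hashlib.sha256(c).hexdigest()
            self.chunks.setdefault(h, c)
            recipe.append(h)
        return recipe

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())

store = DedupStore()
ckpt1 = b"frozenlayerAAAAheadv1"
ckpt2 = b"frozenlayerAAAAheadv2"  # differs from ckpt1 only in the final byte
store.put(ckpt1)
store.put(ckpt2)
print(store.stored_bytes(), "bytes stored for", len(ckpt1) + len(ckpt2), "bytes of input")
```

Two 21-byte checkpoints that share a frozen prefix occupy far less than 42 bytes of storage, which is exactly the effect deduplicated billing rewards for checkpoint-heavy workloads.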

Pre-warming for multi-cloud deployments: Buckets support pre-warming, allowing you to bring frequently-accessed data closer to your compute infrastructure. Rather than fetching data across regions on every read, you declare which cloud provider and region needs the data, and the Bucket ensures it's already available there. This is critical for distributed training clusters and large-scale pipelines. AWS and GCP are supported initially, with more providers planned.
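Conceptually, a pre-warming declaration is a mapping from (provider, region) pairs to warm replicas, with reads routed to a local copy when one exists. The sketch below simulates that routing decision; every class and method name is hypothetical, not part of the Buckets API.

```python
# Illustrative sketch of the pre-warming idea: declare which (provider, region)
# pairs should hold a warm replica, then route reads to the local copy when one
# exists. All names here are hypothetical, not a real Hugging Face API.
class PrewarmPolicy:
    def __init__(self):
        self.warm = set()  # {(provider, region), ...} declared warm locations

    def declare(self, provider, region):
        # e.g. declare("aws", "us-east-1"); AWS and GCP are supported initially
        self.warm.add((provider, region))

    def read_source(self, provider, region):
        # Serve from the pre-warmed replica if one was declared for this
        # location; otherwise fall back to a cross-region fetch from the Hub.
        return "local-replica" if (provider, region) in self.warm else "hub-origin"

policy = PrewarmPolicy()
policy.declare("aws", "us-east-1")
print(policy.read_source("aws", "us-east-1"))    # local-replica
print(policy.read_source("gcp", "europe-west4"))  # hub-origin
```

The point of the model: a training cluster in a declared region never pays the cross-region fetch, while undeclared regions still work, just slower.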

Getting Started

Creating and using a Bucket takes minutes:

# Install and authenticate
curl -LsSf https://hf.co/cli/install.sh | bash
hf auth login

# Create a private bucket
hf buckets create my-training-bucket --private

Buckets are designed for typical ML workflows: write fast, overwrite when needed, sync directories, remove stale files, and keep data moving without version control overhead.
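The sync-and-remove-stale workflow described above boils down to a set comparison between local and remote listings: upload what is new or changed, delete what no longer exists locally. A minimal sketch of that plan computation, not the hf CLI's actual algorithm:

```python
# Illustrative sync planner, not the real hf CLI logic. Inputs map
# path -> content hash (a real tool might also compare size or mtime).
def plan_sync(local_files, remote_files):
    """Upload anything new or changed locally; delete stale remote files."""
    upload = sorted(p for p, h in local_files.items()
                    if remote_files.get(p) != h)
    delete = sorted(p for p in remote_files if p not in local_files)
    return {"upload": upload, "delete": delete}

local = {"ckpt/step-200.pt": "h2", "logs/train.log": "h9"}
remote = {"ckpt/step-100.pt": "h1", "logs/train.log": "h8"}
print(plan_sync(local, remote))
```

Because Buckets skip git-style versioning, a plan like this is the whole story: overwrites and deletions take effect immediately, with no history to rewrite.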