Hugging Face
Hugging Face launches Storage Buckets for ML artifacts with deduplication via Xet backend
Tags: feature, platform, api, release · Source: huggingface.co

What's New: Storage Buckets

Hugging Face has launched Storage Buckets, a mutable object storage service purpose-built for ML production workloads. Unlike Git-based versioned repositories, Buckets are designed for the constant stream of intermediate files generated during training and data processing: checkpoints, optimizer states, processed shards, logs, and traces.

Key Features

Non-versioned Storage with Hub Integration

  • Create and manage buckets via the hf CLI or programmatically
  • Buckets live under user or organization namespaces with standard Hugging Face permissions
  • Can be marked private or public with browser-accessible pages
  • Addressable with handles like hf://buckets/username/my-bucket
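As an illustration of the handle format, the sketch below splits a handle into its namespace and bucket name. The helper `parse_bucket_handle` and the `BucketHandle` type are hypothetical names introduced here, not part of any Hugging Face library, and the optional trailing path inside the bucket is an assumption about how such handles might be extended.

```python
from typing import NamedTuple

BUCKET_PREFIX = "hf://buckets/"


class BucketHandle(NamedTuple):
    namespace: str  # user or organization name
    name: str       # bucket name
    path: str       # path inside the bucket (assumed extension; "" if absent)


def parse_bucket_handle(handle: str) -> BucketHandle:
    """Split an hf://buckets/<namespace>/<bucket>[/<path>] handle into parts."""
    if not handle.startswith(BUCKET_PREFIX):
        raise ValueError(f"not a bucket handle: {handle!r}")
    # Split into at most three pieces: namespace, bucket, and the rest.
    parts = handle[len(BUCKET_PREFIX):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected hf://buckets/<namespace>/<bucket>: {handle!r}")
    path = parts[2] if len(parts) == 3 else ""
    return BucketHandle(parts[0], parts[1], path)
```

For example, `parse_bucket_handle("hf://buckets/username/my-bucket")` yields the namespace `username` and the bucket name `my-bucket` with an empty path.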

Xet-Powered Deduplication

  • Built on Hugging Face's chunk-based Xet storage backend, which breaks files into chunks and deduplicates across them
  • When uploading similar datasets or successive model checkpoints with frozen layers, Buckets skip already-stored content
  • Results in lower bandwidth usage, faster transfers, and reduced storage footprint
  • For Enterprise customers, billing is based on deduplicated storage, directly reducing costs
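The effect of chunk-level deduplication on successive checkpoints can be sketched with a toy content-addressed store. This is not Xet's implementation: `ChunkStore`, the 4-byte fixed chunk size, and the use of SHA-256 are all assumptions chosen to keep the example small. Production systems typically derive chunk boundaries from the content itself so that an insertion does not shift every later chunk.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny, for illustration only


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def upload(self, data: bytes) -> tuple[list[str], int]:
        """Store data; return (chunk hashes for the file, bytes actually sent)."""
        manifest = []
        sent = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only new chunks are transferred
                self.chunks[digest] = chunk
                sent += len(chunk)
            manifest.append(digest)
        return manifest, sent

    def download(self, manifest: list[str]) -> bytes:
        """Reassemble a file from its chunk hashes."""
        return b"".join(self.chunks[d] for d in manifest)


# Two checkpoints sharing 64 bytes of "frozen" weights and a 16-byte trained head.
frozen = bytes(range(64))
store = ChunkStore()
_, sent_first = store.upload(frozen + bytes(range(100, 116)))
_, sent_second = store.upload(frozen + bytes(range(200, 216)))
print(sent_first, sent_second)  # 80 16
```

The second upload transfers only the 16 bytes that changed, because the chunks covering the frozen weights are already in the store.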

Pre-warming for Regional Performance

  • Brings frequently accessed data closer to compute resources in specific cloud regions
  • Eliminates repeated cross-region data transfers for distributed training and large-scale pipelines
  • Available initially through partnerships with AWS and GCP, with more cloud providers to follow

Getting Started

Install the CLI, log in, and create a private bucket in minutes:

curl -LsSf https://hf.co/cli/install.sh | bash
hf auth login
hf buckets create my-training-bucket --private

The bucket can then be accessed programmatically for training pipelines, data processing workflows, and agent systems that generate frequent writes and overwrites.
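The source does not show the Python API, so the sketch below only wraps the CLI command shown above so that a pipeline script can create a bucket as a setup step. The function names are hypothetical, and the behavior of `hf buckets create` without `--private` is assumed rather than documented here.

```python
import subprocess


def bucket_create_command(name: str, private: bool = True) -> list[str]:
    """Build the `hf buckets create` invocation as an argument list."""
    cmd = ["hf", "buckets", "create", name]
    if private:
        cmd.append("--private")
    return cmd


def create_bucket_via_cli(name: str, private: bool = True) -> None:
    """Run the CLI; raises subprocess.CalledProcessError if it exits non-zero."""
    # Requires the hf CLI to be installed and `hf auth login` to have been run.
    subprocess.run(bucket_create_command(name, private), check=True)
```

Passing an argument list instead of a shell string avoids quoting problems if a bucket name ever comes from user input.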