Overview
Hugging Face has released Modular Diffusers, a framework that offers a composable, block-based alternative to the monolithic DiffusionPipeline approach. Instead of writing entire pipelines from scratch, developers can mix and match pre-built blocks to create workflows tailored to their needs.
Key Features
Flexible Block Composition: Modular Diffusers breaks diffusion pipelines into self-contained blocks — text encoding, image encoding, denoising, and decoding — that can be freely added, removed, or swapped. Each block has clearly defined inputs and outputs, and can run independently or as part of a larger pipeline.
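To make the composition idea concrete, here is a minimal plain-Python sketch. It does not use the Modular Diffusers API: the Block class, the run helper, and the toy "encoders" are all hypothetical stand-ins, assumed only for illustration. It shows blocks that declare their inputs and outputs, run alone or in sequence, and can be swapped without touching the rest of the workflow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Block:
    """Hypothetical stand-in for a pipeline block (not the diffusers API)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    fn: Callable[..., State]

    def __call__(self, state: State) -> State:
        # Fail early if a declared input is absent from the shared state.
        missing = [key for key in self.inputs if key not in state]
        if missing:
            raise KeyError(f"{self.name} is missing inputs: {missing}")
        result = self.fn(**{key: state[key] for key in self.inputs})
        # Only the declared outputs are written back to the state.
        return {**state, **{key: result[key] for key in self.outputs}}


def run(blocks: List[Block], state: State) -> State:
    """Run blocks in order, threading the state through each one."""
    for block in blocks:
        state = block(state)
    return state


# Toy computations standing in for real models.
text_encoder = Block("text_encoder", ["prompt"], ["embeds"],
                     lambda prompt: {"embeds": [len(w) for w in prompt.split()]})
denoiser = Block("denoise", ["embeds", "steps"], ["latents"],
                 lambda embeds, steps: {"latents": [e * steps for e in embeds]})
decoder = Block("decode", ["latents"], ["image"],
                lambda latents: {"image": sum(latents)})
alt_decoder = Block("alt_decode", ["latents"], ["image"],
                    lambda latents: {"image": max(latents)})

request = {"prompt": "a serene landscape", "steps": 4}
result = run([text_encoder, denoiser, decoder], request)
# Swapping the last block changes the output without touching the others.
swapped = run([text_encoder, denoiser, alt_decoder], request)
# A single block also works on its own.
standalone = text_encoder({"prompt": "hi there"})
```

Because each block only reads its declared inputs and writes its declared outputs, replacing decoder with alt_decoder requires no change to the other blocks.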
Familiar API: The high-level API remains similar to DiffusionPipeline, allowing existing code to migrate easily:
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("black-forest-labs/FLUX.2-klein-4B")
pipe.load_components(torch_dtype=torch.bfloat16)  # load model weights in bfloat16
image = pipe(prompt="a serene landscape at sunset", num_inference_steps=4).images[0]
Custom Block Creation: Developers can write custom blocks by defining components, inputs, outputs, and computation logic. The example given in the announcement is a DepthProcessorBlock that extracts depth maps from images using Depth Anything V2 and can be integrated into any workflow.
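The four parts of a custom block (components, inputs, outputs, computation) can be sketched in plain Python. This is only an illustration of the pattern: ToyBlock, ToyDepthProcessorBlock, and fake_depth_estimator are hypothetical names, the base class is not the one Modular Diffusers provides, and the "estimator" is a placeholder function rather than Depth Anything V2.

```python
from typing import Callable, Dict, List

Image = List[List[float]]


class ToyBlock:
    """Hypothetical base class; the real Modular Diffusers base class differs."""
    components: List[str] = []
    inputs: List[str] = []
    outputs: List[str] = []

    def compute(self, components: Dict[str, Callable], **inputs) -> Dict[str, object]:
        raise NotImplementedError

    def __call__(self, components: Dict[str, Callable], state: Dict[str, object]) -> Dict[str, object]:
        # Validate declared components and inputs before running.
        for name in self.components:
            if name not in components:
                raise KeyError(f"missing component: {name}")
        for name in self.inputs:
            if name not in state:
                raise KeyError(f"missing input: {name}")
        result = self.compute(components, **{k: state[k] for k in self.inputs})
        # The block must produce exactly the outputs it declared.
        if set(result) != set(self.outputs):
            raise ValueError(f"expected outputs {self.outputs}, got {sorted(result)}")
        return {**state, **result}


class ToyDepthProcessorBlock(ToyBlock):
    components = ["depth_estimator"]
    inputs = ["image"]
    outputs = ["depth_map"]

    def compute(self, components, image: Image):
        raw = components["depth_estimator"](image)
        lo = min(min(row) for row in raw)
        hi = max(max(row) for row in raw)
        span = (hi - lo) or 1.0  # avoid dividing by zero on flat images
        # Normalize the raw depth values to the range [0, 1].
        return {"depth_map": [[(v - lo) / span for v in row] for row in raw]}


def fake_depth_estimator(image: Image) -> Image:
    """Placeholder for a real depth model: inverts pixel brightness."""
    return [[255 - px for px in row] for row in image]


state = ToyDepthProcessorBlock()(
    {"depth_estimator": fake_depth_estimator},
    {"image": [[0, 255], [51, 204]]},
)
depth_map = state["depth_map"]
```

Declaring components separately from inputs keeps model weights out of the per-call state, so the same block can be reused with a different estimator.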
Benefits for Developers
- Modularity: Run individual blocks as standalone pipelines or compose them into custom workflows
- Memory Efficiency: Use ComponentsManager for lazy loading and fine-grained memory management
- Extensibility: Create custom blocks for domain-specific tasks and share them with the community
- Visual Workflows: Integration with Mellon, a node-based visual interface, enables no-code workflow composition
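The lazy loading mentioned under Memory Efficiency can be illustrated with a small plain-Python sketch. It is not the ComponentsManager API: the LazyComponents class, its method names, and the string "weights" are hypothetical, assumed only to show the general idea of loading a component on first use and releasing it when it is no longer needed.

```python
from typing import Callable, Dict, List


class LazyComponents:
    """Hypothetical registry that loads components on first use."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], object]] = {}
        self._loaded: Dict[str, object] = {}

    def register(self, name: str, loader: Callable[[], object]) -> None:
        # Registering stores only the loader; nothing is loaded yet.
        self._loaders[name] = loader

    def get(self, name: str) -> object:
        # Load on first access, then reuse the cached object.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def unload(self, name: str) -> None:
        # Drop the cached object so its memory can be reclaimed.
        self._loaded.pop(name, None)

    def loaded(self) -> List[str]:
        return sorted(self._loaded)


load_calls: List[str] = []


def make_loader(name: str) -> Callable[[], object]:
    """Build a fake loader that records when it is actually invoked."""
    def loader() -> object:
        load_calls.append(name)
        return f"<{name} weights>"
    return loader


manager = LazyComponents()
for component in ("text_encoder", "transformer", "vae"):
    manager.register(component, make_loader(component))

manager.get("text_encoder")     # first access triggers the load
manager.get("text_encoder")     # second access reuses the cached object
manager.unload("text_encoder")  # free it once text encoding is done
manager.get("vae")              # the transformer is never loaded here
```

In this sketch only the components that are actually requested are ever loaded, which is the property that makes staged workflows cheaper in memory.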
Getting Started
Full documentation and examples are available in the Modular Diffusers documentation. The framework is designed to complement rather than replace the existing DiffusionPipeline class, giving developers the flexibility to choose the approach that best fits their use case.