Hugging Face launches Modular Diffusers with composable pipeline blocks for flexible image generation
release · feature · sdk · api · open-source · huggingface.co

What's New

Hugging Face has released Modular Diffusers, a new framework for composing diffusion pipelines from reusable building blocks. Rather than writing complete pipelines from scratch, developers can now assemble pre-built blocks—text encoding, VAE encoding, denoising, decoding—and customize them for specific workflows.

Key Features

Familiar API with Flexible Composition: The ModularPipeline class maintains the same simple API as the existing DiffusionPipeline, but pipelines are now constructed from composable blocks that can be inspected, modified, and swapped independently.

Custom Block Creation: Developers can define custom blocks by specifying three core elements:

  • expected_components: Declare what models the block needs
  • inputs/intermediate_outputs: Define data flowing in and out
  • __call__: Implement the computation logic
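The three-part contract above can be illustrated with a small pure-Python sketch. The class and names below (`PipelineState`, the stand-in text encoder) are illustrative, not the actual diffusers classes, which are considerably richer:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    """Illustrative shared state that blocks read from and write to."""
    values: dict = field(default_factory=dict)

class TextEncoderBlock:
    """Toy block following the three-part contract (simplified sketch)."""

    # expected_components: models this block needs (here: a text encoder)
    expected_components = ["text_encoder"]

    # inputs / intermediate_outputs: data flowing in and out
    inputs = ["prompt"]
    intermediate_outputs = ["prompt_embeds"]

    def __call__(self, components, state):
        # Computation logic: encode the prompt into embeddings
        encoder = components["text_encoder"]
        state.values["prompt_embeds"] = encoder(state.values["prompt"])
        return state

# A stand-in "text encoder" so the sketch runs end to end
components = {"text_encoder": lambda prompt: [float(ord(c)) for c in prompt]}

state = PipelineState(values={"prompt": "cat"})
state = TextEncoderBlock()(components, state)
```

Because each block declares what it consumes and produces, a pipeline assembled from such blocks can validate and route data between them automatically.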

Dynamic Recomposition: Blocks can be added, removed, and reordered at runtime, and the framework automatically re-routes inputs and outputs. Removing the text encoder block, for example, updates the pipeline so that downstream blocks accept precomputed embeddings directly instead of raw prompts.
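The re-routing idea can be sketched as follows: a pipeline's required inputs are whatever the remaining blocks consume that no earlier block produces. This is a hypothetical toy model of the behavior, not the diffusers implementation:

```python
class Block:
    """Minimal block stub: just the declared data flow."""
    def __init__(self, name, inputs, outputs):
        self.name, self.inputs, self.intermediate_outputs = name, inputs, outputs

class ToyPipeline:
    """Toy composition: required inputs are names consumed by some block
    but not produced by any earlier block."""
    def __init__(self, blocks):
        self.blocks = list(blocks)

    def required_inputs(self):
        produced, needed = set(), []
        for block in self.blocks:
            for name in block.inputs:
                if name not in produced and name not in needed:
                    needed.append(name)
            produced.update(block.intermediate_outputs)
        return needed

text_encoder = Block("text_encoder", ["prompt"], ["prompt_embeds"])
denoise = Block("denoise", ["prompt_embeds", "latents"], ["latents"])
decode = Block("decode", ["latents"], ["image"])

full = ToyPipeline([text_encoder, denoise, decode])
full_inputs = full.required_inputs()       # ['prompt', 'latents']

# Drop the text encoder: the pipeline now asks for embeddings directly
trimmed = ToyPipeline([denoise, decode])
trimmed_inputs = trimmed.required_inputs() # ['prompt_embeds', 'latents']
```

Dropping the text encoder changes the pipeline's external interface from `prompt` to `prompt_embeds` without touching the downstream blocks themselves.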

Lazy Loading and Memory Management: The framework separates pipeline definition from component loading, enabling efficient memory usage through lazy initialization and component reuse across multiple blocks.
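The separation of pipeline definition from component loading can be illustrated with a toy lazy registry (again a sketch under assumed names, not the diffusers internals): factories run only on first access, and the result is cached so multiple blocks share one instance:

```python
class LazyComponents:
    """Toy lazy registry: loaders run only on first access, and the
    loaded component is cached so multiple blocks share one instance."""

    def __init__(self, loaders):
        self._loaders = loaders   # name -> zero-arg factory
        self._cache = {}
        self.load_count = 0       # counts actual loads, for illustration

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
            self.load_count += 1
        return self._cache[name]

# Defining the pipeline costs nothing: no "weights" are touched yet
components = LazyComponents({"vae": lambda: object()})
count_before = components.load_count   # 0: nothing loaded at definition time

vae_a = components["vae"]   # first access triggers the (fake) load
vae_b = components["vae"]   # second access reuses the cached instance
```

Two blocks that both declare a `vae` in their expected components would thus share a single loaded model rather than each loading their own copy.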

Integration with Mellon

Modular Diffusers integrates with Mellon, a node-based visual workflow interface, allowing developers to wire blocks together without writing code.

Getting Started

The release includes documentation, example blocks (like depth extraction using Depth Anything V2), and support for advanced patterns like ControlNet workflows. Developers can load pre-built modular pipelines from Hugging Face Hub repositories configured with modular_model_index.json.