Not All That's Rare Is Lost: Causal Paths to Rare Concept Synthesis

Bo-Kai Ruan, Zi-Xiang Ni, Bo-Lun Huang, Teng-Fang Hsiao, Hong-Han Shuai
National Yang Ming Chiao Tung University

Generated samples from RAP across different models for rare concept prompts. Our method adaptively adjusts the switching point and applies second-order denoising, producing more accurate and consistent results across different settings.

Abstract

Diffusion models have shown strong capabilities in high-fidelity image generation but often falter when synthesizing rare concepts, i.e., prompts that are infrequently observed in the training distribution. In this paper, we introduce RAP, a principled framework that treats rare concept generation as navigating a latent causal path: a progressive, model-aligned trajectory through the generative space from frequent concepts to rare targets. Rather than relying on heuristic prompt alternation, we theoretically justify that rare prompt guidance can be approximated by semantically related frequent prompts. We then formulate prompt switching as a dynamic process based on score similarity, enabling adaptive stage transitions. Furthermore, we reinterpret prompt alternation as a second-order denoising mechanism, promoting smooth semantic progression and coherent visual synthesis. Through this causal lens, we align input scheduling with the model's internal generative dynamics. Experiments across diverse diffusion backbones demonstrate that RAP consistently enhances rare concept generation, outperforming strong baselines in both automated evaluations and human studies.


Issues to Solve

Left (stage switching): Switching too early may miss the "horned" detail; switching too late may ignore the "elephant". Right (inconsistency): Abrupt shifts between prompts can disrupt the continuity of the generative trajectory.

✨ A. Adaptive Prompt Switching

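To support adaptive prompt switching, we use the score-based model to estimate the score of the rare concept at each stage and compare it with the score under the current prompt. The switching point is then determined adaptively from this score similarity rather than being fixed in advance.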
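
The switching rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RAP implementation: the names `score_gap`, `adaptive_schedule`, and `toy_predict` are hypothetical, the gap is assumed to be a root-mean-square difference between the current prompt's prediction and the rare prompt's prediction, and the toy predictor stands in for a real diffusion backbone.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def score_gap(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two score predictions (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def adaptive_schedule(
    predict: Callable[[str, int], Vector],
    prompts: Sequence[str],
    num_steps: int,
    threshold: float,
) -> List[str]:
    """Return the prompt used at each denoising step.

    Starts with prompts[0] and advances to the next prompt once the gap between
    the current prompt's prediction and the final (rare) prompt's prediction
    falls below `threshold`.
    """
    target = prompts[-1]
    stage = 0
    used: List[str] = []
    for t in range(num_steps):
        used.append(prompts[stage])
        if stage < len(prompts) - 1:
            gap = score_gap(predict(prompts[stage], t), predict(target, t))
            if gap < threshold:
                stage += 1  # takes effect from the next step
    return used


def toy_predict(prompt: str, t: int) -> Vector:
    """Hypothetical stand-in for a model: the frequent prompt's prediction
    converges toward the rare prompt's prediction as t grows."""
    offset = {"frequent": 1.0, "rare": 0.0}[prompt]
    return [offset * (0.5 ** t), 0.0]


schedule = adaptive_schedule(toy_predict, ["frequent", "rare"], num_steps=5, threshold=0.2)
print(schedule)
```

In this toy setting the gap halves at every step, so the switch happens as soon as it drops under the threshold instead of at a hand-picked step index.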

🚀 B. Second-Order Denoising

We reinterpret prompt alternation as a second-order denoising mechanism, which promotes smooth semantic progression and coherent visual synthesis.
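
For intuition, a generic second-order (Heun) update is sketched below in Python. It is an illustration only: the function names are hypothetical, the toy drift replaces a real denoiser, and evaluating the first slope under one prompt and the second under another is an assumption made here to show how two prompts could enter a single second-order step, not a statement of how RAP does it.

```python
from typing import Callable, List

Vector = List[float]
Drift = Callable[[Vector, float, str], Vector]


def heun_step(
    drift: Drift, x: Vector, t: float, h: float, prompt_a: str, prompt_b: str
) -> Vector:
    """One Heun (second-order) step of dx/dt = drift(x, t, prompt).

    The first slope is evaluated under prompt_a at the current point, the
    second under prompt_b at the Euler-predicted point; the update averages
    the two slopes. With prompt_a == prompt_b this is the standard Heun method.
    """
    d1 = drift(x, t, prompt_a)
    x_pred = [xi + h * di for xi, di in zip(x, d1)]  # Euler predictor
    d2 = drift(x_pred, t + h, prompt_b)
    return [xi + 0.5 * h * (a + b) for xi, a, b in zip(x, d1, d2)]


def toy_drift(x: Vector, t: float, prompt: str) -> Vector:
    """Hypothetical drift that pulls the state toward a prompt-specific target."""
    target = {"frequent": 0.0, "rare": 1.0}[prompt]
    return [target - xi for xi in x]


# Same prompt for both slopes: plain Heun step.
same = heun_step(toy_drift, [1.0], t=0.0, h=0.1, prompt_a="frequent", prompt_b="frequent")
# Different prompts: the averaged slope blends both directions.
mixed = heun_step(toy_drift, [0.0], t=0.0, h=0.1, prompt_a="frequent", prompt_b="rare")
print(same, mixed)
```

Because the update averages two slopes, the state moves along a blend of the two prompt directions within one step rather than jumping from one to the other between steps.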

📸 C. Visualization

visualization

🎯 D. Matching Score

Illustration of the average matching score delta_t for different prompt stages with SD3. Each color represents a different prompt, and the horizontal dashed line indicates the threshold. The matching score for each prompt tends to decrease over time, supporting our proposed criterion of switching prompts once the score difference becomes sufficiently small. Additionally, transient spikes in the matching score may occur when transitioning to a new prompt, indicating that the newly introduced prompt does not yet match the underlying distribution.
