Diffusion models have shown strong capabilities in high-fidelity image generation but often falter when synthesizing rare concepts, i.e., prompts that are infrequently observed in the training distribution. In this paper, we introduce RAP, a principled framework that treats rare concept generation as navigating a latent causal path: a progressive, model-aligned trajectory through the generative space from frequent concepts to rare targets. Rather than relying on heuristic prompt alternation, we theoretically justify that rare prompt guidance can be approximated by semantically related frequent prompts. We then formulate prompt switching as a dynamic process based on score similarity, enabling adaptive stage transitions. Furthermore, we reinterpret prompt alternation as a second-order denoising mechanism, promoting smooth semantic progression and coherent visual synthesis. Through this causal lens, we align input scheduling with the model's internal generative dynamics. Experiments across diverse diffusion backbones demonstrate that RAP consistently enhances rare concept generation, outperforming strong baselines in both automated evaluations and human studies.
Left: Switching too early may miss the "horned" detail; switching too late may ignore the "elephant". Right: Abrupt shifts between prompts can disrupt the continuity of the generative trajectory.
To support adaptive prompt switching, we use a score-based model to estimate the score of the rare concept at each stage, and we use that estimate to determine the switching point.
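The switching rule can be sketched as a simple threshold test. The sketch below is illustrative only and makes several assumptions that do not come from the paper: the matching score is taken to be the mean squared gap between the noise predictions for the current and the next prompt, `predict_noise` is a hypothetical stand-in for a denoiser call, and the prompts, offsets, and threshold value are made up for the demo.

```python
import numpy as np


def matching_score(eps_current, eps_next):
    """Assumed matching score: mean squared gap between two noise predictions."""
    return float(np.mean((eps_current - eps_next) ** 2))


def schedule_prompts(predict_noise, prompts, latents, num_steps, threshold):
    """Return the index of the prompt used at each denoising step.

    predict_noise(latents, prompt, step) is a hypothetical callable standing in
    for a diffusion model's conditional noise prediction.
    """
    stage = 0
    used = []
    for step in range(num_steps):
        used.append(stage)
        if stage + 1 < len(prompts):
            eps_cur = predict_noise(latents, prompts[stage], step)
            eps_nxt = predict_noise(latents, prompts[stage + 1], step)
            # Advance to the next prompt once the two predictions nearly agree.
            if matching_score(eps_cur, eps_nxt) < threshold:
                stage += 1
        # A real sampler would update `latents` here; the toy demo keeps them fixed.
    return used


# Toy demo: each prompt adds a fixed offset whose influence halves every step.
TOY_OFFSETS = {"an animal": 0.0, "an elephant": 2.0, "a horned elephant": 8.0}


def toy_predict_noise(latents, prompt, step):
    return latents + TOY_OFFSETS[prompt] * 0.5 ** step


prompts = list(TOY_OFFSETS)  # ordered from frequent to rare
stages = schedule_prompts(toy_predict_noise, prompts, np.zeros((4, 4)),
                          num_steps=8, threshold=0.1)
print(stages)
```

In this toy setting the schedule stays on the frequent prompt until its prediction is close to that of the next prompt, then advances one stage at a time, so the switching step adapts to the score gap rather than being fixed in advance.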
We reinterpret prompt switching as a second-order denoising mechanism that promotes smooth semantic progression and coherent visual synthesis.
Illustration of the average matching score $\delta_t$ for different prompt stages with SD3. Each color represents a different prompt, and the horizontal dashed line indicates the threshold. The matching score for each prompt tends to decrease over time, supporting our proposed criterion of switching prompts once the score difference becomes sufficiently small. Additionally, transient spikes in the matching score may occur when transitioning to a new prompt, indicating that the newly introduced prompt does not yet match the underlying distribution.