Abstract
Direct preference optimization (DPO) methods have shown strong potential in aligning text-to-image diffusion models with human preferences by training on paired comparisons. These methods improve training stability by avoiding the REINFORCE algorithm but still struggle with challenges such as accurately estimating image probabilities due to the non-linear nature of the sigmoid function and the limited diversity of offline datasets. In this paper, we introduce Diffusion Denoising Ranking Optimization (Diffusion-DRO), a new preference learning framework grounded in inverse reinforcement learning. Diffusion-DRO removes the dependency on a reward model by casting preference learning as a ranking problem, thereby simplifying the training objective into a denoising formulation and overcoming the non-linear estimation issues found in prior methods. Moreover, Diffusion-DRO uniquely integrates offline expert demonstrations with online policy-generated negative samples, enabling it to effectively capture human preferences while addressing the limitations of offline data. Comprehensive experiments show that Diffusion-DRO delivers improved generation quality across a range of challenging and unseen prompts, outperforming state-of-the-art baselines in both quantitative metrics and user studies.
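As a rough illustration of the idea described above, and not the authors' released code, the following PyTorch sketch ranks a preferred (expert) image above a policy-generated negative by comparing their denoising errors at a shared noise level. The model interface unet(x_t, t, cond), the noise schedule alphas_cumprod, the hinge-style loss, and the margin parameter are all illustrative assumptions; the abstract does not specify the exact objective used in Diffusion-DRO.

import torch
import torch.nn.functional as F

def ranking_denoising_step(unet, alphas_cumprod, x_expert, x_negative, cond, margin=0.0):
    """Hypothetical training step: prefer the expert image over a policy-generated
    negative by comparing epsilon-prediction (denoising) errors at the same timestep."""
    b = x_expert.shape[0]
    # Sample a shared timestep and noise for both images.
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x_expert.device)
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(x_expert)

    def denoise_error(x0):
        # Forward-diffuse x0 to timestep t, then measure the epsilon-prediction error.
        x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * noise
        eps_pred = unet(x_t, t, cond)
        return F.mse_loss(eps_pred, noise, reduction="none").mean(dim=(1, 2, 3))

    err_expert = denoise_error(x_expert)
    err_negative = denoise_error(x_negative)

    # Hinge-style ranking: the expert sample should incur a lower denoising error
    # than the negative sample (loss form assumed for illustration).
    loss = F.relu(err_expert - err_negative + margin).mean()
    return loss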
From top to bottom, the text prompts are: "A Pixar lemon wearing sunglasses on a beach," "A dragon sitting on a couch in a digital illustration," "A detailed painting of Atlantis by multiple artists, featuring intricate detailing and vibrant colors," and "A passenger jet aircraft flying in the sky."
From top to bottom, the text prompts are: "A portrait of a smiling Dragonite in a sunflower field with a cloudy sky backdrop" "Amphitheater filled with crowd looking at a dumpster on fire in patriotic colors" "Beige canvas tents set up in an arctic landscape with no vegetation, surrounded by rolling hills - reminiscent of a romanticist painting" and "A small bird sitting in a metal wheel"
From top to bottom, the text prompts are: "A monkey wearing a jacket" "Portrait of a cyberpunk gang" "Bob Ross painting Mario on an easel in his office" and "A little girl holding a brown stuffed animal"
From top to bottom, the text prompts are: "A new artwork depicting Pikachu as a superhero fighting villains with dramatic lightning" "A futuristic cyberpunk Paris street" "A young girl with a red hat at night" and "A bike parked in front of a doorway"
Playground
BibTeX
@misc{wu2025rankingbasedpreferenceoptimizationdiffusion,
title={Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback},
author={Yi-Lun Wu and Bo-Kai Ruan and Chiang Tseng and Hong-Han Shuai},
year={2025},
eprint={2510.18353},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.18353},
}