Existing techniques often require designing multiple models to handle different inputs and to align features
across domains for different makeup tasks (e.g., beauty filter, makeup transfer, and makeup
removal), which increases system complexity. Another limitation is the absence of text-guided makeup try-on,
which is more user-friendly because it requires no reference images. In this study, we make the first attempt to
use a single model for various makeup tasks. Specifically, we formulate different makeup tasks as
cross-domain translations and leverage a cross-domain diffusion model to accomplish all tasks. Unlike
existing methods that rely on separate encoder-decoder configurations or cycle-based mechanisms, we
propose using distinct domain embeddings to facilitate domain control. This allows a single model to switch
domains seamlessly by merely swapping embeddings, thereby reducing the reliance on additional
modules for different tasks. Moreover, to support precise text-to-makeup applications, we introduce the
MT-Text dataset by extending the MT dataset with textual annotations, significantly advancing the
practicality and applicability of makeup technologies.
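To make the domain-embedding mechanism concrete, the sketch below conditions a toy diffusion denoiser on a learned per-task embedding, so that switching between makeup tasks amounts to changing a single embedding index. This is a minimal illustration under our own assumptions; the class name, layer sizes, and network layout are hypothetical and do not reproduce the paper's actual architecture.

```python
# Minimal sketch (not the authors' code): one denoiser shared across makeup
# domains, conditioned on a learned domain embedding instead of per-task
# encoder-decoder pairs. All names here are illustrative.
import torch
import torch.nn as nn

NUM_DOMAINS = 3  # e.g., 0: beauty filter, 1: makeup transfer, 2: makeup removal

class DomainConditionedDenoiser(nn.Module):
    def __init__(self, channels: int = 64, embed_dim: int = 128):
        super().__init__()
        # One embedding per makeup domain; swapping the index switches the task.
        self.domain_embed = nn.Embedding(NUM_DOMAINS, embed_dim)
        self.time_embed = nn.Sequential(nn.Linear(1, embed_dim), nn.SiLU(),
                                        nn.Linear(embed_dim, embed_dim))
        self.in_conv = nn.Conv2d(3, channels, 3, padding=1)
        # Project the combined (timestep + domain) condition to a per-channel bias.
        self.cond_proj = nn.Linear(embed_dim, channels)
        self.out_conv = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor, domain: torch.Tensor):
        # x_t: noisy image (B, 3, H, W); t: timestep (B,); domain: domain id (B,)
        cond = self.time_embed(t.float().unsqueeze(-1)) + self.domain_embed(domain)
        h = self.in_conv(x_t)
        h = h + self.cond_proj(cond)[:, :, None, None]  # inject condition as a bias
        return self.out_conv(torch.relu(h))             # predicted noise

# Usage: the same weights serve every task; only the domain id changes.
model = DomainConditionedDenoiser()
x_t = torch.randn(1, 3, 64, 64)
t = torch.tensor([500])
eps_transfer = model(x_t, t, torch.tensor([1]))  # makeup transfer
eps_removal = model(x_t, t, torch.tensor([2]))   # makeup removal
```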