MAD: Makeup All-in-One with Cross-Domain Diffusion Model

Bo-Kai Ruan, Hong-Han Shuai
National Yang Ming Chiao Tung University

The proposed MAD framework supports a range of applications, including (a) beauty filter, (b) makeup removal, (c) text modification, (d) single-makeup transfer, (e) scale transfer, (f) component transfer, and (g) multi-makeup transfer.

Abstract

Existing techniques often require multiple models to handle different inputs and to align features across domains for different makeup tasks, e.g., beauty filters, makeup transfer, and makeup removal, which increases system complexity. Another limitation is the absence of text-guided makeup try-on, which is more user-friendly because it requires no reference images. In this study, we make the first attempt to handle various makeup tasks with a single model. Specifically, we formulate different makeup tasks as cross-domain translations and leverage a cross-domain diffusion model to accomplish all of them. Unlike existing methods that rely on separate encoder-decoder configurations or cycle-based mechanisms, we propose using different domain embeddings to facilitate domain control. This allows seamless domain switching by merely changing embeddings within a single model, reducing the reliance on additional modules for different tasks. Moreover, to support precise text-to-makeup applications, we introduce the MT-Text dataset by extending the MT dataset with textual annotations, significantly advancing the practicality and applicability of makeup technologies.

Core Idea

[Figure: core idea]

Each task is modeled as a translation problem between domains. For text modification and makeup transfer, guidance such as a text prompt or a reference image steers generation toward a specific style.
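To make this concrete, here is a minimal sketch of how a single denoiser can be conditioned on a learned domain embedding so that switching tasks reduces to swapping an embedding vector. The names (`DOMAINS`, `CrossDomainDenoiser`) and shapes are assumptions for illustration, not the released MAD implementation.

```python
# Minimal sketch of domain-embedding conditioning (illustrative only).
# One denoiser serves every task; the target domain is selected by
# swapping a learned embedding vector.
import torch
import torch.nn as nn

DOMAINS = {"non-makeup": 0, "makeup": 1}  # assumed domain vocabulary

class CrossDomainDenoiser(nn.Module):
    def __init__(self, latent_dim: int = 64, embed_dim: int = 32):
        super().__init__()
        self.domain_embed = nn.Embedding(len(DOMAINS), embed_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + embed_dim + 1, 128),
            nn.SiLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z_t, t, domain: str):
        # Condition the noise prediction on the chosen domain embedding.
        idx = torch.tensor([DOMAINS[domain]] * z_t.shape[0])
        cond = self.domain_embed(idx)
        t_feat = t.float().unsqueeze(-1) / 1000.0  # simple timestep feature
        return self.net(torch.cat([z_t, cond, t_feat], dim=-1))

model = CrossDomainDenoiser()
z_t = torch.randn(4, 64)
t = torch.full((4,), 500)
eps_makeup = model(z_t, t, "makeup")      # e.g. beauty filter
eps_clean  = model(z_t, t, "non-makeup")  # e.g. makeup removal
```

Because the domain is just an embedding index, tasks such as beauty filtering and makeup removal can share every other weight in the network.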

✨ A. Translation Pipeline


[Figure: pipeline]

Illustration of the cross-domain diffusion pipeline at time step t. The pipeline first generates a latent code for the source domain, which is then used during target-domain generation to preserve details. During generation, a preservation mask can be applied to maintain non-facial regions or to modify only specific components.
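The sketch below mirrors this two-stage step under assumed interfaces: a source-domain latent is produced first and then blended into the target-domain update through a mask, so unmasked regions keep the source details. The toy `denoise_step` stands in for a real DDPM/DDIM update; none of these names come from the released code.

```python
# Illustrative single step of the cross-domain pipeline (assumed interfaces).
import torch

def denoise_step(z_t, eps, t):
    # Toy DDIM-like update; a real pipeline uses a proper noise schedule.
    alpha = (1.0 - t.float() / 1000.0).unsqueeze(-1)
    return (z_t - (1.0 - alpha).sqrt() * eps) / alpha.sqrt().clamp(min=1e-4)

def cross_domain_step(model, z_t, t, edit_mask):
    # 1) Latent for the source domain, kept as a detail reference.
    z_src = denoise_step(z_t, model(z_t, t, "non-makeup"), t)
    # 2) Latent for the target domain.
    z_tgt = denoise_step(z_t, model(z_t, t, "makeup"), t)
    # 3) Blend: edited regions follow the target domain, the rest
    #    preserves source details (e.g. non-facial regions).
    return edit_mask * z_tgt + (1.0 - edit_mask) * z_src

# Demo with a stand-in model; any callable (z_t, t, domain) -> eps works,
# e.g. the CrossDomainDenoiser sketched earlier.
model = lambda z_t, t, domain: torch.zeros_like(z_t)
z_t, t = torch.randn(4, 64), torch.full((4,), 500)
edit_mask = torch.zeros(4, 64)
edit_mask[:, :32] = 1.0  # toy mask marking the region to translate
z_next = cross_domain_step(model, z_t, t, edit_mask)
```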

📸 B. Visualization

B-1. Beauty Filter

[Figure: beauty filter]

B-2. Makeup Removal

[Figure: makeup removal]

B-3. Scale Adjustment and Makeup Combination

[Figure: scale and combine]

Makeup transfer with different scales and styles. Top left: adjustable makeup scales. Bottom left: transfer intensities for individual components. Right: combinations of scales and styles from different images for various components. The notations 'E', 'EB', 'F', and 'L' denote eyes, eyebrows, face, and lips, respectively, with superscript numbers indicating the component scale.
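As a sketch of how the per-component notation (e.g. E with scale 0.6, L with scale 1.0) could map onto controls, the example below combines binary component masks into one soft transfer-intensity mask. The component keys and mask shapes are assumptions for illustration.

```python
# Hypothetical per-component scale specification (illustrative only).
# Keys mirror the figure notation: E = eyes, EB = eyebrows, F = face, L = lips.
import torch

def build_transfer_mask(component_masks: dict, scales: dict) -> torch.Tensor:
    """Combine binary component masks into one soft mask, where each
    component's transfer intensity is scaled independently."""
    out = torch.zeros_like(next(iter(component_masks.values())))
    for name, mask in component_masks.items():
        out = torch.maximum(out, scales.get(name, 0.0) * mask)
    return out

h = w = 8  # toy resolution
masks = {k: torch.zeros(h, w) for k in ("E", "EB", "F", "L")}
masks["E"][2:4, 2:6] = 1.0
masks["L"][6:8, 3:5] = 1.0
soft_mask = build_transfer_mask(masks, {"E": 0.6, "L": 1.0})  # eyes at 0.6, lips at 1.0
```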

B-4. Text Modification

[Figure: text modification]

Text modification for non-makeup (top) and makeup (bottom) images. The examples on the left modify the entire facial region, while those on the right restrict the modification to a brush mask.

🗃️ C. MT-Text Dataset

We extend the MT dataset [1] with textual annotations. Initial descriptions of the eye, face, and lip makeup were generated with GPT-4 Vision, then manually verified to correct inaccuracies. The dataset is publicly available for download; a minimal loader sketch follows the example figure below.


[Figure: text annotation]

Examples of our extended text-to-makeup dataset.
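For illustration, here is a minimal reader for such annotations, assuming a hypothetical JSON layout with per-region description fields; the actual released schema may differ.

```python
# Hypothetical reader for MT-Text annotations (assumed JSON schema; the
# released dataset may use a different layout).
import json
from pathlib import Path

def load_annotations(path: str) -> dict:
    """Map image file name -> per-region makeup descriptions."""
    records = json.loads(Path(path).read_text())
    return {
        r["image"]: {region: r.get(region, "") for region in ("eyes", "face", "lips")}
        for r in records
    }

# Example record layout this loader assumes:
# [{"image": "example.png", "eyes": "smoky brown eyeshadow",
#   "face": "light natural foundation", "lips": "matte red lipstick"}]
```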

Reference

  1. Li, T., Qian, R., Dong, C., Liu, S., Yan, Q., Zhu, W., Lin, L.: BeautyGAN: Instance-level facial makeup transfer with deep generative adversarial network. In: ACM International Conference on Multimedia, pp. 645–653 (2018)

BibTeX


@misc{ruan_mad,
  title  = {MAD: Makeup All-in-One with Cross-Domain Diffusion Model},
  author = {Ruan, Bo-Kai and Shuai, Hong-Han}
}