DMM: A Novel Approach to Versatile Image Generation Through Model Fusion

The rapid development of text-to-image (T2I) generation models has led to a multitude of specialized models, each fine-tuned on specific datasets. This specialization, however, leads to high parameter redundancy and significant memory requirements. Efficient methods for consolidating and unifying the capabilities of different powerful models into a single model are therefore urgently needed.

A common approach to model fusion is static linear interpolation in parameter space: the weights of the source models are averaged with fixed coefficients to mix their styles. However, this method ignores a key characteristic of T2I generation. Different models cover different styles, and averaging their weights can produce incompatibilities and style confusion in the merged model.
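For contrast with DMM, the baseline described above can be sketched in a few lines. This is a minimal illustration of static weight interpolation, not code from the DMM work; parameters are represented as plain lists of floats, and the function and parameter names (`interpolate_weights`, `conv.weight`, `conv.bias`) are hypothetical.

```python
def interpolate_weights(state_a, state_b, alpha):
    """Static linear interpolation: (1 - alpha) * A + alpha * B for every parameter."""
    if state_a.keys() != state_b.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name in state_a:
        a, b = state_a[name], state_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name}")
        # The same fixed alpha is applied to every parameter, independent of the prompt.
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged


# Two hypothetical "teachers" with identical architecture but different weights.
teacher_a = {"conv.weight": [1.0, 2.0], "conv.bias": [0.0]}
teacher_b = {"conv.weight": [3.0, 4.0], "conv.bias": [1.0]}
merged = interpolate_weights(teacher_a, teacher_b, alpha=0.25)
```

Because `alpha` is fixed once at merge time, every generation uses the same blend of the two models; this is the "static" property the text criticizes, and nothing in the merged weights lets a user select one source style on demand.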

To address this issue, a style-controllable image generation method has been developed that can precisely generate images in arbitrary styles under the control of style vectors. Building on this design, the Score-Distillation-based Model Merging paradigm (DMM) compresses multiple models into a single versatile T2I model.

DMM reconsiders the task of model fusion in the context of T2I generation and formulates new fusion objectives and evaluation protocols. Instead of simply combining models, DMM aims to compactly reorganize the knowledge from multiple teacher models and enable controllable generation of arbitrary styles.

How DMM Works

DMM uses score distillation to transfer the knowledge of different specialized T2I models into a single model. The "score" of a diffusion model is the gradient of the log probability density of noisy images, which the model estimates through its noise prediction. During training, each teacher's score serves as the target for the student model when the student is conditioned on that teacher's style. The student model thus learns to imitate the different styles of the teacher models within one set of weights.
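The distillation idea can be illustrated with a deliberately tiny toy. In the common noise-prediction parameterization the score equals the predicted noise scaled by a negative, noise-level-dependent factor, so matching noise predictions matches scores. The sketch below assumes scalar "teachers" of the form eps_k(x) = a * x + b_k and a student with one shared weight plus a per-style offset; all names (`TEACHERS`, `teacher_eps`, `student_eps`, `distill`) are hypothetical, and this is not the paper's full training objective, only its core regression of student outputs onto teacher outputs.

```python
import random

# Hypothetical toy teachers standing in for fine-tuned diffusion models.
# Each predicts noise as eps_k(x) = a * x + b_k: a shared slope a (common base
# behaviour) and a teacher-specific offset b_k (its "style").
TEACHERS = [(0.5, -1.0), (0.5, 0.0), (0.5, 2.0)]


def teacher_eps(k, x):
    a, b = TEACHERS[k]
    return a * x + b


def student_eps(w, e, k, x):
    # One shared weight w plus a learned per-style offset e[k], which plays the
    # role of the style vector telling the student which teacher to imitate.
    return w * x + e[k]


def distill(steps=5000, lr=0.05, seed=0):
    """SGD on 0.5 * (student noise prediction - teacher noise prediction) ** 2."""
    rng = random.Random(seed)
    w, e = 0.0, [0.0] * len(TEACHERS)
    for _ in range(steps):
        k = rng.randrange(len(TEACHERS))  # sample a style, i.e. a teacher
        x = rng.gauss(0.0, 1.0)           # toy stand-in for a noisy input x_t
        err = student_eps(w, e, k, x) - teacher_eps(k, x)
        # Gradients of the squared error with respect to w and e[k].
        w -= lr * err * x
        e[k] -= lr * err
    return w, e


w, e = distill()
```

After training, `w` should be close to 0.5 and `e` close to [-1.0, 0.0, 2.0]: one set of shared weights reproduces all three teachers, with the style term selecting among them. Real teachers are large networks whose outputs cannot be matched exactly, so the student only approximates them.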

Style control is achieved through style vectors, which serve as input for the merged model. By varying these vectors, the user can precisely control the desired style of the generated image.
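A sketch of how a style vector might enter the network follows. The lookup table of one learned vector per teacher style is a common design for discrete conditioning; the injection point (adding the style vector to the timestep embedding) is an assumption made for illustration, not a detail confirmed here, and `StyleTable` and `add_style_conditioning` are hypothetical names.

```python
import random


class StyleTable:
    """Hypothetical learnable lookup table holding one style vector per teacher."""

    def __init__(self, num_styles, dim, seed=0):
        rng = random.Random(seed)
        # Small random initialization, as is typical for learned embeddings.
        self.vectors = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_styles)]

    def lookup(self, style_id):
        if not 0 <= style_id < len(self.vectors):
            raise IndexError(f"unknown style id {style_id}")
        return list(self.vectors[style_id])


def add_style_conditioning(time_emb, style_vec):
    """Assumed injection point: add the style vector elementwise to the timestep embedding."""
    if len(time_emb) != len(style_vec):
        raise ValueError("embedding sizes must match")
    return [t + s for t, s in zip(time_emb, style_vec)]


table = StyleTable(num_styles=3, dim=4)
time_emb = [0.1, 0.2, 0.3, 0.4]  # placeholder timestep embedding
cond = add_style_conditioning(time_emb, table.lookup(2))
```

Changing the style id changes only this small conditioning signal, while the rest of the merged network is shared across all styles.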

Advantages of DMM

DMM offers several advantages over conventional methods of model fusion:

Reduced memory requirements: Because one merged model replaces several specialized ones, only a single set of weights has to be stored and loaded, which significantly reduces memory requirements.

Improved versatility: The merged model can generate images in a variety of styles covered by the teacher models.

Controllable style generation: The use of style vectors allows precise control over the generated image style.
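The memory advantage is simple arithmetic. The sketch below assumes the merged model has the same architecture as each teacher and adds one style vector per former teacher; the model count, parameter count, style-vector size, and fp16 storage are illustrative values, not figures from the paper.

```python
def separate_storage_bytes(num_models, params_per_model, bytes_per_param=2):
    """Storage for keeping every specialized model as its own checkpoint."""
    return num_models * params_per_model * bytes_per_param


def merged_storage_bytes(num_styles, params_per_model, style_dim, bytes_per_param=2):
    """Storage for one shared network plus one style vector per former model.

    Assumes the merged model has the same architecture as each teacher.
    """
    return (params_per_model + num_styles * style_dim) * bytes_per_param


# Illustrative numbers only: 8 teachers of 860M parameters each, stored in fp16.
n, p, d = 8, 860_000_000, 1280
separate = separate_storage_bytes(n, p)
merged = merged_storage_bytes(n, p, d)
print(f"separate: {separate / 1e9:.2f} GB, merged: {merged / 1e9:.2f} GB")
```

With these assumed values the script prints `separate: 13.76 GB, merged: 1.72 GB`. The style vectors add only about 20 KB, so the saving approaches a factor equal to the number of merged models.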

Outlook

DMM represents a promising approach to efficient and flexible image generation. The ability to combine the knowledge of different specialized models into a single model opens up new possibilities for the development of powerful and versatile T2I systems. Future research could focus on improving the scalability of DMM and extending it to other application areas of image generation.