CIGEval: A New Agent-Based Framework for Evaluating Conditional Image Generation

Conditional image generation, the creation of images based on specific instructions, has become much more important in recent years. Applications range from personalized content to complex design processes. A central problem in this field, however, is the lack of evaluation metrics that are reliable, explainable, and above all task-agnostic. Existing approaches often fall short because they are either tailored to specific tasks or reflect human perception poorly.

A promising new approach to this problem is CIGEval, a unified, agent-based framework for the comprehensive evaluation of conditional image generation tasks. CIGEval uses large multimodal models (LMMs) at its core and integrates a multifunctional toolbox. The toolbox enables fine-grained analysis of generated images, with separate tools for assessing aspects such as aesthetics, consistency, and adherence to the given instructions.

A particular advantage of CIGEval lies in its use of synthesized evaluation trajectories for fine-tuning. After training on these trajectories, even smaller LMMs can autonomously select the appropriate tools and perform nuanced analyses based on the tool results. Evaluation thus becomes efficient and scalable without depending on the computing power of particularly large models.
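To make the idea of tool selection more concrete, here is a minimal Python sketch of an evaluator that picks tools from a toolbox depending on the task. It is an illustration only, not CIGEval's actual implementation: the tool names, the routing table `TOOLS_FOR_TASK` (a stand-in for the LMM's own tool choice), the precomputed signals in the sample, and the rule of taking the weakest aspect as the final score are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolResult:
    tool: str
    score: float  # normalized to [0, 1]; higher is better


# Hypothetical stand-in tools. Real tools would inspect the image itself;
# here each one only reads a precomputed signal from the sample dict.
def instruction_tool(sample: dict) -> ToolResult:
    return ToolResult("instruction", sample["instruction_match"])


def consistency_tool(sample: dict) -> ToolResult:
    return ToolResult("consistency", sample["subject_similarity"])


def aesthetics_tool(sample: dict) -> ToolResult:
    return ToolResult("aesthetics", sample["aesthetic_quality"])


TOOLBOX: Dict[str, Callable[[dict], ToolResult]] = {
    "instruction": instruction_tool,
    "consistency": consistency_tool,
    "aesthetics": aesthetics_tool,
}

# Hypothetical routing table standing in for the LMM's tool selection.
TOOLS_FOR_TASK: Dict[str, List[str]] = {
    "text_to_image": ["instruction", "aesthetics"],
    "subject_driven": ["instruction", "consistency", "aesthetics"],
}


def evaluate(sample: dict) -> dict:
    """Run the tools chosen for the sample's task and aggregate the results."""
    names = TOOLS_FOR_TASK[sample["task"]]
    results = [TOOLBOX[name](sample) for name in names]
    # Assumed aggregation rule: the weakest aspect determines the final score.
    weakest = min(results, key=lambda r: r.score)
    return {
        "tools_used": names,
        "score": weakest.score,
        "weakest_aspect": weakest.tool,
    }


sample = {
    "task": "subject_driven",
    "instruction_match": 0.9,
    "subject_similarity": 0.4,
    "aesthetic_quality": 0.8,
}
report = evaluate(sample)
```

In this toy example the subject-driven sample is penalized for low subject similarity, while the same signals evaluated as a plain text-to-image task would skip the consistency tool entirely. In the framework described above, that choice is made by the model rather than by a fixed table.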

In experiments across seven different conditional image generation tasks, CIGEval (GPT-4o version) reached a correlation of 0.4625 with human evaluations. This value is close to the inter-annotator correlation of 0.47, meaning the framework agrees with human raters almost as well as human raters agree with each other. Remarkably, when implemented with 7B open-source LMMs and fine-tuned on only 2.3K training examples, CIGEval outperforms previous state-of-the-art methods based on GPT-4o.
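For readers who want to see how such an agreement figure is computed, the following sketch calculates a rank correlation between human ratings and automatic scores. The article does not name the coefficient used, so Spearman's rank correlation is an assumption here, and the ratings below are made-up illustration values, not data from the experiments.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: five images rated by a human and by an automatic metric.
human_ratings = [1, 2, 3, 4, 5]
metric_scores = [0.1, 0.3, 0.2, 0.8, 0.9]
rho = spearman(human_ratings, metric_scores)
```

Here the metric swaps the order of only two neighboring images, so `rho` comes out at about 0.9. Values near the inter-annotator level indicate that an automatic evaluator disagrees with people roughly as often as people disagree with one another.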

Case studies on image generation with GPT-4o illustrate CIGEval's ability to identify subtle problems in subject consistency and adherence to control instructions. This suggests that CIGEval could automate the evaluation of image generation tasks with human-level reliability. Its development is therefore an important step towards more objective and efficient evaluation of generated images, and it could significantly advance conditional image generation as a whole.

For companies like Mindverse, which specialize in AI-powered content creation, CIGEval offers interesting possibilities. If CIGEval were integrated into the Mindverse platform, customers could evaluate the quality of their generated images automatically and efficiently. This would streamline the workflow and make it easier to produce high-quality, targeted content. CIGEval also opens up prospects for customized LMM-based solutions, such as chatbots or AI search engines, that benefit from improved image evaluation.