BlenderGym: A New Benchmark for Vision-Language Models in 3D Graphics Editing

```html

Automating 3D Graphics Editing: BlenderGym as a New Benchmark for Vision-Language Models

Editing 3D graphics is central to fields such as film production and game design, but it is time-consuming and requires highly specialized expertise. Automating it is difficult because graphics editing spans a variety of tasks, each demanding a distinct skill set. Vision-Language Models (VLMs) have emerged as a promising approach to this automation. Their development and evaluation, however, are held back by the lack of a comprehensive benchmark that demands human-level perception and reflects realistic editing complexity.

BlenderGym: A New Standard for Evaluating VLMs

BlenderGym is introduced as the first comprehensive benchmark of VLM systems for 3D graphics editing. It evaluates these systems on code-based 3D reconstruction tasks, and the evaluation is carried out by executing commands in the 3D graphics software Blender. A system therefore cannot succeed by generating images alone; it has to understand and manipulate the underlying 3D structure. This makes BlenderGym a more realistic test of what VLMs can do in 3D graphics editing.

Challenges for Current VLMs

Initial evaluations with BlenderGym show that even state-of-the-art VLM systems struggle with tasks that are relatively easy for human Blender users. The gap underscores how complex 3D graphics editing is and how much more robust and capable VLMs need to become.

Influence of Scaling Techniques on VLM Performance

BlenderGym also makes it possible to study how inference scaling techniques affect VLM performance on graphics editing tasks. Notably, the results show that the verifier used to guide the scaling of generation can itself be improved through inference scaling. This complements recent findings on inference scaling of LLMs in programming and mathematics tasks. Inference compute is also not uniformly effective: how it is distributed between generation and verification plays a crucial role in optimizing VLM performance.

Outlook and Significance for the Future of 3D Graphics Editing

BlenderGym provides a foundation for developing and evaluating VLMs for 3D graphics editing. It allows different VLM systems to be compared objectively and helps reveal the limitations of current technology. Insights from studies conducted with BlenderGym can inform the development of more capable VLMs and, in turn, the automation of complex 3D graphics editing tasks.

Tools like BlenderGym are crucial for progress in AI-powered 3D graphics editing. A standardized testing environment lets researchers and developers evaluate their models objectively and make more targeted improvements. This paves the way for new applications across industries and could fundamentally change how 3D graphics are created and edited. Companies like Mindverse, which specialize in developing AI solutions, play an important role in shaping this future.

Bibliography:
Gu, Y., Huang, I., Je, J., Yang, G., & Guibas, L. (2025). BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://arxiv.org/abs/2504.01786
https://www.researchgate.net/publication/390440171_BlenderGym_Benchmarking_Foundational_Model_Systems_for_Graphics_Editing
https://www.themoonlight.io/de/review/blendergym-benchmarking-foundational-model-systems-for-graphics-editing
https://x.com/NaveenManwani17/status/1911025742646763618
https://chatpaper.com/chatpaper/fr/paper/126424
https://ianhuang.ai/
https://www.linkedin.com/posts/timothymporter_need-to-evaluate-new-computers-for-blender-activity-7316878502056591360-r3Xz
https://www.researchgate.net/scientific-contributions/Karoly-Zsolnai-Feher-2141701518
https://cvpr.thecvf.com/Conferences/2025/AcceptedPapers
https://jihyeon-je.github.io/publications/
```
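
The trade-off between generation and verification described in the section on scaling techniques can be illustrated with a small simulation. The sketch below is not the method used in the BlenderGym paper; it is a toy model under stated assumptions. Each candidate is reduced to a single number standing for its true quality, the verifier is a noisy score of that quality, and "scaling the verifier" means averaging several verifier calls per candidate. All names (`candidates_for_budget`, `best_of_n`, `average_quality`) and the noise level are hypothetical choices made for illustration.

```python
import random
from typing import Callable

# Hypothetical stand-in: in a real system a candidate would be an edited
# Blender script, and the verifier a VLM comparing its render to the goal.
Candidate = float  # toy candidate: the value is its true quality in [0, 1)


def candidates_for_budget(total_calls: int, verifier_samples: int) -> int:
    """Candidates affordable when each one costs one generation call
    plus `verifier_samples` verification calls."""
    if total_calls < 1 or verifier_samples < 1:
        raise ValueError("total_calls and verifier_samples must be >= 1")
    return total_calls // (1 + verifier_samples)


def best_of_n(
    generate: Callable[[], Candidate],
    verify: Callable[[Candidate], float],
    n_candidates: int,
    verifier_samples: int,
) -> Candidate:
    """Sample n candidates and return the one with the highest mean verifier score."""
    if n_candidates < 1:
        raise ValueError("n_candidates must be >= 1")
    candidates = [generate() for _ in range(n_candidates)]

    def mean_score(candidate: Candidate) -> float:
        # Averaging repeated noisy verifier calls is the "verifier scaling" step.
        scores = [verify(candidate) for _ in range(verifier_samples)]
        return sum(scores) / verifier_samples

    return max(candidates, key=mean_score)


def average_quality(
    total_calls: int,
    verifier_samples: int,
    trials: int = 2000,
    noise: float = 0.3,  # assumed verifier noise, chosen arbitrarily
    seed: int = 0,
) -> float:
    """Mean true quality of the selected candidate under a fixed call budget."""
    rng = random.Random(seed)
    n_candidates = candidates_for_budget(total_calls, verifier_samples)

    def verify(candidate: Candidate) -> float:
        return candidate + rng.gauss(0.0, noise)

    picks = [
        best_of_n(rng.random, verify, n_candidates, verifier_samples)
        for _ in range(trials)
    ]
    return sum(picks) / trials


if __name__ == "__main__":
    # Same total budget, different splits between generation and verification.
    for k in (1, 2, 5, 11):
        n = candidates_for_budget(60, k)
        print(f"verifier calls per candidate={k:2d}  candidates={n:2d}  "
              f"mean quality={average_quality(60, k):.3f}")
```

Running the script prints the mean quality of the selected candidate for several splits of the same 60-call budget. Which split comes out ahead depends entirely on the assumed noise level and quality distribution, so the numbers say nothing about real VLM systems; the sketch only shows how such an allocation experiment can be framed.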