Automating Slide Design: AutoPresent and SlidesBench
Automated Slide Design: An Overview of AutoPresent and SlidesBench
The design of structured visual content, such as presentation slides, is essential for effective communication. It requires skills in both content creation and visual planning. This article examines automated slide generation, where AI models create presentations from natural language instructions. The focus is on the new benchmark dataset SlidesBench and the model AutoPresent, which is built upon it.
SlidesBench: A New Benchmark for Slide Generation
SlidesBench is the first benchmark dataset for measuring how well AI agents generate slides. It comprises 7,000 training and 585 test examples derived from 310 publicly available slide decks across ten domains, including art, business, and technology. SlidesBench supports both reference-based and reference-free evaluation. Reference-based metrics measure how similar a generated slide is to a target slide, while reference-free metrics assess the design quality of the generated slide on its own, based on design principles.
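The difference between the two kinds of metric can be illustrated with a small sketch. It is not the SlidesBench implementation: the function names, the character-level text similarity, and the "no overlapping elements" design rule are all assumptions chosen for illustration, and slides are represented simply as text plus a list of (left, top, width, height) boxes.

```python
from difflib import SequenceMatcher


def text_similarity(generated: str, target: str) -> float:
    """Reference-based (illustrative): similarity in [0, 1] between the
    generated slide's text and the target slide's text."""
    return SequenceMatcher(None, generated, target).ratio()


def boxes_overlap(a, b) -> bool:
    """True if two (left, top, width, height) boxes intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def overlap_free_score(boxes) -> float:
    """Reference-free (illustrative): fraction of element pairs that do not
    overlap. No target slide is needed, only the generated layout."""
    pairs = [(a, b) for i, a in enumerate(boxes) for b in boxes[i + 1:]]
    if not pairs:
        return 1.0
    clean = sum(not boxes_overlap(a, b) for a, b in pairs)
    return clean / len(pairs)
```

The first function needs a target slide to compare against; the second judges the generated layout by itself, which is what makes it usable when no reference exists.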
AutoPresent: An AI-Powered Approach to Slide Creation
AutoPresent is a system based on an 8B-parameter Llama model, trained on 7,000 pairs of instructions and corresponding slide-generation code. The approach relies on program generation: starting from a natural language instruction, the model first generates a Python program, which is then executed to create the slide. This gives precise control over every element of the slide, including text content, images, layout, and colors. AutoPresent achieves results comparable to those of the closed-source model GPT-4o.
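The generate-then-execute flow might look roughly like the sketch below. Everything in it is a stand-in: `Slide`, `add_text`, and `fake_model` are hypothetical names, the "model" returns a hard-coded program, and the slide is kept in memory rather than written to a presentation file.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    """Hypothetical in-memory slide; a real pipeline would produce a file."""
    elements: list = field(default_factory=list)

    def add_text(self, text, left, top, font_size=18, color="#000000"):
        # Record one text element with explicit position and styling.
        self.elements.append({
            "type": "text", "text": text, "left": left, "top": top,
            "font_size": font_size, "color": color,
        })


def fake_model(instruction: str) -> str:
    """Stand-in for the language model: returns Python source for a slide."""
    return (
        "slide.add_text('Quarterly Results', left=1.0, top=0.5, font_size=32)\n"
        "slide.add_text('Revenue grew in all regions', left=1.0, top=2.0)\n"
    )


def render(instruction: str) -> Slide:
    """Step 1: generate a program. Step 2: execute it to build the slide."""
    program = fake_model(instruction)
    slide = Slide()
    # The program only sees the `slide` object passed in this namespace.
    exec(program, {"slide": slide})
    return slide
```

Because the output is a program rather than pixels, each element's text, position, and style stays individually editable. In practice, model-generated code should run in a sandbox, since `exec` runs whatever the model emits.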
Iterative Refinement for Higher Quality
Iterative refinement plays a crucial role in improving slide quality. In this process, the model is prompted to revise its own output without outside intervention. Each round gives the model a chance to correct flaws in its previous attempt, so design quality improves step by step.
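A refinement loop of this kind can be sketched in a few lines. The signatures below are assumptions: `generate` and `refine` stand in for calls to the model, and what the real system passes to the refinement prompt is not specified here.

```python
from typing import Callable, List, Tuple


def refine_iteratively(
    instruction: str,
    generate: Callable[[str], str],
    refine: Callable[[str, str], str],
    rounds: int = 2,
) -> Tuple[str, List[str]]:
    """Generate a slide program, then ask the model to revise it `rounds` times.

    Returns the final program and the full history of drafts.
    """
    program = generate(instruction)
    history = [program]
    for _ in range(rounds):
        # The model receives its own previous output and produces a revision.
        program = refine(instruction, program)
        history.append(program)
    return program, history


# Toy stand-ins for the model, used only to show the control flow.
def toy_generate(instruction: str) -> str:
    return "add_title('Draft')"


def toy_refine(instruction: str, program: str) -> str:
    return program + "\nset_background('#FFFFFF')"
```

Keeping the history of drafts makes it possible to compare rounds or to fall back to an earlier version if a later revision turns out worse.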
SlidesLib: Simplifying Program Generation
Generating complex programs remains a challenge for current models. SlidesLib was developed to simplify this task: it is a library of high-level functions for slide programming. These functions cover basic tasks such as adding titles, as well as image-related tasks such as image search and generation. Using SlidesLib makes program generation easier and improves the performance of both large language models (LLMs) and vision-language models (VLMs).
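The idea behind such a library can be sketched as follows. This is not the actual SlidesLib API: `Deck`, `add_textbox`, `add_title`, and `add_bullet_points` are hypothetical names, and the default positions and font sizes are arbitrary. Image search and generation are left out because they would need external services.

```python
from dataclasses import dataclass, field


@dataclass
class Deck:
    """Hypothetical in-memory slide holding a list of shapes."""
    shapes: list = field(default_factory=list)


def add_textbox(deck, text, left, top, width, height, font_size, bold=False):
    """Low-level primitive: every layout decision must be spelled out."""
    deck.shapes.append({
        "text": text, "left": left, "top": top, "width": width,
        "height": height, "font_size": font_size, "bold": bold,
    })


def add_title(deck, text):
    """High-level helper: one call with sensible defaults for a title."""
    add_textbox(deck, text, left=0.5, top=0.3, width=9.0, height=1.0,
                font_size=36, bold=True)


def add_bullet_points(deck, items, top=1.6, line_height=0.6):
    """High-level helper: stacks one text box per bullet below the title."""
    for i, item in enumerate(items):
        add_textbox(deck, "• " + item, left=0.8, top=top + i * line_height,
                    width=8.5, height=line_height, font_size=20)
```

With helpers like these, a model only has to emit `add_title(deck, "Results")` instead of seven positional and styling arguments, which leaves fewer places for the generated program to go wrong.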
Comparison with Other Methods
Compared to end-to-end image generation methods such as Stable Diffusion and DALL-E, program-driven approaches deliver higher-quality slides in user-friendly formats. Smaller models such as Llama (8B) and LLaVA (7B) often struggle to generate executable code. While GPT-4o produces usable slides, its design quality still falls short of human-created slides.
Conclusion: The Future of Automated Slide Design
AutoPresent and SlidesBench represent an important step towards automated slide design. The ability to generate complete slides from natural language instructions opens up new possibilities for efficient content creation and presentation design. Future research could focus on improving the handling of complex diagrams and adapting to specific target audiences.