Automating Slide Design: AutoPresent and SlidesBench

The design of structured visual content, such as presentation slides, is essential for effective communication. It requires skills in both content creation and visual planning. This article examines automated slide generation, where AI models create presentations from natural language instructions. The focus is on the new benchmark dataset SlidesBench and the model AutoPresent, which is built upon it.

SlidesBench: A New Benchmark for Slide Generation

To quantify the performance of AI agents in slide generation, SlidesBench was developed as the first benchmark dataset for this purpose. It comprises 7,000 training and 585 test examples derived from 310 publicly available slide decks across ten domains, including art, business, and technology. SlidesBench supports both reference-based and reference-free evaluation. Reference-based metrics measure how similar a generated slide is to a target slide, while reference-free metrics assess the design quality of the generated slide on its own, based on design principles.
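
The difference between the two kinds of evaluation can be illustrated with a small sketch. The two functions below are toy metrics invented for this illustration and are not the metrics SlidesBench defines: `text_similarity` needs a reference slide to compare against, while `text_density_score` judges a slide on its own using an assumed rule of thumb (at most 40 words per slide).

```python
from difflib import SequenceMatcher


def text_similarity(generated: str, reference: str) -> float:
    """Toy reference-based metric: similarity of slide text in [0, 1]."""
    return SequenceMatcher(None, generated, reference).ratio()


def text_density_score(text: str, max_words: int = 40) -> float:
    """Toy reference-free metric: 1.0 up to max_words, lower for crowded slides.

    The 40-word limit is an assumed design rule, chosen only for illustration.
    """
    word_count = len(text.split())
    if word_count <= max_words:
        return 1.0
    return max_words / word_count


generated_text = "Revenue grew 12% in Q3"
reference_text = "Revenue grew 12% in Q3 2024"

# Reference-based: requires the target slide's text.
print(round(text_similarity(generated_text, reference_text), 2))
# Reference-free: looks only at the generated slide.
print(text_density_score(generated_text))
```

A real evaluation would also have to consider visual properties such as element positions and colors, which this text-only sketch ignores.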

AutoPresent: An AI-Powered Approach to Slide Creation

AutoPresent is a system based on an 8B-parameter Llama model, trained on 7,000 pairs of instructions and corresponding slide-generation code. The approach relies on program generation: starting from a natural language instruction, the model first generates a Python program, which is then executed to create the slide. This allows precise control over slide elements such as text content, images, layout, and colors. AutoPresent achieves results comparable to those of the closed-source model GPT-4o.
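
The generate-then-execute idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AutoPresent's implementation: `generate_program` is a hypothetical stand-in for the model call and simply returns a hard-coded program, and the slide is represented as a plain dictionary rather than a real presentation file.

```python
import json


def generate_program(instruction: str) -> str:
    """Hypothetical stand-in for the model: returns Python source as text."""
    title = instruction.strip().rstrip(".")
    return (
        "slide = {\n"
        f"    'title': {title!r},\n"
        "    'background': '#FFFFFF',\n"
        "    'elements': [\n"
        "        {'type': 'text', 'content': 'Key point', 'x': 0.1, 'y': 0.3},\n"
        "    ],\n"
        "}\n"
    )


def execute_program(source: str) -> dict:
    """Run the generated program and return the slide it defines."""
    namespace: dict = {}
    # exec is unsafe for untrusted code; a real system would sandbox this step.
    exec(source, namespace)
    return namespace["slide"]


instruction = "Quarterly revenue summary."
program = generate_program(instruction)   # step 1: instruction -> program
slide = execute_program(program)          # step 2: program -> slide
print(json.dumps(slide, indent=2))
```

Because the intermediate artifact is a program, every property of the slide (here the title, background color, and element coordinates) is an explicit value that can be inspected or edited before the slide is rendered.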

Iterative Refinement for Higher Quality

Iterative refinement plays an important role in improving slide quality. In this process, the model is prompted to revise its own output without human intervention. Each round gives the model a chance to correct mistakes in the previous version, so design quality improves step by step.
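
The refinement loop itself is simple to sketch. The code below is an illustration only: `toy_revise` is a hypothetical stand-in for prompting the model with its previous output, and the stopping rule (stop when nothing changes or after `max_rounds`) is an assumption, not something the source specifies.

```python
from typing import Callable


def refine(draft: str, revise: Callable[[str], str], max_rounds: int = 3) -> str:
    """Repeatedly ask `revise` for a better program; stop once it stabilizes."""
    current = draft
    for _ in range(max_rounds):
        revised = revise(current)
        if revised == current:  # no further changes proposed
            break
        current = revised
    return current


def toy_revise(program: str) -> str:
    """Hypothetical reviser: enlarges one unreadably small font per round."""
    return program.replace("font_size=8", "font_size=18", 1)


draft = "add_text('Intro', font_size=8)\nadd_text('Details', font_size=8)"
print(refine(draft, toy_revise))
```

In a real setup the reviser would receive the previous program, and possibly a rendering of the resulting slide, as part of its prompt.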

SlidesLib: Simplifying Program Generation

Generating long, complex programs is a challenge for current models. To simplify the task, SlidesLib was developed: a library of high-level functions for slide programming. These functions cover basic tasks such as adding titles, as well as image-related tasks such as image search and image generation. Using SlidesLib makes programs shorter and easier to generate, which improves the performance of large language models (LLMs) and vision-language models (VLMs).
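
Why high-level functions help can be shown with a small sketch. The `Slide` and `Shape` classes and the `add_title` helper below are hypothetical and do not reproduce SlidesLib's actual API; they only show how one short call can replace several low-level layout decisions that the model would otherwise have to write out itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shape:
    kind: str
    text: str
    left: float    # inches from the left edge
    top: float     # inches from the top edge
    width: float
    height: float
    font_size: int


@dataclass
class Slide:
    width: float = 10.0   # assumed slide size in inches
    height: float = 7.5
    shapes: List[Shape] = field(default_factory=list)


def add_title(slide: Slide, text: str, font_size: int = 40) -> Shape:
    """Hypothetical high-level helper: adds a full-width title box."""
    margin = 0.5
    shape = Shape(
        kind="title",
        text=text,
        left=margin,
        top=margin,
        width=slide.width - 2 * margin,  # span the slide minus both margins
        height=1.2,
        font_size=font_size,
    )
    slide.shapes.append(shape)
    return shape


deck_slide = Slide()
title = add_title(deck_slide, "Automated Slide Design")
print(title)
```

With such a helper, the generated program contains a single `add_title(...)` line instead of explicit position, size, and font settings, which leaves fewer places for a small model to make a mistake.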

Comparison with Other Methods

Compared to end-to-end image generation methods like Stable Diffusion and DALL-E, program-driven approaches deliver higher-quality slides in user-friendly formats. Smaller models like Llama (8B) and LLaVA (7B) often struggle to generate executable code. While GPT-4o produces usable slides, it still shows gaps in design quality compared to human-created slides.

Conclusion: The Future of Automated Slide Design

AutoPresent and SlidesBench represent an important step towards automated slide design. The ability to generate complete slides from natural language instructions opens up new possibilities for efficient content creation and presentation design. Future research could focus on improving the handling of complex diagrams and adapting to specific target audiences.