Post-Training Scaling Improves Automated Theorem Proving
Advances in Automated Theorem Proving through Post-Training Scaling
Automated Theorem Proving (ATP) has progressed substantially in recent years through the use of Large Language Models (LLMs). In particular, models that can write proofs in formal languages such as Lean 4 open up new possibilities for automating complex logical reasoning. Even so, post-training scaling, as demonstrated by reasoning models such as OpenAI's o1/o3 and DeepSeek R1, has not yet produced a comparable breakthrough in ATP. A current research project, Leanabell-Prover, addresses precisely this gap and investigates how the principles of post-training can be applied to ATP.
The goal of Leanabell-Prover is to transfer the advances of natural language reasoning models to formal reasoning. The approach rests on two pillars: continual training and reinforcement learning. In the first step, existing ATP models are trained further on a hybrid dataset. This dataset combines a large number of statement-proof pairs with additional data intended to instill cognitive behaviors that emulate human reasoning and hypothesis refinement.
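To make the term "statement-proof pair" concrete, here is a minimal Lean 4 sketch. It is an illustration only and is not taken from the project's dataset; the theorem name `add_comm_example` is a made-up label. The statement is everything up to `:=`, and the proof is the tactic block that follows.

```lean
-- Statement: addition on natural numbers is commutative.
-- Proof: a single tactic step that applies the core lemma Nat.add_comm.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

In whole-proof generation, a model is given the statement and asked to produce the complete proof in one pass.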
In the second step, reinforcement learning is applied. The Lean 4 compiler checks each generated proof, and the outcome of that check serves as the reward signal, so the model learns to improve its proving strategies. By combining continual training and reinforcement learning, the researchers report improvements to existing formal provers, including DeepSeek-Prover-v1.5 and Goedel-Prover. For example, a pass rate of 59.8% (pass@32) was achieved on the MiniF2F benchmark, which the authors report as state of the art for whole-proof generation.
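The reward signal and the pass@k metric described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's training code: the `verify` callable is a hypothetical stand-in for a real Lean 4 compile check, and `toy_verify` with its hard-coded set of accepted proofs exists only to make the example runnable.

```python
from typing import Callable, Sequence


def outcome_reward(proof: str, verify: Callable[[str], bool]) -> float:
    """Binary outcome reward: 1.0 if the verifier accepts the proof, else 0.0."""
    return 1.0 if verify(proof) else 0.0


def pass_at_k(
    samples_per_problem: Sequence[Sequence[str]],
    verify: Callable[[str], bool],
    k: int,
) -> float:
    """Fraction of problems where at least one of the first k samples verifies."""
    if not samples_per_problem:
        raise ValueError("need at least one problem")
    solved = sum(
        1
        for samples in samples_per_problem
        if any(verify(proof) for proof in samples[:k])
    )
    return solved / len(samples_per_problem)


# Hypothetical stand-in for the Lean 4 compiler: accepts a fixed set of proofs.
ACCEPTED = {"by omega"}


def toy_verify(proof: str) -> bool:
    return proof in ACCEPTED


# Two problems with sampled proof attempts (toy data, assumption only).
attempts = [
    ["by simp", "by omega"],  # problem 1: second attempt is accepted
    ["by ring"],              # problem 2: no attempt is accepted
]

print(outcome_reward("by omega", toy_verify))  # 1.0
print(pass_at_k(attempts, toy_verify, k=2))    # 0.5
```

With pass@32, each problem receives up to 32 sampled proofs and counts as solved if any one of them is accepted, so a 59.8% score means that 59.8% of the benchmark problems had at least one accepted proof among their 32 attempts.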
The Importance of Lean 4
The choice of Lean 4 as the formal language is central to the Leanabell-Prover project. Lean 4 is both a proof assistant and a programming language, built on an expressive dependent type theory and an efficient implementation. Because Lean 4 checks every proof mechanically, it gives an unambiguous verdict on correctness, which makes it well suited to developing and evaluating new ATP methods. A proof that Lean 4 accepts is formally verified rather than merely plausible.
Outlook and Future Developments
Leanabell-Prover is an ongoing project, and the researchers are continuously working on further improvements. Future work could focus on expanding the training dataset, developing new reinforcement learning algorithms, and investigating other formal languages. The results of the project have the potential to advance the development of more powerful ATP systems and open up new application possibilities in areas such as software verification, mathematical research, and artificial intelligence.
The team behind Leanabell-Prover plans to progressively publish their results, data, and training details and make them accessible to the community. This will allow other researchers to build upon the results and contribute to the further development of the field.
Bibliography:
- https://arxiv.org/abs/2504.06122
- https://arxiv.org/html/2504.06122v1
- https://paperreading.club/page?id=298051
- https://chatpaper.com/chatpaper/zh-CN/paper/127641
- https://chatpaper.com/chatpaper/?id=2&date=1744128000&page=1
- https://paperreading.club/category?cate=Reinforcement_Learning