Small Language Models Achieve High Math Reasoning Performance with Self-Evolved Deep Thinking

Artificial intelligence (AI) is developing rapidly, and large language models (LLMs) have demonstrated impressive capabilities across many fields. A new research article titled "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking" presents a method that allows small language models (SLMs) to substantially improve their mathematical reasoning without relying on distillation from larger models.

The Challenge of Mathematical Reasoning for AI

Mathematical reasoning poses a particular challenge for AI systems. It requires not only an understanding of mathematical concepts but also the ability to chain logical steps together to solve complex problems. While large language models have made progress in this area, smaller models often fall short of expectations. Model size strongly influences performance, yet the high computational cost of large models is a hurdle for many applications.

rStar-Math: A New Method for "Deep Thinking"

The rStar-Math approach improves the mathematical capabilities of smaller language models through a process of "deep thinking." At its heart is Monte Carlo Tree Search (MCTS), an algorithm widely used in AI for strategic decision-making. In rStar-Math, a specially trained "policy" SLM generates candidate reasoning steps within the MCTS search, while a second SLM-based Process Reward Model (PRM) scores those steps and steers the search toward promising solution paths.
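
To make this interplay concrete, the following is a minimal sketch of reward-guided MCTS over reasoning steps. The functions propose_steps and score are illustrative placeholders standing in for the policy SLM and the PRM; they are not the paper's actual models or interfaces.

```python
import math
import random

# Minimal sketch of reward-guided tree search over reasoning steps.
# propose_steps and score are stand-ins for the policy SLM and the PRM.

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # partial solution: a list of reasoning steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0            # accumulated reward from the reward model

def propose_steps(state, k=3):
    # Stand-in for the policy SLM: sample k candidate next steps.
    return [state + [f"step{len(state)}_{i}"] for i in range(k)]

def score(state):
    # Stand-in for the SLM-based process reward model.
    return random.random()

def select(node, c=1.4):
    # UCT rule: balance exploitation (average reward) and exploration.
    return max(node.children, key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def search(root_state, rollouts=64, max_depth=8):
    root = Node(root_state)
    for _ in range(rollouts):
        node = root
        # 1. Selection: descend while children exist.
        while node.children:
            node = select(node)
        # 2. Expansion: let the policy propose candidate next steps.
        if len(node.state) < max_depth:
            node.children = [Node(s, node) for s in propose_steps(node.state)]
            node = random.choice(node.children)
        # 3. Evaluation: the reward model scores the partial trajectory.
        reward = score(node.state)
        # 4. Backpropagation: update statistics up to the root.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited continuation as the preferred next step.
    return max(root.children, key=lambda ch: ch.visits).state

print(search(["problem statement"]))
```

The design choice mirrored here is that a learned reward model, rather than a handwritten heuristic, supplies the value signal that decides which partial solutions the search explores further.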

Innovations in rStar-Math

Three key innovations characterize the rStar-Math approach:

First, rStar-Math uses a novel data synthesis method that combines code execution with Chain-of-Thought (CoT) reasoning: extensive MCTS rollouts generate step-by-step verified solution paths for mathematical problems, and this data is then used to train the policy SLM. Second, the Process Reward Model is trained with a new procedure. Instead of relying on per-step score annotations, the PRM learns to prefer correct solution paths over incorrect ones, yielding a model that assesses the quality of the entire reasoning process. Third, rStar-Math follows a "self-evolution" principle: both the policy SLM and the PRM are built from scratch and refined over several iterations to continuously improve mathematical ability.
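
As an illustration of the second point, the sketch below shows a pairwise (Bradley-Terry style) preference objective that ranks a verified-correct solution path above an incorrect one instead of regressing per-step scores. The embedding function and the linear reward head are placeholders introduced for the example; they are not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative pairwise preference objective for a process reward model:
# rank a verified-correct solution path above an incorrect one.
reward_model = torch.nn.Linear(16, 1)   # maps a trajectory embedding to a scalar score
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def embed(trajectory: str) -> torch.Tensor:
    # Stand-in for encoding a full solution path with an SLM.
    torch.manual_seed(hash(trajectory) % (2**31))
    return torch.randn(16)

def preference_loss(good: str, bad: str) -> torch.Tensor:
    r_good = reward_model(embed(good))
    r_bad = reward_model(embed(bad))
    # Bradley-Terry style loss: push the correct path's score above the wrong one's.
    return -F.logsigmoid(r_good - r_bad).mean()

optimizer.zero_grad()
loss = preference_loss("correct solution path", "flawed solution path")
loss.backward()
optimizer.step()
```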

Impressive Results

Through four rounds of this self-evolution, using millions of synthesized solutions for over 700,000 mathematical problems, rStar-Math substantially increased the performance of smaller language models. On the MATH benchmark, it improved the accuracy of Qwen2.5-Math-7B from 58.8% to 90.0% and that of Phi-3-mini-3.8B from 41.4% to 86.4%, thereby surpassing the performance of OpenAI's o1-preview. rStar-Math also achieved remarkable results on the American Invitational Mathematics Examination (AIME), solving an average of 53.3% of the problems.

Outlook and Significance for AI Development

The results of rStar-Math are promising and demonstrate the potential of smaller language models to master complex mathematical reasoning. "Deep thinking" via MCTS, combined with the innovative training methods, opens up new possibilities for developing more efficient yet powerful AI systems. The findings could be particularly relevant for companies such as Mindverse that specialize in customized AI solutions: combining rStar-Math-style techniques with Mindverse's existing offerings, such as chatbots, voicebots, and AI search engines, could lead to even more powerful and efficient applications.

Bibliography:
https://huggingface.co/papers/2501.04519
https://huggingface.co/papers
https://www.chatpaper.com/chatpaper/zh-CN?id=3&date=1736352000&page=1
https://arxiv.org/abs/2408.06195
https://www.nature.com/articles/s41586-023-06924-6
https://openreview.net/forum?id=6aHUmotXaw
https://news.ycombinator.com/item?id=41808683
https://www.reddit.com/r/singularity/comments/1bf7va0/new_q_paper_doubles_llm_performance_in_mathematics/
https://www.ijcai.org/proceedings/2024/0381.pdf