ReTool Enhances Large Language Models with Tool Use for Complex Math Problem Solving

Artificial Intelligence Masters Complex Mathematical Problems: ReTool Optimizes Tool Usage in Large Language Models

Large language models (LLMs) have made impressive progress in text processing and logical reasoning in recent years. Models like DeepSeek R1, trained with reinforcement learning (RL), excel at text-based reasoning. However, they reach their limits on tasks that require structured problem-solving strategies, such as geometric reasoning, precise calculation, or solving complex equations. In these areas, computational tools such as a code interpreter (CI) offer clear advantages.

To bridge this gap, ReTool was developed. The system extends the logical reasoning of LLMs with tool integration and is distinguished by two core features: first, the dynamic interleaving of real-time code execution within natural-language reasoning processes; and second, an automated RL paradigm that enables policy rollouts with multi-step, real-time code execution and teaches the model when and how to invoke tools based on outcome feedback.
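
To make this interleaving concrete, the following minimal Python sketch shows what such a rollout loop could look like. The `model.generate` call, the `<code>`/`<interpreter>` tag names, and the sandbox helper are illustrative assumptions, not ReTool's actual implementation; the sketch only illustrates the pattern of alternating generation and code execution.

```python
import contextlib
import io

def sandboxed_run(code: str) -> str:
    """Hypothetical sandbox: execute model-written code and capture stdout or the error."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})  # in practice this would run in an isolated sandbox
        return buf.getvalue()
    except Exception as exc:  # errors are returned as text so the model can react to them
        return f"Error: {exc}"

def interleaved_rollout(model, prompt: str, max_rounds: int = 8) -> str:
    """Alternate between natural-language generation and code execution."""
    context = prompt
    for _ in range(max_rounds):
        # model.generate is a placeholder for any LLM completion call; we assume
        # it stops before emitting the closing </code> tag or ends with a final answer.
        segment = model.generate(context, stop=["</code>"])
        context += segment
        if "<code>" not in segment:  # no tool call: the model gave its final answer
            break
        code = segment.split("<code>", 1)[1]
        result = sandboxed_run(code)
        # interpreter feedback is appended to the trace before reasoning resumes
        context += f"</code>\n<interpreter>{result}</interpreter>\n"
    return context
```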

ReTool is trained within a systematic framework. First, synthetic cold-start data is generated to produce code-augmented reasoning traces, which are used to fine-tune the base model. The subsequent RL training uses task outcomes as rewards to iteratively refine the model's tool-use strategy, allowing the model to discover optimal patterns of tool use on its own, without requiring human intervention.
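
A simplified sketch of such an outcome-based reward is shown below. The `\boxed{...}` answer format and the exact reward values are assumptions for illustration; the point is that only the correctness of the final result, not the intermediate steps, drives the training signal.

```python
import re

def extract_final_answer(trace: str) -> str | None:
    """Pull the last \\boxed{...} expression out of a rollout trace.
    The answer format is an assumption made for this illustration."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", trace)
    return matches[-1].strip() if matches else None

def outcome_reward(trace: str, ground_truth: str) -> float:
    """Rule-based outcome reward: a correct final answer earns 1.0,
    everything else (wrong answer or no answer) earns -1.0."""
    return 1.0 if extract_final_answer(trace) == ground_truth.strip() else -1.0
```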

Experiments on the challenging math olympiad benchmark AIME (American Invitational Mathematics Examination) demonstrate ReTool's performance. A 32B model reached 67% accuracy after only 400 training steps, surpassing the text-based RL baseline (40% accuracy after 1080 steps) in both efficiency and final performance. In an extended setting, ReTool-32B even reached 72.5% accuracy, clearly outperforming comparable models.

Further analyses reveal emergent behaviors such as self-correction of code, suggesting an "aha moment" in which the model independently masters adaptive tool use. These results highlight the potential of outcome-driven tool integration for advancing complex mathematical reasoning and offer new insights into hybrid neuro-symbolic systems. The dynamic integration of a code interpreter into an LLM opens up new possibilities for solving problems that were previously inaccessible to pure text processing: executing code in real time and incorporating the results into the reasoning process lets the model reason at a higher level and develop solution strategies beyond the reach of purely text-based systems.
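
To illustrate what such self-correction looks like at the level of the interaction, here is a small hypothetical sketch reusing the `sandboxed_run` and `model.generate` placeholders from above. In ReTool this retry behavior is not hand-coded; it emerges from outcome-driven RL. The sketch only shows the interaction pattern: an execution error is fed back into the context, and the model gets a chance to rewrite its own code.

```python
def run_with_self_correction(model, context: str, code: str, max_retries: int = 2) -> str:
    """Execute model-written code; on error, show the error to the model and let it retry.
    In ReTool this pattern emerges during RL training rather than being scripted."""
    result = sandboxed_run(code)
    for _ in range(max_retries):
        if not result.startswith("Error:"):
            return result  # successful execution, use the output
        # append the error so the model can reason about it and rewrite the code
        context += f"\n<interpreter>{result}</interpreter>\nLet me fix the code.\n<code>"
        code = model.generate(context, stop=["</code>"])
        context += code + "</code>"
        result = sandboxed_run(code)
    return result
```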

The automated RL method allows ReTool to continuously optimize its tool-use strategy and adapt to new challenges. By learning from outcome feedback, the model discovers the most effective ways to use the available tools on its own. This reduces the need for manual intervention and enables more efficient development of AI systems for complex problem-solving tasks.

Bibliography:
Feng, J., Huang, S., Qu, X., Zhang, G., Qin, Y., Zhong, B., Jiang, C., Chi, J., & Zhong, W. (2025). ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. arXiv preprint arXiv:2504.11536.
PaperReading. (n.d.). ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. Retrieved from https://paperreading.club/page?id=299932
Wang, S. (n.d.). Reinforcement Learning Enhanced LLMs: A Survey. Retrieved from https://github.com/ShuheWang1998/Reinforcement-Learning-Enhanced-LLMs-A-Survey
Atos. (2024). Retrieval Augmented Generation AI.
Ahmed, F. (n.d.). LinkedIn Profile. Retrieved from https://www.linkedin.com/in/faiz-ahmed