Multiagent Fine-tuning: How Language Models Improve Themselves Through Collaborative Learning
Large language models (LLMs) have made remarkable progress in recent years, but their performance remains fundamentally limited by the data they were trained on. To push models beyond that data, researchers have recently investigated having LLMs generate synthetic data for autonomous self-improvement. Successive rounds of such self-improvement, however, quickly reach a point of diminishing returns, because the model keeps training on ever narrower versions of its own outputs.
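For contrast, here is a minimal sketch of that single-agent self-improvement loop. The names `generate`, `is_correct`, and `finetune` are hypothetical hooks standing in for real sampling, answer checking, and training steps, not any particular library's API:

```python
from typing import Callable, List, Tuple

def self_improve(
    model: object,
    tasks: List[str],
    generate: Callable[[object, str], str],        # hypothetical: model, task -> solution
    is_correct: Callable[[str, str], bool],        # hypothetical: task, solution -> ok?
    finetune: Callable[[object, List[Tuple[str, str]]], object],
    rounds: int = 3,
) -> object:
    for _ in range(rounds):
        data: List[Tuple[str, str]] = []
        for task in tasks:
            solution = generate(model, task)
            if is_correct(task, solution):  # keep only self-labeled successes
                data.append((task, solution))
        # Each round trains on the model's own outputs, which narrows the
        # output distribution -- the source of the diminishing returns.
        model = finetune(model, data)
    return model
```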
A promising approach to overcoming this limitation is multiagent fine-tuning. A group of language models, all initialized from the same base model, specialize independently: each model is updated with data generated through multiagent interactions within the group. Because every model trains on its own independent dataset, individual models specialize while the group as a whole diversifies.
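As a rough sketch of that outer loop (an illustration under assumptions, not the authors' implementation): every agent below starts from the same base weights but accumulates and trains on its own dataset. `Agent`, `collect_interactions`, and `finetune` are placeholder names introduced here; the interaction itself is sketched after the next paragraph.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List

@dataclass
class Agent:
    weights: dict                                # stand-in for model parameters
    dataset: list = field(default_factory=list)  # independent per agent

def collect_interactions(agents: List[Agent], tasks: list) -> List[list]:
    # Placeholder: one list of training examples per agent, produced by
    # the multiagent debate described below.
    return [[] for _ in agents]

def finetune(weights: dict, dataset: list) -> dict:
    return weights  # placeholder for a real gradient-based update

def multiagent_finetune(base_weights: dict, tasks: list,
                        n_agents: int = 3, rounds: int = 5) -> List[Agent]:
    # All agents are initialized from the same base model ...
    agents = [Agent(deepcopy(base_weights)) for _ in range(n_agents)]
    for _ in range(rounds):
        per_agent_data = collect_interactions(agents, tasks)
        for agent, data in zip(agents, per_agent_data):
            # ... but each one fine-tunes only on its own data, so the
            # agents specialize while the group diversifies.
            agent.dataset.extend(data)
            agent.weights = finetune(agent.weights, agent.dataset)
    return agents
```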
Specifically, the models interact in a "debate" setting: they work on tasks and produce candidate solutions. Some models act as "generators" that propose solutions, while others act as "critics" that evaluate the proposals and correct them where necessary. The data obtained from these interactions, including the corrections and evaluations, is then used to fine-tune the individual models. This process can be repeated over several rounds, with the models learning from one another and improving iteratively.
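A hedged sketch of one such debate round with the generator/critic split described above. The `sample` argument is a hypothetical hook mapping a model and a prompt to generated text; the paper's exact prompting and answer-selection details differ:

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, target response) pair for fine-tuning

def debate_round(
    task: str,
    generators: List[object],
    critics: List[object],
    sample: Callable[[object, str], str],
) -> Tuple[List[Example], List[Example]]:
    # Generators independently propose solutions to the task.
    proposals = [sample(g, task) for g in generators]
    gen_data: List[Example] = [(task, p) for p in proposals]

    # Critics see all proposals and produce evaluations/corrections.
    context = task + "\n\nProposed solutions:\n" + "\n".join(proposals)
    crit_data: List[Example] = [(context, sample(c, context)) for c in critics]

    # Each model is later fine-tuned only on the data it produced itself,
    # which keeps the per-agent datasets independent.
    return gen_data, crit_data
```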
The key advantage of this approach is the diversity of the generated data. Interaction between distinct models yields more varied solutions and chains of reasoning than a single self-improving model would produce, which keeps the models from collapsing into local optima and stagnating. Because the group preserves different lines of reasoning, the overall system can improve autonomously over many more fine-tuning rounds than single-agent self-improvement methods.
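One illustrative way to make that diversity measurable (a toy metric chosen for this sketch, not the paper's): the fraction of distinct word bigrams across the reasoning chains the agents produce for the same task. A single self-improving model tends to converge on near-identical chains, driving the ratio down, while a diverse multiagent group keeps it high:

```python
from itertools import pairwise  # Python 3.10+

def distinct_bigram_ratio(chains: list[str]) -> float:
    # Pool the word bigrams of all reasoning chains and measure how many
    # of them are unique; 1.0 means no repetition at all.
    bigrams = [bg for chain in chains for bg in pairwise(chain.split())]
    return len(set(bigrams)) / max(len(bigrams), 1)

print(distinct_bigram_ratio(["add 2 and 3, then double", "halve 10, then add 5"]))  # -> 1.0
```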
Studies have shown that multiagent fine-tuning significantly improves LLM performance on a range of reasoning tasks, particularly in the mathematical domain. The results suggest that the approach holds substantial potential for advancing LLMs beyond the limits of their existing training data. Future research could apply it to more complex tasks and datasets and investigate how well the procedure scales.
For Mindverse, a German company that develops AI-powered content tools, multiagent fine-tuning offers exciting prospects. The technology could further improve the performance of Mindverse's solutions, such as chatbots, voicebots, and AI search engines, and enable the development of even more capable AI systems. With multiagent fine-tuning, for example, chatbots could learn to conduct more complex dialogues and answer user queries more precisely. The approach also brings closer AI systems that learn independently and adapt to new situations.
Advantages of Multiagent Fine-tuning:
The application of multiagent fine-tuning offers several advantages:
- Increased performance: Multiagent fine-tuning yields a measurable improvement in LLM performance, especially on tasks that require logical thinking and reasoning.
- Greater diversity: The interaction of different models produces more varied solution paths and chains of reasoning, which increases the creativity and flexibility of the models.
- Continuous self-improvement: The approach enables autonomous, continuous improvement of the models over many training rounds.
- Scalability: The procedure can potentially be scaled to larger and more complex tasks and datasets.

Outlook:
Multiagent fine-tuning represents a promising direction for the further development of LLMs. Future research will address applying the procedure to other domains and developing more robust evaluation metrics. Integrating multiagent fine-tuning into AI-powered content tools, such as those offered by Mindverse, opens up new possibilities for creating high-quality, dynamic content.
Bibliography:
- Subramaniam, V., Du, Y., Tenenbaum, J. B., Torralba, A., Li, S., & Mordatch, I. (2025). Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains. arXiv preprint arXiv:2501.05707. https://arxiv.org/abs/2501.05707
- https://arxiv.org/pdf/2501.05707
- https://paperreading.club/page?id=277641
- https://openreview.net/forum?id=JtGPIZpOrz
- https://www.linkedin.com/posts/ju-seung-byun-1a76b01b9_excited-to-share-our-latest-work-accepted-activity-7245217231356207104-OkiT
- https://openreview.net/pdf/847b7f9c1c983ec1c763f6957c3a9965ed8eaa63.pdf
- https://www.researchgate.net/publication/381960577_Fine-Tuning_with_Divergent_Chains_of_Thought_Boosts_Reasoning_Through_Self-Correction_in_Language_Models
- https://github.com/WooooDyy/LLM-Agent-Paper-List
- https://www.researchgate.net/publication/384115481_Improving_LLM_Reasoning_with_Multi-Agent_Tree-of-Thought_Validator_Agent
- https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1244/final-projects/CorneliaWeinzierlSreethuSuraSugunaVarshiniVelury.pdf