Single Large Language Model Outperforms Mixtures of Models

The Mixture is Key – Or Is It? New Findings on Combining Language Models

Combining the outputs of several sources is an established way to boost performance in artificial intelligence. For large language models (LLMs), the Mixture-of-Agents (MoA) approach has become a popular ensemble method: it aggregates the answers of several different LLMs into one improved result. A new study now questions the fundamental assumption behind this approach: is mixing different LLMs really always beneficial?
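To make the MoA setup concrete, here is a minimal Python sketch of the two-stage proposer/aggregator pattern described above. The `call_llm` function is a hypothetical placeholder for whatever inference API is actually used, and the model names are purely illustrative rather than the configuration from the study.

```python
# Minimal sketch of one Mixture-of-Agents (MoA) round.
# `call_llm` is a hypothetical stand-in for a real inference API;
# the model names are illustrative only.

def call_llm(model: str, prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: send `prompt` to `model` and return its text answer."""
    return f"[{model} answer to: {prompt[:40]}...]"  # dummy output for the sketch

def mixture_of_agents(question: str, proposers: list[str], aggregator: str) -> str:
    # Stage 1: each proposer LLM independently drafts an answer.
    drafts = [call_llm(m, question) for m in proposers]

    # Stage 2: the aggregator LLM synthesizes the drafts into one final answer.
    numbered = "\n".join(f"Answer {i + 1}: {d}" for i, d in enumerate(drafts))
    synthesis_prompt = (
        f"Question: {question}\n\n"
        f"Candidate answers:\n{numbered}\n\n"
        "Combine their strengths into a single, improved answer."
    )
    return call_llm(aggregator, synthesis_prompt)

print(mixture_of_agents("Explain gradient clipping.",
                        proposers=["model-a", "model-b", "model-c"],
                        aggregator="model-a"))
```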

The researchers developed an alternative method called Self-MoA, which aggregates multiple outputs sampled from only the single strongest LLM. Surprisingly, Self-MoA outperforms traditional MoA, which mixes different LLMs, in many scenarios. For example, Self-MoA achieved a 6.6% improvement over MoA on the AlpacaEval 2.0 benchmark and an average gain of 3.8% across various benchmarks, including MMLU, CRUX, and MATH. Applying Self-MoA to one of the top models on AlpacaEval 2.0 even produced a new top score on the leaderboard.

Quality over Diversity?

To understand why Self-MoA works so well, the researchers systematically examined the relationship between the diversity and the quality of the outputs being aggregated across various MoA settings. The results show that MoA's performance depends strongly on the quality of the individual models, and that mixing different LLMs often lowers the average quality of the pool. However, the study also identifies scenarios in which combining different LLMs does help, for instance when the models are specialized in different, complementary tasks.
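As a rough illustration of how such a quality-versus-diversity analysis can be set up, the sketch below combines hypothetical per-output benchmark scores with average pairwise token overlap as a crude diversity proxy. The metric choices here are assumptions for illustration, not the measurements used in the study.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two outputs (crude proxy)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def pool_statistics(outputs: list[str], scores: list[float]) -> dict[str, float]:
    """Average quality plus a simple diversity proxy (1 - mean pairwise similarity)."""
    sims = [jaccard(a, b) for a, b in combinations(outputs, 2)]
    return {
        "avg_quality": sum(scores) / len(scores),
        "diversity": 1.0 - (sum(sims) / len(sims) if sims else 0.0),
    }

# Hypothetical proposer outputs and benchmark-style scores for illustration.
outputs = ["the answer is 42", "the answer is 42 because ...", "it is probably 41"]
scores = [0.90, 0.95, 0.30]
print(pool_statistics(outputs, scores))
```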

Self-MoA: The Potential of In-Model Diversity

Self-MoA exploits so-called in-model diversity: instead of combining different LLMs, it generates multiple outputs from the same model and aggregates them. This lets a single strong LLM play to its strengths without the quality dilution that comes from mixing in weaker models.
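A minimal sketch of the Self-MoA idea, under the same assumptions as the earlier MoA sketch: the single strongest model is sampled several times at a non-zero temperature, and the same model then aggregates its own drafts. `call_llm` is again a hypothetical placeholder, not the study's implementation.

```python
# Minimal Self-MoA sketch: sample one strong model repeatedly, then let the
# same model aggregate its own drafts. `call_llm` is a hypothetical placeholder.

def call_llm(model: str, prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: send `prompt` to `model` and return its text answer."""
    return f"[{model} @ T={temperature}: {prompt[:40]}...]"

def self_moa(question: str, model: str, k: int = 6, temperature: float = 0.9) -> str:
    # In-model diversity: k stochastic drafts from the *same* model.
    drafts = [call_llm(model, question, temperature=temperature) for _ in range(k)]

    # The same model acts as the aggregator of its own drafts.
    numbered = "\n".join(f"Draft {i + 1}: {d}" for i, d in enumerate(drafts))
    synthesis_prompt = (
        f"Question: {question}\n\n"
        f"Here are {k} draft answers:\n{numbered}\n\n"
        "Merge them into one final, higher-quality answer."
    )
    return call_llm(model, synthesis_prompt, temperature=0.0)

print(self_moa("Explain gradient clipping.", model="best-model"))
```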

Sequential Aggregation for More Efficiency

The study also introduces a sequential version of Self-MoA. It aggregates a large number of LLM outputs step by step over several rounds and is just as effective as aggregating all outputs at once. This brings advantages in compute and efficiency, especially when there are too many sampled outputs to aggregate comfortably in a single pass.
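The sequential idea can be sketched as a simple fold: drafts are aggregated in fixed-size windows, and each round's synthesis is carried into the next window as an additional candidate. This is a hedged illustration of the procedure described above rather than the paper's exact algorithm, and `call_llm` remains a hypothetical placeholder.

```python
# Sketch of sequential Self-MoA: aggregate drafts window by window, carrying
# the running synthesis forward. `call_llm` is a hypothetical placeholder.

def call_llm(model: str, prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: send `prompt` to `model` and return its text answer."""
    return f"[{model} @ T={temperature}: {prompt[:40]}...]"

def sequential_self_moa(question: str, model: str,
                        n_samples: int = 24, window: int = 6) -> str:
    drafts = [call_llm(model, question, temperature=0.9) for _ in range(n_samples)]

    running = None  # best synthesis so far, carried across rounds
    for start in range(0, len(drafts), window):
        batch = drafts[start:start + window]
        if running is not None:
            batch = [running] + batch  # previous synthesis competes with new drafts
        numbered = "\n".join(f"Candidate {i + 1}: {d}" for i, d in enumerate(batch))
        prompt = (
            f"Question: {question}\n\n"
            f"Candidates:\n{numbered}\n\n"
            "Synthesize these into one improved answer."
        )
        running = call_llm(model, prompt, temperature=0.0)
    return running

print(sequential_self_moa("Explain gradient clipping.", model="best-model"))
```

Because each aggregation call sees only one window of drafts plus the running synthesis, the prompt stays bounded in size regardless of how many samples are drawn.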

Outlook: Optimizing Ensemble Methods

The results of this study shed new light on the use of ensemble methods with large language models. They show that model diversity does not automatically translate into better performance and that the quality of the individual models plays a crucial role. The development of Self-MoA and the insights into sequential aggregation open up new ways to optimize ensemble methods and raise LLM performance. Future research could identify the specific conditions under which combining different LLMs actually adds value, and could develop adaptive ensemble methods that switch dynamically between Self-MoA and MoA depending on the task at hand.

Bibliography:
https://www.arxiv.org/abs/2502.00674
https://openreview.net/forum?id=ioprnwVrDH
https://openreview.net/pdf/886ba7b85c749e8d72b55e1abf551408df22539b.pdf
https://arxiv.org/abs/2406.04692
https://x.com/omarsar0/status/1886792384954163347
https://www.threads.net/@omarsar0/post/DFp7tz6MSK-
https://neurips.cc/virtual/2024/workshop/84722
https://huggingface.co/papers/2406.04692
https://www.researchgate.net/publication/381294672_Mixture-of-Agents_Enhances_Large_Language_Model_Capabilities
https://icml.cc/Downloads/2024