Activation-Informed Merging Improves Large Language Model Performance

Large Language Models (LLMs) have revolutionized the way we interact with information. Their ability to generate human-like text, solve complex tasks, and handle multiple languages opens up countless applications. However, continuing to improve these models in both performance and efficiency remains a central challenge. Model merging is a promising approach to this challenge.
Model merging combines the parameters and embeddings of multiple fine-tuned LLMs. This makes it possible to combine the strengths of different models and increase overall performance without a commensurate increase in computational cost. Traditional merging methods often focus on weighting model parameters but neglect the information contained in the models' activations.
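To make the basic idea concrete, the sketch below shows a simple parameter-level merge in the spirit of task arithmetic: each fine-tuned model's offset from the shared base model is averaged and added back onto the base weights. The function name and the scaling factor alpha are illustrative assumptions, not details taken from the paper.

```python
import torch


def average_merge(base_state, finetuned_states, alpha=0.5):
    """Merge fine-tuned checkpoints by averaging their task vectors
    (deltas from the shared base model) and adding the result back.
    Illustrative sketch; not the paper's specific merging method."""
    merged = {}
    for name, base_w in base_state.items():
        # Task vector: how far each fine-tuned model moved from the base.
        deltas = [ft[name] - base_w for ft in finetuned_states]
        merged[name] = base_w + alpha * torch.stack(deltas).mean(dim=0)
    return merged
```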
New research introduces a technique called Activation-Informed Merging (AIM), which integrates information from the activation space of LLMs into the merging process. A network's activations reflect how individual neurons respond to the input data and thus offer insight into the model's internal workings. By taking these activations into account, AIM seeks to improve the robustness and performance of the merged models.
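One straightforward way to gather such activation information is to run a small calibration set through the model and record per-channel activation magnitudes with forward hooks. The sketch below assumes a standard PyTorch model with linear layers and uses the mean absolute activation as the statistic; the paper's exact activation measure and calibration data are not reproduced here.

```python
import torch


@torch.no_grad()
def collect_activation_norms(model, calibration_batches):
    """Record the average absolute activation per output channel of each
    linear layer over a small calibration set (illustrative statistic)."""
    norms, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # Accumulate mean |activation| over batch (and sequence) dims.
            act = output.detach().abs().mean(dim=tuple(range(output.dim() - 1)))
            norms[name] = norms.get(name, 0) + act
        return hook

    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            handles.append(module.register_forward_hook(make_hook(name)))

    for batch in calibration_batches:
        model(batch)  # batches are assumed to be the model's expected inputs

    for h in handles:
        h.remove()
    return {k: v / len(calibration_batches) for k, v in norms.items()}
```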
Unlike other approaches, AIM is designed as a flexible, complementary solution that is compatible with existing merging methods. It draws on principles from continual learning (CL) and model compression to preserve critical weights of the base model: using a task-agnostic calibration set, AIM selectively prioritizes important weights during merging.
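Building on the two sketches above, the following interpolation illustrates how such calibration statistics could be used to pull strongly activated channels of an already merged checkpoint back toward the base model. The interpolation rule and the omega parameter are assumptions made for illustration, not the paper's exact formulation.

```python
import torch


def activation_informed_merge(base_state, merged_state, channel_salience, omega=0.5):
    """Relax a merged checkpoint toward the base model on channels whose
    calibration activations are large, so that weights the base model relies
    on most are preserved (illustrative rule, not the paper's exact method)."""
    out = dict(merged_state)
    for module_name, sal in channel_salience.items():
        key = module_name + ".weight"  # salience is keyed by module name
        base_w, merged_w = base_state[key], merged_state[key]
        # Normalize salience to [0, 1]; one value per output channel.
        s = (sal / sal.max()).view(-1, *[1] * (merged_w.dim() - 1))
        # Highly activated channels keep more of the base model's weights.
        out[key] = (1 - omega * s) * merged_w + omega * s * base_w
    return out
```

Because this kind of procedure only needs the model checkpoints and a small calibration set, it can in principle be layered on top of any existing merging method, which matches the paper's framing of AIM as a complementary technique.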
Empirical evaluations show that AIM consistently improves the performance of merged models across multiple benchmarks, with gains of up to 40%. These results suggest that incorporating activation information can meaningfully advance model merging strategies for LLMs, opening up new possibilities for building more capable and efficient language models.
Integrating activation information into the merging process is thus a promising approach to optimizing LLMs. AIM offers a flexible and effective way to combine the strengths of different models while keeping computational costs low. Future research could focus on refining AIM further and exploring additional applications of the technique.
Bibliography:
https://arxiv.org/abs/2502.02421
https://arxiv.org/html/2502.02421v1
http://paperreading.club/page?id=281946
https://github.com/FlagOpen/FlagEmbedding
https://github.com/locuslab/massive-activations
https://openreview.net/forum?id=Tr0lPx9woF
https://www.bfdi.bund.de/SharedDocs/Downloads/DE/Berlin-Group/20241206-WP-LLMs.pdf?__blob=publicationFile&v=2
https://aclanthology.org/2024.lrec-main.593.pdf
https://openreview.net/pdf?id=osoWxY8q2E