ReLearn: A Novel Approach to Unlearning in Large Language Models

Forgetting at the Push of a Button: ReLearn Enables Targeted Unlearning for Large Language Models
The targeted forgetting of information, so-called "unlearning," is an important research field in Artificial Intelligence, especially for large language models (LLMs). These models are trained on massive amounts of data and can unintentionally memorize sensitive or unwanted information. Conventional unlearning methods are often based on inverse optimization, which reduces the probability of specific tokens. However, this can cause undesirable side effects, such as impaired language coherence and degraded overall model performance.
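To make the side effect concrete, inverse-optimization approaches typically run gradient ascent on the data to be forgotten, pushing down the probability of the offending tokens. The following sketch is illustrative only: the model name and the forget set are placeholders, and real methods add safeguards such as retain-set regularization.

```python
# Minimal sketch of inverse-optimization unlearning via gradient ascent.
# "gpt2" and the forget text are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_texts = ["Alice's home address is 42 Example Street."]  # hypothetical

for text in forget_texts:
    batch = tok(text, return_tensors="pt")
    nll = model(**batch, labels=batch["input_ids"]).loss
    # Gradient *ascent*: maximizing the negative log-likelihood drives the
    # probability of these tokens down. Applied too aggressively, this is
    # exactly what produces incoherent or repetitive output.
    (-nll).backward()
    opt.step()
    opt.zero_grad()
```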
A new method called ReLearn now promises a remedy. ReLearn combines data augmentation with fine-tuning, pursuing a strategy of "unlearning by learning." Instead of reducing the probability of certain tokens, ReLearn trains the model on modified data to overwrite unwanted information while preserving general language ability.
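A minimal sketch of this "unlearning by learning" idea follows. The (question, sanitized answer) pairs stand in for the augmented data; in the paper such variants are generated automatically, so everything named here is a placeholder.

```python
# Sketch of ReLearn-style positive fine-tuning: the model is trained on
# sanitized replacements so the safe answer overwrites the old completion.
# Model name and data are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Augmented (question, sanitized answer) pairs overwriting unwanted knowledge.
augmented = [
    ("What is Alice's home address?", "I cannot share personal addresses."),
]

for question, answer in augmented:
    batch = tok(question + " " + answer, return_tensors="pt")
    # Ordinary cross-entropy fine-tuning: the model *learns* the safe
    # response instead of having the old tokens suppressed.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```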
A central problem of existing unlearning methods is the difficulty in measuring the success of forgetting. Previous metrics often focus on contextual forgetting but neglect the general quality of the text output. ReLearn addresses this problem by introducing a comprehensive evaluation framework. This includes the Knowledge Forgetting Rate (KFR) and the Knowledge Retention Rate (KRR) to measure knowledge loss and retention, respectively, at the conceptual level. In addition, the Linguistic Score (LS) is introduced to evaluate the quality of the generated texts, such as their coherence and relevance.
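The paper defines KFR and KRR at the knowledge level (using entailment-style checks rather than string matching). The sketch below substitutes a plain substring test purely to illustrate the shape of the two rates; it is not the paper's implementation.

```python
# Illustrative proxy for KFR/KRR: a substring test stands in for the
# paper's knowledge-level check of whether the model still states a fact.
from typing import Callable, List, Tuple

def knowledge_rates(
    generate: Callable[[str], str],      # maps a prompt to the model's answer
    forget_set: List[Tuple[str, str]],   # (prompt, fact that should vanish)
    retain_set: List[Tuple[str, str]],   # (prompt, fact that should survive)
) -> Tuple[float, float]:
    # KFR: fraction of targeted facts the model no longer reproduces.
    kfr = sum(fact not in generate(p) for p, fact in forget_set) / len(forget_set)
    # KRR: fraction of unrelated facts the model still reproduces.
    krr = sum(fact in generate(p) for p, fact in retain_set) / len(retain_set)
    return kfr, krr
```

A successful unlearning run should push KFR up while keeping KRR close to the pre-unlearning model; the Linguistic Score then separately checks that the outputs remain fluent, coherent, and relevant.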
Initial experiments with ReLearn show promising results. The method allows targeted forgetting of information without significantly degrading the quality of the text output. In contrast to conventional methods based on inverse optimization, ReLearn avoids the repetitive or nonsensical output that often results from excessive suppression of certain tokens.
Mechanistic analyses show that inverse optimization can disrupt coherent text generation. ReLearn, by contrast, preserves this ability by training the model positively rather than suppressing unwanted information, which helps maintain the overall quality and coherence of the output.
The development of effective unlearning methods is crucial for the responsible use of large language models. ReLearn represents an important step in this direction and offers a promising alternative to conventional approaches. The combination of data-driven learning and a comprehensive evaluation framework enables targeted forgetting of information without impairing the model's general language capabilities. This opens up new possibilities for the use of LLMs in sensitive areas where data privacy and security play a crucial role.
For Mindverse, as a provider of AI-powered content solutions, research in the field of unlearning is of particular importance. The ability to specifically remove information from models opens up new perspectives for the development of customized AI solutions, such as chatbots, voicebots, AI search engines, and knowledge systems, that meet the highest standards of data privacy and security.
Bibliography:
Xu, H., Zhao, N., Yang, L., Zhao, S., Deng, S., Wang, M., Hooi, B., Oo, N., Chen, H., & Zhang, N. (2025). ReLearn: Unlearning via Learning for Large Language Models. arXiv preprint arXiv:2502.11190.
Prabhu, A., Azizi, M. J., Kasiviswanathan, S. P., & Goldstein, T. (2024). Probabilistic Unlearning of Classifiers via Sparse Representations. arXiv preprint arXiv:2407.10223.
Golatkar, A., Tschantz, M., & Goldstein, T. (2024). Certified Data Removal from Machine Learning Models. arXiv preprint arXiv:2402.08787.
Bourtoule, L., Chandrasekaran, V., Choquette-Choo, C. A., Jia, R., Travers, A., & Papernot, N. (2023). Machine Unlearning. OpenReview.
Ginart, A., Guan, M. Y., Valiant, G., & Zou, J. Y. (2023). Making AI Forget You: Data Deletion in Machine Learning. IBM Think Insights.
Sekhari, A., Acharya, J., Kamath, G., & Suresh, A. T. (2023). Remember What You Want to Forget: Algorithms for Machine Unlearning. OpenReview.
Lee, J., Kim, H., Park, S., Kim, J., & Shin, J. (2024). Towards Faithful and Efficient Unlearning for Large Language Models. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Kim, J., Lee, J., Park, S., Kim, J., & Shin, J. (2025). Comprehensive Evaluation of Unlearning for Large Language Models: A Case Study on Instruction Following. Proceedings of the 31st International Conference on Computational Linguistics (COLING).
Lehrstuhl für Data Management and Analytics, Technische Universität München. Probabilistic Unlearning.
Liu, F. Awesome-GenAI-Unlearning. GitHub repository.
Liu, C. (2024). LLM Unlearning Ecosystem. NeurIPS 2024 Workshop on Distribution Shifts: Adaption, Calibration and Generalization.