Preference Leakage: A Data Contamination Problem in Large Language Models

Large language models (LLMs) have revolutionized the way we interact with information. They are used in a wide range of applications, from text generation and translation to question answering. A particularly promising use case is employing an LLM as the evaluator of other LLMs ("LLM-as-a-judge"), especially in combination with LLM-based data synthesis. This combination enables efficient model development and evaluation. But does the new methodology also carry risks? Current research suggests that combining LLM-generated data with LLM-based evaluation can lead to a previously overlooked problem: so-called "Preference Leakage".
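To make the setup concrete, here is a minimal Python sketch of the loop in which one model synthesizes training data, a student model is tuned on it, and a judge from the same family later scores that student. Every function is an illustrative stub invented for this example; no real LLM API is called.

```python
# Minimal sketch of the synthesis -> training -> judging loop in which
# preference leakage can arise. All functions are illustrative stubs.

def synthesize_data(generator_model: str, n: int) -> list[str]:
    """Stand-in for LLM-based data synthesis with `generator_model`."""
    return [f"instruction {i} written in the style of {generator_model}" for i in range(n)]

def train_student(base_model: str, data: list[str]) -> str:
    """Stand-in for fine-tuning `base_model` on the synthetic data."""
    return f"{base_model}-tuned-on-{len(data)}-samples"

def judge(judge_model: str, answer_a: str, answer_b: str) -> str:
    """Stand-in for LLM-as-a-judge: returns the preferred answer.

    If the judge is related to the model that generated the training data,
    stylistic overlap can systematically tilt this decision.
    """
    return answer_a  # placeholder decision

# The risky configuration: the same (or a related) model both synthesizes
# the training data and later acts as the judge.
GENERATOR = "model-family-X-large"
JUDGE = "model-family-X-large"  # identical / derived / same family

synthetic = synthesize_data(GENERATOR, n=1000)
student = train_student("open-base-model", synthetic)
verdict = judge(JUDGE, answer_a=f"output of {student}", answer_b="output of a rival model")
print(verdict)
```

The point of the sketch is the configuration near the bottom: GENERATOR and JUDGE coming from the same model family is exactly the setup in which the contamination described below can occur.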
What is Preference Leakage?
Preference Leakage describes a contamination problem that arises when the LLM used to generate training data is related to the LLM that later judges the models trained on that data. This relationship can take several forms: the two models can be identical, one can be derived from the other (an inheritance relationship, for example via fine-tuning or distillation), or both can belong to the same model family. In all of these cases there is a risk that the judging LLM develops an implicit preference for outputs shaped by the data from its related model. The result is a biased evaluation that does not reflect the actual performance of the model being evaluated.
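A small, purely illustrative Python check makes the three relationship types tangible. The lineage table and model names below are hypothetical placeholders, not statements about real model genealogies.

```python
# Toy classification of the three relationship types named above:
# identical model, inheritance, same model family.

LINEAGE = {
    "family-X-small": {"parent": "family-X-large", "family": "X"},
    "family-X-large": {"parent": None, "family": "X"},
    "family-Y-chat":  {"parent": None, "family": "Y"},
}

def ancestors(model: str) -> set[str]:
    """Collect a model and all of its ancestors recorded in LINEAGE."""
    seen: set[str] = set()
    while model is not None and model not in seen:
        seen.add(model)
        model = LINEAGE.get(model, {}).get("parent")
    return seen

def relationship(data_generator: str, judge: str) -> str:
    """Classify how the data-generating LLM relates to the judging LLM."""
    if data_generator == judge:
        return "identical model"
    if judge in ancestors(data_generator) or data_generator in ancestors(judge):
        return "inheritance relationship"
    fam_gen = LINEAGE.get(data_generator, {}).get("family")
    fam_judge = LINEAGE.get(judge, {}).get("family")
    if fam_gen is not None and fam_gen == fam_judge:
        return "same model family"
    return "no known relationship"

print(relationship("family-X-small", "family-X-large"))  # inheritance relationship
print(relationship("family-X-large", "family-Y-chat"))   # no known relationship
```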
Impacts and Challenges
Studies have shown that Preference Leakage leads to a systematic preference for related models. This bias occurs across different LLM base models and benchmarks and makes objective performance evaluation difficult. Particularly problematic is that Preference Leakage is harder to detect than other known biases in LLM-based evaluation. This makes it a widespread and serious challenge for the development and deployment of LLMs.
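How could such a systematic preference be made visible? One simple illustration, sketched under assumptions of our own rather than taken from the cited studies, is to compare the win rate a related judge assigns to a student model with the win rate an independent judge assigns on the same response pairs. All verdicts below are fabricated for demonstration.

```python
# Fabricated illustration: compare the win rate a related judge assigns to a
# student model with the win rate an independent judge assigns on the same pairs.

def win_rate(verdicts: list[str], model: str) -> float:
    """Fraction of pairwise verdicts in which `model` was preferred."""
    return sum(v == model for v in verdicts) / len(verdicts)

# Verdicts over the same ten response pairs: student_A vs. student_B.
related_judge   = ["student_A"] * 7 + ["student_B"] * 3  # judge related to student_A's data generator
unrelated_judge = ["student_A"] * 5 + ["student_B"] * 5  # independent judge

gap = win_rate(related_judge, "student_A") - win_rate(unrelated_judge, "student_A")
print(f"Win-rate gap attributable to the related judge: {gap:+.0%}")  # +20%
```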
The Importance of Research
Research into Preference Leakage is crucial for the future development and application of LLMs. Only through a comprehensive understanding of the causes and effects of this problem can we develop effective strategies for prevention and mitigation. The development of robust and reliable evaluation methods is essential to unlock the full potential of LLMs and enable their application in critical areas such as medicine or finance.
Future Research and Solutions
Research on Preference Leakage is still in its early stages. Future studies should focus on the development of methods for detecting and quantifying Preference Leakage. Furthermore, research into strategies for mitigating the problem is crucial. Possible approaches could include the development of independent evaluation metrics, the use of diversified training data, and improving the transparency of LLM development processes.
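One closely related idea, sketched below purely as an illustration and not as an approach prescribed by the studies above, is to diversify the judges themselves: verdicts from a panel of judges drawn from unrelated model families are aggregated so that a single related judge cannot dominate the outcome. Judge names and verdicts are invented placeholders.

```python
# Illustrative mitigation sketch: majority vote over a panel of judges from
# unrelated model families dilutes the influence of any single related judge.

from collections import Counter

def panel_verdict(verdicts_by_judge: dict[str, str]) -> str:
    """Majority vote over the judges' preferred model; ties are reported."""
    counts = Counter(verdicts_by_judge.values())
    winner, top_count = counts.most_common(1)[0]
    if list(counts.values()).count(top_count) > 1:
        return "tie"
    return winner

verdicts = {
    "judge-family-X": "student_A",  # potentially related to student_A's data generator
    "judge-family-Y": "student_B",
    "judge-family-Z": "student_B",
}
print(panel_verdict(verdicts))  # student_B: the related judge is outvoted
```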
Conclusion
Preference Leakage represents a serious challenge for the development and application of LLMs. Further research into this problem is essential to ensure the reliability and objectivity of LLM-based evaluation systems and to realize the full potential of this technology. By developing robust evaluation methods and implementing appropriate strategies to mitigate Preference Leakage, we can ensure that LLMs take their place as trustworthy and powerful tools in a variety of applications.
Bibliography:
https://arxiv.org/html/2408.08808v2
https://arxiv.org/pdf/2408.08808
https://aclanthology.org/2024.customnlp4u-1.14.pdf
https://github.com/lyy1994/awesome-data-contamination
https://aclanthology.org/2023.findings-emnlp.722.pdf
https://files.sri.inf.ethz.ch/website/papers/dekoninck2024evading.pdf
https://www.researchgate.net/publication/376393345_NLP_Evaluation_in_trouble_On_the_Need_to_Measure_LLM_Data_Contamination_for_each_Benchmark
https://researchportal.hw.ac.uk/files/142022227/2406.18403v1.pdf
https://openreview.net/forum?id=7visV100Ms&noteId=6vVcxpCHKW
https://www.wsdm-conference.org/2025/2025-wsdm-cup-lmsys-multilingual-chatbot-arena/