Fine-Tuning Retrievers for Multi-Task Retrieval Augmented Generation in Enterprise Settings


Retrieval-Augmented Generation (RAG) has become an indispensable technique for working with large language models (LLMs). By incorporating external information sources, it mitigates well-known weaknesses of LLMs such as hallucinations and outdated knowledge. However, practical challenges arise when building real-world RAG applications. The retrieved information is often domain-specific, and since fine-tuning LLMs is computationally expensive, fine-tuning the retriever offers a cheaper way to improve the quality of the data provided to the LLM. Moreover, as the number of applications in a system grows, maintaining a separate retriever for each one becomes impractical, even though these applications typically require different data types.

In this article, we present a versatile approach to fine-tuning a retriever for various domain-specific tasks. This allows a single encoder to be used for diverse use cases, which reduces costs, enables scalability, and increases speed. We demonstrate how this encoder generalizes to out-of-domain scenarios as well as to new retrieval tasks in real-world enterprise scenarios.

The Challenge of Domain Specificity and Scalability

LLMs like GPT-4 can handle varied inputs and generate diverse text formats. Retrievers, on the other hand, must remain small and fast while performing well on domain-specific data sources. Standard retrievers deliver good results on public benchmarks but do not necessarily generalize to real-world data, especially when that data is structured and originates from existing databases.

Another challenge is scalability and generalization across the various GenAI applications that rely on retrieval. LLMs generalize to a wide range of tasks thanks to extensive pre-training data and instruction fine-tuning. A retriever that is not equally performant and fast across different retrieval tasks, however, degrades the downstream generation.

Multi-Task Fine-tuning as a Solution

Our approach is to fine-tune a small retriever encoder with instructions on various domain-specific tasks. This retriever is deployed in an ecosystem of GenAI applications that prompt it to retrieve desired structured data from databases. Information such as workflow step names, table names, and field names is then passed to LLMs to generate workflows or playbooks, resulting in higher output quality.
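A minimal sketch of how such instruction-conditioned training examples might be constructed. The task names and instruction texts below are illustrative assumptions, not the exact ones used here; the key idea is that prepending a task instruction to the query lets a single encoder serve multiple retrieval tasks.

```python
# Sketch: building multi-task training pairs for contrastive fine-tuning.
# Task names and instruction wordings are hypothetical examples.

TASK_INSTRUCTIONS = {
    "workflow_steps": "Given a workflow description, retrieve relevant step names.",
    "table_names": "Given a natural-language request, retrieve matching table names.",
    "field_names": "Given a table description, retrieve relevant field names.",
}

def format_example(task: str, query: str, positive: str) -> dict:
    """Prepend the task instruction so the encoder can condition its
    embedding on the retrieval task; negatives typically come in-batch."""
    instruction = TASK_INSTRUCTIONS[task]
    return {
        "query": f"Instruct: {instruction}\nQuery: {query}",
        "positive": positive,
    }

example = format_example(
    "table_names",
    "show all open incidents assigned to my team",
    "incident",
)
print(example["query"])
```

Pairs formatted this way can be fed to a standard contrastive objective (e.g. in-batch negatives), so adding a new task only requires new instruction text and training pairs, not a new model.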

For our example, we use mGTE due to its large context length (8,192 tokens), which allows it to receive long instructions, and its multilingual capabilities. We compare it against BM25 as well as the multilingual embedding model mE5 and the off-the-shelf (not fine-tuned) mGTE.
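The BM25 baseline can be sketched in a few lines. This is a minimal pure-Python version of the classic scoring formula, with toy tokenized documents standing in for real database records:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    "create incident table".split(),
    "list workflow step names".split(),
    "incident field names".split(),
]
print(bm25_scores("incident table".split(), docs))
```

Because BM25 matches surface terms only, it serves as a lexical lower bound: queries phrased differently from the stored table or field names score poorly, which is exactly the gap a fine-tuned embedding model is meant to close.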

Evaluation and Results

We conduct several evaluations. First, we evaluate the tasks on which the retriever was trained, but in out-of-domain (OOD) scenarios: the internal training datasets come from the IT domain, while the OOD splits come from other domains such as HR and Finance. Then we evaluate a related but different retrieval task to test the model's generalization ability. Finally, we check whether the multilingual capabilities of mGTE are retained after our multi-task fine-tuning by evaluating inputs in different languages, even though our training dataset is English-only.
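The per-split evaluation can be sketched with a simple Recall@k metric. The domain names follow the article, but the document IDs and rankings below are made-up placeholders:

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant documents that appear in the top-k results."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# Illustrative OOD splits: (retriever ranking, gold relevant IDs) per query.
splits = {
    "HR": [(["d3", "d1", "d7"], ["d1"])],
    "Finance": [(["d2", "d9", "d4"], ["d9", "d5"])],
}
for domain, examples in splits.items():
    avg = sum(recall_at_k(r, rel, k=3) for r, rel in examples) / len(examples)
    print(domain, avg)
```

Averaging the metric per domain split makes it easy to spot whether fine-tuning on IT data has degraded performance on HR or Finance queries.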

Conclusion

This approach demonstrates how a domain-specific and efficient retriever for real-world RAG applications can be developed. The multi-task fine-tuning of the retriever allows generalization to out-of-domain datasets and related, but different retrieval tasks. This enables the use of a single retriever for various use cases, leading to cost savings, scalability, and increased speed.
