Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation
Peng, Kumar, Wu et al.
Retrieval-Augmented Generation (RAG) systems leverage Large Language Models (LLMs) to generate accurate and reliable responses that are grounded in retrieved context. However, LLMs often generate inconsistent outputs for semantically equivalent inputs, a problem compounded by the scarcity of consistency-focused training data and the limitations of current fine-tuning techniques in enhancing output consistency. We propose a new approach combining systematic synthetic data generation, triplet loss for better embeddings, and a novel layer-wise model merging approach. Using consistency-aware weights derived from intermediate layer activations, our method effectively integrates knowledge from specialized models. Experimental results show that our merged model significantly enhances output consistency, achieving a ~47.5% improvement in response similarity over the baseline, thus offering a practical solution for increasing the reliability of an industrial RAG system.
Retrieval-Augmented Generation (RAG) systems leverage Large Language Models (LLMs) to generate accurate and reliable responses based on retrieved context. However, LLMs frequently produce inconsistent outputs when faced with semantically equivalent inputs, a problem exacerbated by the scarcity of consistency-oriented training data and the limitations of current fine-tuning techniques in enhancing output consistency. This paper proposes a method combining systematic synthetic data generation, triplet loss, and a novel layer-wise model merging approach. By employing consistency-aware weights derived from intermediate layer activations, the method effectively integrates knowledge from specialized models. Experimental results demonstrate that the merged model significantly improves output consistency, achieving a 47.5% improvement in response similarity compared to the baseline.
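The layer-wise merging idea can be illustrated with a minimal sketch. The summary above does not give the paper's exact weighting formula, so everything specific here is an assumption: each specialized model's consistency at a layer is scored as the cosine similarity between its activations for a query and for a paraphrase, the scores are turned into weights with a softmax, and the layer's parameters are combined by plain weighted averaging (not DARE-TIES). The names `merge_layerwise`, `models`, `activations`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_layerwise(models, activations, temperature=0.1):
    """Merge models one layer at a time using consistency-aware weights.

    models:      list of {layer_name: [float, ...]} with identical keys and sizes.
    activations: list (one entry per model) of
                 {layer_name: (activation_for_query, activation_for_paraphrase)}.
    Assumption: a model whose activations change little between a query and
    its paraphrase at a given layer gets a larger weight for that layer.
    """
    merged = {}
    for layer in models[0]:
        # One consistency score per model for this layer.
        scores = [cosine(*acts[layer]) for acts in activations]
        weights = softmax(scores, temperature)
        size = len(models[0][layer])
        # Weighted average of this layer's parameters across models.
        merged[layer] = [
            sum(w * m[layer][i] for w, m in zip(weights, models))
            for i in range(size)
        ]
    return merged


# Toy example: two "models" with two layers each.
models = [
    {"l0": [1.0, 1.0], "l1": [0.0]},
    {"l0": [3.0, 5.0], "l1": [2.0]},
]
activations = [
    {"l0": ([1.0, 0.0], [1.0, 0.0]), "l1": ([1.0, 0.0], [1.0, 0.0])},
    {"l0": ([1.0, 0.0], [0.0, 1.0]), "l1": ([0.0, 2.0], [0.0, 1.0])},
]
merged = merge_layerwise(models, activations)
```

In this toy case the first model is far more consistent at layer `l0`, so the merged `l0` stays close to its parameters, while both models are equally consistent at `l1` and are averaged there. A lower `temperature` sharpens this per-layer selection; a higher one moves the merge toward a uniform average.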
This research addresses inconsistent output from the generator in RAG systems, which manifests in two ways:
Semantically equivalent queries produce different responses: as shown in Figure 1, the mere presence or absence of a question mark can lead a RAG system to give entirely different answers.
Practical challenges in industrial deployment: in production environments, users phrase the same question in many ways, and inconsistent answers across these variants undermine system reliability.
Given an original query Q and its semantically equivalent variant Q', the objective is to enable the RAG system's generator to produce consistent responses S and S' for both, i.e., maximize semantic similarity between S and S' while maintaining response accuracy.
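The objective can be made concrete with a small sketch, assuming that responses S and S' have already been mapped to embedding vectors by some sentence encoder (not shown, and not specified in this summary). Cosine similarity then serves as a stand-in for the semantic similarity to be maximized, and a triplet loss in the squared-Euclidean form of Schroff et al. (2015) pulls the embedding for Q' toward that of Q while pushing an unrelated example away. How the paper actually constructs its triplets is not stated here, so the anchor/positive/negative roles and the `margin` value are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    Assumed roles: anchor = embedding for Q, positive = embedding for the
    equivalent variant Q', negative = embedding for an unrelated query.
    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical embeddings, for illustration only.
emb_s = [0.9, 0.1, 0.0]        # response S to query Q
emb_s_prime = [0.8, 0.2, 0.0]  # response S' to variant Q'
emb_other = [0.0, 0.1, 0.9]    # response to an unrelated query

consistency = cosine_similarity(emb_s, emb_s_prime)
loss = triplet_loss(emb_s, emb_s_prime, emb_other)
```

A consistency score near 1 means S and S' are close in embedding space. Accuracy still has to be checked separately, since two responses can agree with each other and both be wrong.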
The paper cites multiple important related works, including:
Lewis et al. (2020): Foundational work on RAG framework
Yu et al. (2024), Yadav et al. (2023): The DARE and TIES model merging methods, respectively, combined as DARE-TIES
Schroff et al. (2015): FaceNet, the triplet loss formulation for embedding learning
Patwardhan et al. (2024): LLM consistency definitions and analysis
Overall Assessment: This is a high-quality applied research paper that addresses a practical industrial problem and contributes both methodological innovation and practical value. Its theoretical depth and its validation of generalization leave room for improvement, but the problem is of real practical significance, and the proposed method is practical to implement and effective.