FedLoRA-Optimizer: Federated LoRA Fine-Tuning with Global and Local Optimization in Heterogeneous Data Scenarios

Zhao, Zhu, Zhang et al.

Federated efficient fine-tuning has emerged as an approach that leverages distributed data and computational resources across nodes to address the challenges of large-scale fine-tuning and privacy preservation. Low-Rank Adaptation (LoRA) enables efficient fine-tuning of large-scale pre-trained models by introducing trainable low-rank matrices into weight updates. However, in heterogeneous data scenarios, client drift weakens the generalization of the global model, and local models often fail to meet the personalized needs of individual clients. Moreover, existing federated LoRA efficient fine-tuning techniques overlook fine-grained analysis of the tuning matrices. To address this, we conducted preliminary experiments and found that different LoRA matrices exhibit different sensitivity to changes in the direction and magnitude of their vectors. We thus propose a fine-grained federated LoRA tuning method. By fine-tuning the more sensitive directional vectors in the A matrix, which encode shared knowledge, our method learns shared features more effectively across clients and enhances global generalization. Simultaneously, by fine-tuning the more sensitive magnitude vectors in the B matrix, which encode personalized knowledge, our method better captures personalized knowledge, enabling detailed adaptation to local data. The method uses a pipeline combining global and local optimizers. Global optimization further improves local models, achieving collaborative optimization between global and local levels. This improves both the generalization ability of the global model and the personalized adaptation of local models under heterogeneous data scenarios. Experiments on Databricks-Dolly-15k and Natural Instructions with LLaMA2-7B and DeepSeek-7B confirm that our method improves global performance by 0.39% and local performance by 0.59%.

Basic Information

Paper ID: 2510.11274
Title: FedLoRA-Optimizer: Federated LoRA Fine-Tuning with Global and Local Optimization in Heterogeneous Data Scenarios
Authors: Jianzhe Zhao, Hailin Zhu, Yu Zhang, Ziqi Chen, Guibing Guo (Northeastern University)
Category: cs.LG (Machine Learning)
Publication Date: October 13, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.11274

Abstract

Federated efficient fine-tuning addresses the challenges of large-scale fine-tuning and privacy preservation by leveraging distributed data and computational resources across nodes. Low-Rank Adaptation (LoRA) enables efficient fine-tuning of large pre-trained models by introducing trainable low-rank matrices in weight updates. However, in heterogeneous data scenarios, client drift weakens the generalization capability of the global model, while local models often fail to meet the personalization requirements of individual clients. Furthermore, existing federated LoRA efficient fine-tuning techniques overlook fine-grained analysis of tuning matrices. To address this, we conducted preliminary experiments revealing that different LoRA matrices exhibit varying sensitivities to directional and magnitude changes in their vectors. Based on this finding, we propose a fine-grained federated LoRA tuning method. It learns cross-client shared features more effectively by fine-tuning the more sensitive directional vectors in matrix A, which encode shared knowledge, and thereby enhances global generalization capability. At the same time, it captures personalized knowledge by fine-tuning the more sensitive magnitude vectors in matrix B, which encode personalized knowledge. The method employs a pipeline architecture combining global and local optimizers, improving both the generalization capability of the global model and the personalized adaptation of local models in heterogeneous data scenarios.

Research Background and Motivation

Research Problems

The core problems addressed in this paper are the inefficiencies in federated LoRA fine-tuning under heterogeneous data environments, specifically including:

Client Drift Problem: In federated learning environments with data heterogeneity, differences in data distribution across clients lead to degraded generalization capability of the global model
Insufficient Personalization: Local models fail to adequately satisfy the personalization requirements of individual clients
Lack of Fine-Grained Analysis: Existing methods overlook detailed analysis of LoRA tuning matrices

Problem Significance

With the widespread application of large pre-trained models, efficient distributed fine-tuning while preserving privacy has become a critical challenge. Federated learning provides a solution, but faces performance degradation in heterogeneous data scenarios, directly affecting the effectiveness of large models in practical applications.

Limitations of Existing Methods

Traditional Federated Learning Methods: Methods such as FedAvg face convergence difficulties and accuracy decline under data heterogeneity
Existing Federated LoRA Methods: Primarily focus on model architecture design and lack fine-grained analysis of how the tuning matrices change
Parameter Efficiency Methods: These reduce communication costs, but balancing global generalization and personalized adaptation remains difficult in heterogeneous environments

Research Motivation

The authors found experimentally that LoRA's matrix A and matrix B respond differently to directional and magnitude changes, which gives an empirical basis for designing targeted optimization strategies.

Core Contributions

Fine-Grained Empirical Analysis: Presents what the authors describe as the first fine-grained analysis of directional and magnitude changes in LoRA tuning matrices, finding that directional changes in matrix A are approximately 1.7 times those of matrix B, while magnitude changes in matrix B are approximately 41 times those of matrix A
Fine-Grained Federated Fine-Tuning Method for Heterogeneous Data: Proposes a method that separately optimizes the high-sensitivity directional vectors in matrix A and the high-sensitivity magnitude vectors in matrix B, improving both the generalization capability of the global model and the adaptability of local models
Global-Local Collaborative Optimization Architecture: Designs a pipeline architecture combining global and local optimizers, achieving collaborative optimization at both global and local levels
Experimental Validation: Verification on LLaMA2-7B and DeepSeek-7B models using Databricks-Dolly-15k and Natural Instructions datasets, with global task accuracy improvement of approximately 0.39% and local task improvement of approximately 0.59%

Methodology Details

Task Definition

This paper investigates efficient fine-tuning of large language models in federated learning environments. Given N clients, where each client i holds a local dataset D_i, the objective is to train, without sharing raw data, a model that generalizes well globally and also meets the personalization requirements of individual clients.

Key Observations and Findings

Through experimental analysis on the LLaMA2-7B model, the authors report two observations:

Observation 1: Directional changes in matrix A are approximately 1.7 times those of matrix B

Matrix A primarily encodes cross-task shared knowledge, serving as the "foundational framework" of global knowledge
Changes in directional vectors directly impact training performance on global tasks

Observation 2: Magnitude changes in matrix B are approximately 41 times those of matrix A

Matrix B primarily encodes task-specific personalized information
Changes in magnitude vectors play a key role in downstream task training effectiveness
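
The two observations compare how much a matrix changes in direction versus in magnitude during fine-tuning. The summary does not say how these changes were measured, so the sketch below is only one plausible way to quantify them: it assumes per-row L2 norms as magnitudes and one minus cosine similarity as the directional change. The function names and toy matrices are hypothetical.

```python
import math

def row_norm(row):
    """L2 norm of one row vector."""
    return math.sqrt(sum(x * x for x in row))

def magnitude_change(before, after):
    """Mean absolute change in per-row L2 norm (assumed metric)."""
    diffs = [abs(row_norm(a) - row_norm(b)) for b, a in zip(before, after)]
    return sum(diffs) / len(diffs)

def direction_change(before, after):
    """Mean (1 - cosine similarity) between corresponding rows (assumed metric)."""
    total = 0.0
    for b, a in zip(before, after):
        dot = sum(x * y for x, y in zip(b, a))
        total += 1.0 - dot / (row_norm(b) * row_norm(a))
    return total / len(before)

# Toy matrices (hypothetical values, not taken from the paper).
before = [[1.0, 0.0], [0.0, 2.0]]
rescaled = [[3.0, 0.0], [0.0, 6.0]]  # same directions, three times the magnitude
rotated = [[0.0, 1.0], [2.0, 0.0]]   # same magnitudes, each row rotated by 90 degrees
```

With these definitions, `rescaled` registers only a magnitude change and `rotated` only a directional change, which is the kind of separation the 1.7x and 41x ratios rely on.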

Model Architecture

Matrix Decomposition Strategy

Drawing inspiration from DoRA, LoRA matrices are decomposed into directional and magnitude components:

$$A = A_M \cdot A_D, \qquad B = B_M \cdot B_D$$

where $A_M$, $B_M$ represent magnitude vectors, and $A_D$, $B_D$ represent directional vectors.
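
A minimal sketch of such a magnitude-direction split, assuming the magnitude is the per-row L2 norm and the direction is the row divided by that norm. The summary does not specify the norm axis, so `decompose_rows` and `recompose_rows` are hypothetical helpers, and rows with zero norm are not handled.

```python
import math

def decompose_rows(matrix):
    """Split each row into an L2 norm (magnitude) and a unit vector (direction)."""
    magnitudes, directions = [], []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        magnitudes.append(norm)
        directions.append([x / norm for x in row])
    return magnitudes, directions

def recompose_rows(magnitudes, directions):
    """Inverse of decompose_rows: scale each unit row by its magnitude."""
    return [[m * x for x in row] for m, row in zip(magnitudes, directions)]

A = [[3.0, 4.0], [0.0, 2.0]]   # toy LoRA A matrix (hypothetical values)
A_M, A_D = decompose_rows(A)    # magnitudes [5.0, 2.0], rows of A_D have unit norm
A_back = recompose_rows(A_M, A_D)
```

Recomposing the two parts recovers the original matrix up to floating-point error, so the split only changes which quantities are exposed to the optimizer.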

Global Optimizer

Objective: Enhance global model generalization capability
Strategy: Focus on adjusting directional vectors of matrix A

Federated aggregation formula:

$$\bar{A}_D = \frac{1}{N}\sum_{i=1}^{N} A_{D,i}$$
$$\bar{A}_M = \frac{1}{N}\sum_{i=1}^{N} A_{M,i}$$
$$\bar{B}_M = \frac{1}{N}\sum_{i=1}^{N} B_{M,i}$$
$$\bar{B}_D = \frac{1}{N}\sum_{i=1}^{N} B_{D,i}$$
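
The aggregation above is an unweighted element-wise mean over clients. A minimal sketch, assuming every client reports components of identical shape; the helper names and the two-client values are hypothetical.

```python
def average_vectors(vectors):
    """Element-wise mean of equally shaped vectors, one per client."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

def average_matrices(matrices):
    """Element-wise mean of equally shaped matrices, one per client."""
    n = len(matrices)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*matrices)]

# Hypothetical components reported by N = 2 clients.
client_A_M = [[1.0, 2.0], [3.0, 4.0]]
client_A_D = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
A_M_bar = average_vectors(client_A_M)    # [2.0, 3.0]
A_D_bar = average_matrices(client_A_D)   # [[0.5, 0.5], [0.5, 0.5]]
```

Note that the mean of unit-norm direction rows is generally not unit-norm (here each averaged row has norm of about 0.71). Whether the method re-normalizes after aggregation is not stated in this summary.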

Global model update:

$$W_g = W_0 + \bar{B}_M \cdot \bar{B}_D \cdot \bar{A}_M \cdot (\bar{A}_D + \Delta A_{D,g})$$
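
The global update can be sketched as follows, under the assumption that each magnitude vector scales the rows of its direction matrix (equivalent to left-multiplying by a diagonal matrix) and that the two scaled factors are then multiplied as ordinary matrices. This reading of the product is an interpretation, not something the summary states, and all helper names and toy values are hypothetical.

```python
def scale_rows(magnitudes, matrix):
    """Multiply row i of matrix by magnitudes[i]."""
    return [[m * x for x in row] for m, row in zip(magnitudes, matrix)]

def matmul(X, Y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def global_update(W0, B_M, B_D, A_M, A_D, delta_A_D):
    """W_g = W_0 + (B_M . B_D)(A_M . (A_D + delta_A_D)); only delta_A_D is trained."""
    A = scale_rows(A_M, add(A_D, delta_A_D))
    B = scale_rows(B_M, B_D)
    return add(W0, matmul(B, A))

# Toy shapes: d_out = 2, rank = 1, d_in = 2 (hypothetical values).
W0 = [[1.0, 0.0], [0.0, 1.0]]
B_M, B_D = [2.0, 1.0], [[1.0], [1.0]]
A_M, A_D = [3.0], [[1.0, 0.0]]
delta_A_D = [[0.0, 1.0]]
W_g = global_update(W0, B_M, B_D, A_M, A_D, delta_A_D)  # [[7.0, 6.0], [3.0, 4.0]]
```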

Local Optimizer

Objective: Improve personalized model performance
Strategy: Focus on adjusting magnitude vectors of matrix B

Local model update:

$$W_l = W_g + (\bar{B}'_M + \Delta B'_{M,l}) \cdot \bar{B}'_D \cdot \bar{A}'_M \cdot \bar{A}'_D$$

Local loss function:

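$$L_{\text{local}} = L_{\text{task}}(W_l x, y) + \frac{\lambda}{2}\lVert \Delta M_l \rVert_F^2$$

Gradient update formula:

$$\nabla_{\Delta M_l} L_{\text{local}} = \bar{B}'_D \cdot \bar{A}'_M \cdot \bar{A}'_D \cdot \nabla_{y_{\text{pred}}} L_{\text{task}} + \lambda \cdot \Delta M_l$$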
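
The second term of the gradient comes directly from differentiating the penalty: the derivative of (λ/2) times the squared norm of the magnitude update is λ times the update itself. A minimal sketch of that term and of one plain gradient-descent step follows. The task gradient is passed in as a given vector, and the values of λ, the learning rate, and the vectors are hypothetical, since the summary does not report them.

```python
def l2_penalty(delta_M, lam):
    """(lambda / 2) * squared norm of the magnitude update."""
    return 0.5 * lam * sum(x * x for x in delta_M)

def l2_penalty_grad(delta_M, lam):
    """Gradient of the penalty with respect to delta_M: lambda * delta_M."""
    return [lam * x for x in delta_M]

def local_step(delta_M, task_grad, lam, lr):
    """One gradient-descent step on task loss plus penalty, given the task gradient."""
    reg = l2_penalty_grad(delta_M, lam)
    return [x - lr * (g + r) for x, g, r in zip(delta_M, task_grad, reg)]

delta_M = [1.0, -2.0]                 # hypothetical current magnitude update
penalty = l2_penalty(delta_M, 0.1)    # 0.5 * 0.1 * (1 + 4) = 0.25
updated = local_step(delta_M, task_grad=[0.5, 0.5], lam=0.1, lr=1.0)
```

The penalty pulls the magnitude update toward zero, which keeps each personalized model close to the aggregated global components.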

Technical Innovations

Sensitivity-Based Differentiated Optimization: Employs targeted optimization strategies based on the different sensitivities of matrices A and B to directional and magnitude changes
Pipeline Architecture Design: Global optimizer first trains the global model, then local optimizer performs personalization fine-tuning based on the global model
Fine-Grained Parameter Control: Separately controls updates to directional and magnitude vectors, achieving more refined parameter tuning
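
The global-then-local schedule can be illustrated on a deliberately simple scalar problem. This is a toy stand-in, not the paper's LoRA procedure: a shared scalar `g` plays the role of the globally trained component and a per-client offset `p` plays the role of the locally trained one. The targets, learning rates, and round counts are hypothetical.

```python
def global_stage(targets, rounds=50, lr=0.5):
    """Stage 1: each client takes one local step on the shared scalar g, then the server averages."""
    g = 0.0
    for _ in range(rounds):
        local = [g - lr * (g - t) for t in targets]  # gradient of 0.5 * (g - t)^2
        g = sum(local) / len(local)                   # server-side average
    return g

def local_stage(g, target, steps=50, lr=0.5):
    """Stage 2: freeze g and fit a per-client offset p."""
    p = 0.0
    for _ in range(steps):
        p -= lr * (g + p - target)                    # gradient of 0.5 * (g + p - t)^2
    return p

targets = [1.0, 2.0, 6.0]   # hypothetical heterogeneous client optima
g = global_stage(targets)
offsets = [local_stage(g, t) for t in targets]
```

In this toy setting the shared value settles at the mean of the client targets, and each offset absorbs that client's deviation from it, which mirrors the split between shared and personalized knowledge described above.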

Experimental Setup

Datasets

Databricks-Dolly-15k: Instruction fine-tuning dataset with multiple downstream tasks
Natural Instructions: Natural instruction dataset
Task Types: Three representative tasks selected to simulate heterogeneous environments
- Causal Reasoning (Causal)
- Question Answering (QA)
- Information Extraction (IE)
Data Split: 80% training set, 20% test set

Evaluation Metrics

Accuracy: Measures answer correctness through semantic similarity between model output and target response
Global Performance: Performance on all task combinations (ALL)
Local Performance: Performance on individual specific tasks

Comparison Methods

LoRA: Standard LoRA algorithm, training only adapter parameters
Prompt Tuning: Lightweight fine-tuning technique based on prompts
Adapt Tuning: Another parameter-efficient fine-tuning method

Implementation Details

Models: LLaMA2-7B, DeepSeek-7B
LoRA Parameters: rank=8, scaling factor=32, dropout=0.1
Application Layers: Applied only to Q and V sub-layers of self-attention
Hardware: A800 Linux server, 100GB RAM, 14-core Intel Xeon Gold 6348 CPU
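
To give a sense of scale for these settings, the sketch below counts the LoRA parameters they imply. It assumes the commonly cited LLaMA2-7B dimensions (hidden size 4096, 32 decoder layers, square Q and V projections) and assumes the reported scaling factor is the LoRA alpha, so that the effective multiplier is alpha divided by rank. None of these assumptions is confirmed by the summary.

```python
def lora_trainable_params(rank, d_in, d_out, modules_per_layer, num_layers):
    """Parameters in A (rank x d_in) plus B (d_out x rank), over all adapted modules."""
    return rank * (d_in + d_out) * modules_per_layer * num_layers

rank, alpha = 8, 32
scaling = alpha / rank   # 4.0 under the alpha / rank convention (assumed)

# Assumed LLaMA2-7B dimensions: hidden size 4096, 32 decoder layers, Q and V adapted.
n_params = lora_trainable_params(rank, 4096, 4096, modules_per_layer=2, num_layers=32)
```

Under these assumptions the adapters hold 4,194,304 parameters, a small fraction of the roughly 7 billion in the base model.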

Experimental Results

Main Results

LLaMA2-7B Results

Natural Instructions Dataset:

PH task: 11.62% vs LoRA's 11.46%
QA task: 66.69% vs LoRA's 61.69%
IE task: 21.18% vs LoRA's 22.85%
ALL task: 32.44% vs LoRA's 33.04%
Overall accuracy improvement of 0.73%

Databricks-Dolly-15k Dataset:

Causal task: 18.99% vs LoRA's 18.59%
QA task: 40.57% vs LoRA's 40.48%
IE task: 27.91% vs LoRA's 25.91%
ALL task: 26.20% vs LoRA's 25.70%
Overall accuracy improvement of 0.75%
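
The summary does not say how the overall figure is computed. One reading that is consistent with the Databricks-Dolly-15k rows above is the mean of the four per-task differences, in percentage points. The check below is an assumption about the calculation, not the authors' stated procedure.

```python
# Accuracies (%) copied from the Databricks-Dolly-15k rows above.
ours = {"Causal": 18.99, "QA": 40.57, "IE": 27.91, "ALL": 26.20}
lora = {"Causal": 18.59, "QA": 40.48, "IE": 25.91, "ALL": 25.70}

# Per-task gain in percentage points, then the unweighted mean.
gains = {task: ours[task] - lora[task] for task in ours}
mean_gain = sum(gains.values()) / len(gains)
```

Here `mean_gain` is about 0.7475, which rounds to the stated 0.75. Applying the same calculation to the Natural Instructions rows gives about 0.72, close to but not exactly the stated 0.73.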

DeepSeek-7B Results

Natural Instructions Dataset:

Overall improvement of 1.11%, from 6.00% to 6.44%

Databricks-Dolly-15k Dataset:

Overall improvement of 0.53%, from 18.90% to 20.10%

Parameter Analysis

Analysis of different rank settings reveals that the model achieves optimal performance when r=8, n=2, with 18.59% accuracy on causal reasoning tasks.

Ablation Study

Pipeline Structure Effectiveness Validation:

Compared the pipeline structure of "global optimization + local optimization" with methods using only local optimization
Experimental results show the pipeline mode outperforms non-pipeline mode on all three tasks (Causal, IE, QA)
Validates the effectiveness of staged training strategy

Experimental Findings

Differentiated Sensitivity of Direction vs. Magnitude Verified: Directional changes in matrix A are indeed approximately 1.7 times those of matrix B, while magnitude changes in matrix B are approximately 41 times those of matrix A
Necessity of Pipeline Architecture: Global optimization followed by local optimization performs better than direct local optimization
Importance of Parameter Settings: Appropriate rank settings have significant impact on performance

Related Work

Parameter-Efficient Fine-Tuning

Adapters: Inserts small trainable modules in Transformer layers
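LoRA: Freezes the pre-trained weights and represents the weight update as a product of two low-rank matrices, training only these bypass modules
DoRA: Decomposes the pre-trained weight into magnitude and direction components and applies LoRA-style updates to the direction
Prompt Tuning: Learns continuous prompt embeddings prepended to the input while the base model stays frozen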

Federated Learning

FedAvg: Performs global optimization through averaging updates, but performs poorly under data heterogeneity
FedProx: Adds proximal term to constrain local update deviation
SCAFFOLD: Uses control variates to correct "client drift"
Personalized Federated Learning: Constructs customized client models

Parameter-Efficient Federated Fine-Tuning

FFA-LoRA: Fixes one low-rank matrix while fine-tuning another to improve stability
Zeroth-Order Optimization Methods: Enable federated fine-tuning of large models through shared random seeds

Conclusions and Discussion

Main Conclusions

Value of Fine-Grained Analysis: Fine-grained analysis of directional and magnitude changes in LoRA matrices reveals important sensitivity difference patterns
Effectiveness of Differentiated Optimization Strategy: Differentiated optimization strategies targeting directional vectors of matrix A and magnitude vectors of matrix B can simultaneously improve both global generalization and local personalization capabilities
Advantages of Pipeline Architecture: Global-local collaborative optimization is more effective than pure local optimization

Limitations

Limited Performance Improvement: While effective, overall performance improvement is relatively modest (0.39%-0.59%)
Computational Complexity: Pipeline architecture increases training computational complexity
Limited Applicability Scope: Primarily validated on large language models; generalization to other model types requires further verification
Dependence on Heterogeneity Degree: Method effectiveness may depend on the degree of data heterogeneity

Future Directions

The authors propose future exploration of optimization strategies to improve model adaptability and fine-tuning efficiency in heterogeneous environments, including:

Further optimization of global-local collaborative mechanisms
Exploration of more efficient parameter decomposition and aggregation strategies
Extension to more model types and tasks

In-Depth Evaluation

Strengths

Novel Empirical Insight: Offers a fine-grained analysis of how LoRA matrices A and B differ in sensitivity to directional and magnitude changes, giving the optimization strategy an empirical basis
Reasonable Method Design: The differentiated optimization strategies follow directly from the empirical observations and are well motivated
Comprehensive Experimental Design: Includes sufficient comparative experiments, parameter analysis, and ablation studies
Clear Problem Definition: Accurately identifies key challenges in federated LoRA fine-tuning

Weaknesses

Limited Performance Improvement Magnitude: Relative to method complexity, performance improvement is modest
Insufficient Theoretical Analysis: Lacks theoretical explanation for why matrices A and B exhibit different sensitivities
Limited Experimental Scale: Validation on only two models and two datasets; generalization requires strengthening
Missing Computational Overhead Analysis: Lacks detailed analysis of computational and communication costs

Impact

Academic Contribution: Provides new research perspectives for parameter-efficient fine-tuning in federated learning
Practical Value: Demonstrates application potential in privacy-preserving distributed large model fine-tuning scenarios
Reproducibility: Paper provides detailed experimental settings and parameter configurations

Applicable Scenarios

This method is particularly suitable for:

Privacy-sensitive distributed large model fine-tuning scenarios
Federated learning environments with strong data heterogeneity
Applications requiring balance between global generalization and personalization
Resource-constrained environments requiring efficient fine-tuning

References

The paper cites 25 relevant references, covering important works in key domains including LoRA, federated learning, and parameter-efficient fine-tuning, providing a solid theoretical foundation for the research.

Overall Assessment: This is a valuable work at the intersection of federated learning and parameter-efficient fine-tuning. While the performance improvements are relatively modest, the fine-grained analytical perspective and the differentiated optimization strategies open new research directions for the field and show both academic value and practical potential.