2025-11-26T03:25:17.925806

An Accelerated Distributed Algorithm with Equality and Inequality Coupling Constraints

Qiu, Qian, Lin et al.
This paper studies distributed convex optimization with both affine equality and nonlinear inequality couplings through the duality analysis. We first formulate the dual of the coupling-constraint problem and reformulate it as a consensus optimization problem over a connected network. To efficiently solve this dual problem and hence the primal problem, we design an accelerated linearized algorithm that, at each round, a look-ahead linearization of the separable objective is combined with a quadratic penalty on the Laplacian constraint, a proximal step, and an aggregation of iterations. On the theory side, we prove non-ergodic rates for both the primal optimality error and the feasibility error. On the other hand, numerical experiments show a faster decrease of optimality error and feasibility residual than augmented-Lagrangian tracking and distributed subgradient baselines under the same communication budget.
academic

An Accelerated Distributed Algorithm with Equality and Inequality Coupling Constraints

Basic Information

  • Paper ID: 2511.19708
  • Title: An Accelerated Distributed Algorithm with Equality and Inequality Coupling Constraints
  • Authors: Chenyang Qiu, Yangyang Qian, Zongli Lin, Yacov A. Shamash
  • Affiliations: University of Virginia (Qiu, Qian, Lin), Stony Brook University (Shamash)
  • Classification: math.OC (Optimization and Control), cs.SY (Systems and Control), eess.SY (Systems and Control)
  • Submission Date: November 24, 2025
  • Paper Link: https://arxiv.org/abs/2511.19708

Abstract

This paper investigates distributed convex optimization problems with affine equality constraints and nonlinear inequality constraints. Through dual analysis, the coupled constraint problem is transformed into a consensus optimization problem over a connected network. To efficiently solve the dual problem and subsequently the primal problem, an accelerated linearized algorithm is designed that combines lookahead linearization of separable objective functions, quadratic penalty terms for Laplacian constraints, proximal steps, and iterative aggregation in each iteration. Theoretically, non-ergodic convergence rates for both optimality error and feasibility error of the primal problem are established. Numerical experiments demonstrate that under the same communication budget, the proposed algorithm outperforms state-of-the-art methods in terms of convergence speed for optimality error and feasibility residuals.

Research Background and Motivation

1. Problem Definition

Distributed optimization aims to minimize a global objective function in multi-agent systems through local computation and communication. This paper focuses on the Coupling-Constraint Problem (CCP), which is particularly challenging because agents must coordinate local decisions while satisfying global coupling constraints.

2. Problem Significance

Such problems are widely encountered in practical applications:

  • Smart Grids: In economic dispatch problems, global affine equality constraints represent power balance conditions (total generation meets total demand)
  • Resource Allocation: Requires simultaneously satisfying individual and global constraints
  • Emission Constraints: Network capacity limitations and other physical constraints modeled as coupling inequality constraints

3. Limitations of Existing Methods

  • Equality Constraint Handling: Existing methods such as ADMM, mirror methods, and gradient tracking primarily target equality constraints
  • Inequality Constraint Handling: Methods for affine inequality constraints are inapplicable to nonlinear constraints
  • Convergence Rate Issues: Algorithms addressing global coupling nonlinear inequality constraints face the following limitations:
    • Asymptotic convergence 13,17,18
    • Ergodic convergence rates: O(ln N/√N) 14, O(1/√N) 15, O(1/N) 16
    • Lack of acceleration and non-ergodic convergence guarantees

4. Research Motivation

Most existing distributed algorithms do not consider accelerated convergence, resulting in relatively slow convergence rates. This paper aims to develop a distributed algorithm with provable accelerated non-ergodic convergence rates, extending theoretical guarantees of classical first-order methods to the CCP framework with general (possibly non-smooth) cost functions.

Core Contributions

  1. Algorithm Innovation: Proposes a novel accelerated distributed optimization algorithm capable of simultaneously handling affine equality constraints and nonlinear inequality coupling constraints
  2. Theoretical Breakthrough: Establishes non-ergodic convergence rates:
    • Optimality error of primal problem: O(1/N²) + O(1/N)
    • Constraint violation error: O(1/N²) + O(1/N)
    • Significantly improves upon existing ergodic or asymptotic convergence guarantees
  3. Dual Reformulation: Transforms CCP into a dual problem, leveraging separability to interpret it as a consensus optimization problem
  4. Experimental Validation: Numerical experiments demonstrate that under the same iteration budget, the algorithm outperforms state-of-the-art methods such as ALT and distributed subgradient algorithms in terms of optimality error and feasibility residual descent speed

Method Details

Problem Formulation

Primal Problem (Problem 1): minxXf(x)=i=1nfi(xi)\min_{x \in X} f(x) = \sum_{i=1}^{n} f_i(x_i)

Subject to:

  • Equality coupling constraint: i=1nBixi=i=1nbi\sum_{i=1}^{n} B_i x_i = \sum_{i=1}^{n} b_i
  • Inequality coupling constraint: i=1nhi(xi)0\sum_{i=1}^{n} h_i(x_i) \leq 0
  • Local constraint: xiXiRpx_i \in X_i \subseteq \mathbb{R}^p

Where:

  • x=[x1T,x2T,,xnT]TRnpx = [x_1^T, x_2^T, \ldots, x_n^T]^T \in \mathbb{R}^{np}
  • BiRd×pB_i \in \mathbb{R}^{d \times p}, biRdb_i \in \mathbb{R}^d
  • hi:RpRmh_i: \mathbb{R}^p \to \mathbb{R}^m is a possibly nonlinear function

Key Assumptions:

  • Assumption 1: fif_i is a proper μf\mu_f-strongly convex function; hih_i is convex and lhl_h-Lipschitz continuous
  • Assumption 2: XiX_i is a compact convex set; Slater condition holds (a strictly feasible point exists)

Model Architecture

Step 1: Dual Problem Construction

Introduce Lagrange multipliers μRd\mu \in \mathbb{R}^d (equality constraints) and δR+m\delta \in \mathbb{R}_+^m (inequality constraints). The Lagrangian function is:

L(x,μ,δ)=i=1n(Fi(xi)+μ,Bixibi+δ,hi(xi))L(x, \mu, \delta) = \sum_{i=1}^{n} \left( F_i(x_i) + \langle \mu, B_i x_i - b_i \rangle + \langle \delta, h_i(x_i) \rangle \right)

where Fi=fi+1XiF_i = f_i + \mathbb{1}_{X_i} (1Xi\mathbb{1}_{X_i} is the indicator function).

Dual Problem: minμRd,δR+mi=1ngi(μ,δ)\min_{\mu \in \mathbb{R}^d, \delta \in \mathbb{R}_+^m} \sum_{i=1}^{n} g_i(\mu, \delta)

where gi(μ,δ)=minxiLi(xi,μ,δ)g_i(\mu, \delta) = -\min_{x_i} L_i(x_i, \mu, \delta).

Step 2: Consensus Optimization Reformulation

Each agent ii maintains copies of dual variables yi=[μiT,δiT]TY=Rd×R+my_i = [\mu_i^T, \delta_i^T]^T \in Y = \mathbb{R}^d \times \mathbb{R}_+^m. The dual problem is reformulated as:

minyYG(y)=i=1ngi(yi)\min_{y \in \mathcal{Y}} G(y) = \sum_{i=1}^{n} g_i(y_i)s.t. y1=y2==yn\text{s.t. } y_1 = y_2 = \cdots = y_n

Using the Laplacian matrix HH and W=HId+mW = H \otimes I_{d+m}, the consensus constraint is equivalent to W1/2y=0W^{1/2}y = 0, yielding the compact form (Problem 4):

minyYG(y)s.t. W1/2y=0\min_{y \in \mathcal{Y}} G(y) \quad \text{s.t. } W^{1/2}y = 0

Step 3: Accelerated Linearized Multiplier Method

Augmented Lagrangian Function: Lρ(y,v)=G(y)v,W1/2y+ρ2W1/2y2\mathcal{L}_\rho(y, v) = G(y) - \langle v, W^{1/2}y \rangle + \frac{\rho}{2} \|W^{1/2}y\|^2

Algorithm Iteration (Algorithm 1):

Initialization: ŷ_{i,1} = y_{i,1} ∈ Y, λ_{i,1} = 0

For k = 1, 2, ..., N:
  1. Extrapolation step:
     ỹ_{i,k} = (1 - α_k)ŷ_{i,k} + α_k y_{i,k}
  
  2. Local optimization (gradient computation):
     x_{i,k} = argmin_x {F_i(x) + ⟨[B_i x - b_i; h_i(x)], ỹ_{i,k}⟩}
     ∇g_i(ỹ_{i,k}) = -[B_i x_{i,k} - b_i; h_i(x_{i,k})]
  
  3. Information exchange:
     t_{i,k} = Σ_{j∈N_i} H_{ij}(y_{i,k} - y_{j,k})
  
  4. Proximal update:
     y_{i,k+1} = P_Y{y_{i,k} - 1/η_k(∇g_i(ỹ_{i,k}) - λ_{i,k} - θ_k t_{i,k})}
  
  5. Aggregation step:
     ŷ_{i,k+1} = (1 - α_k)ŷ_{i,k} + α_k y_{i,k+1}
  
  6. Dual variable update:
     λ_{i,k+1} = λ_{i,k} - β_k t_{i,k}

Parameter Settings:

  • αk=2k+1\alpha_k = \frac{2}{k+1} (Nesterov acceleration parameter)
  • θk=ρNk\theta_k = \frac{\rho N}{k} (adaptive Laplacian penalty)
  • βk=ρkN\beta_k = \frac{\rho k}{N} (dual step size)
  • ηk=2lg+ρNWk\eta_k = \frac{2l_g + \rho N \|W\|}{k} (proximal parameter)

where lg=2μf2(Bi2+lh2)max{Bi2,lh2}l_g = \sqrt{\frac{2}{\mu_f^2}(\|B_i\|^2 + l_h^2)} \cdot \max\{\|B_i\|^2, l_h^2\} is the Lipschitz constant of gig_i.

Technical Innovations

  1. Three-Variable Coordination Mechanism:
    • y~k\tilde{y}_k: Extrapolated prediction point for gradient evaluation, introducing momentum effects
    • yky_k: Proximal correction point ensuring stability
    • y^k\hat{y}_k: Smoothed trajectory point enabling optimal convergence analysis
  2. Adaptive Parameter Scheduling:
    • θk\theta_k and βk\beta_k adaptively adjust with iteration count, balancing convergence speed and stability
    • Parameter design ensures non-ergodic O(1/N²) acceleration rate
  3. Linearization Strategy:
    • Linearizes the non-separable quadratic term ρ2W1/2y2\frac{\rho}{2}\|W^{1/2}y\|^2
    • Combines lookahead gradient G(y~k)\nabla G(\tilde{y}_k) rather than current point gradient
  4. Distributed Implementation:
    • Each node only solves local subproblems (Equation 14)
    • Requires only one round of neighbor information exchange (Step 6: ti,kt_{i,k})
    • No global coordinator needed

Experimental Setup

Dataset

Synthetic Optimization Problem: minxiXii=1n(xiTAixi+biTxi+xi1)\min_{x_i \in X_i} \sum_{i=1}^{n} \left( x_i^T A_i x_i + b_i^T x_i + \|x_i\|_1 \right)

Subject to:

  • Equality: i=1nCixi=0p\sum_{i=1}^{n} C_i x_i = 0_p
  • Inequality: i=1nxiri1i=1ndi\sum_{i=1}^{n} \|x_i - r_i\|_1 \leq \sum_{i=1}^{n} d_i

Parameter Settings:

  • Number of agents: n=20n = 20
  • Local dimension: p=5p = 5
  • Box constraints: xiXi={xRpxixxˉi}x_i \in X_i = \{x \in \mathbb{R}^p | \underline{x}_i \leq x \leq \bar{x}_i\}
    • xiU[10,9]\underline{x}_i \sim U[-10, -9], xˉiU[9,10]\bar{x}_i \sim U[9, 10]
  • Matrix Ai=UiΛiUiTA_i = U_i \Lambda_i U_i^T:
    • UiU_i is a random orthogonal matrix
    • Eigenvalues of Λi\Lambda_i linearly distributed in [1,100][1, 100] (condition number κ=100\kappa = 100)
  • Ci,biN(0,Ip)C_i, b_i \sim \mathcal{N}(0, I_p)
  • diU(1,6)d_i \sim U(1, 6)

Communication Network:

  • Connected undirected graph: each node connected to nearest and second-nearest neighbors
  • Edge set: (i,i+1)(i, i+1) for 1i191 \leq i \leq 19, plus (1,20)(1, 20)

Evaluation Metrics

  1. Primal Problem Optimality Error: f(xk)f(x)2f(x1)f(x)2\frac{|f(x_k) - f(x^*)|^2}{|f(x_1) - f(x^*)|^2}
  2. Constraint Violation Absolute Error: i=1nCixi,k+[i=1n(xi,kri1di)]+\left\| \sum_{i=1}^{n} C_i x_{i,k} \right\| + \left[ \sum_{i=1}^{n} (\|x_{i,k} - r_i\|_1 - d_i) \right]_+

Comparison Methods

  1. Distributed Subgradient 14: Distributed subgradient algorithm
  2. ALT (Augmented Lagrangian Tracking) 17: Augmented Lagrangian tracking algorithm
  3. IPLUX (Integrated Primal-Dual Proximal) 16: Integrated primal-dual proximal algorithm

Benchmark Solution: Optimal solution xx^* obtained using YALMIP with MOSEK solver

Implementation Details

  • All algorithms use identical initialization
  • Number of iterations: N=1200N = 1200
  • Proposed algorithm parameters set according to Theorem 1

Experimental Results

Main Results

Figure 1: Primal Problem Optimality Error

  • Proposed Algorithm: Achieves 10610^{-6} precision at k=1200k=1200
  • ALT: Monotonic decrease but slower, ending at approximately 10210^{-2}
  • Distributed Subgradient: Slowest descent, remaining in 10110^{-1}-10010^0 range
  • IPLUX: Performance between ALT and proposed algorithm

Figure 2: Constraint Violation Absolute Error

  • Proposed Algorithm: Earliest to reach below 10410^{-4}
  • Other Algorithms: Significantly slower convergence

Experimental Findings

  1. Convergence Speed: The proposed algorithm converges significantly faster than all comparison methods under the same iteration budget
  2. Accuracy Advantage:
    • Optimality error reduced by approximately 4 orders of magnitude (from 10210^{-2} to 10610^{-6})
    • Feasibility error reduced by approximately 2 orders of magnitude
  3. Clear Acceleration Effect: Theoretical advantages of non-ergodic convergence rate are verified in experiments
  4. Robustness: Algorithm performs stably with non-smooth objective functions (containing 1\ell_1 norm) and nonlinear constraints

1. Equality Coupling Constraints

  • ADMM Methods 6,7: Alternating Direction Method of Multipliers
  • Mirror Methods 8: Distributed algorithms based on mirror descent
  • Gradient Tracking 9: Gradient tracking for dual problems

2. Inequality Coupling Constraints

  • Affine Inequality 10-12: Distributed proximal algorithms, aggregated optimization
  • Nonlinear Inequality 13-18:
    • Dual subgradient method 13
    • Operator splitting primal-dual framework 14
    • Dynamic average consensus 15
    • Sparse/dense constraint handling 16
    • ALT algorithm 17

3. Acceleration Methods

  • Nesterov Acceleration 19: O(1/N²) rate for unconstrained convex optimization
  • FISTA 20: Fast Iterative Shrinkage-Thresholding Algorithm
  • Fast Lagrangian Methods 21,22: Accelerated Lagrangian methods for convex optimization
  • Distributed Acceleration 23,24: DCatalyst, energy conservation principle

Advantages of This Work

  • First to extend Nesterov acceleration to distributed CCP with simultaneous equality and nonlinear inequality coupling constraints
  • Provides non-ergodic convergence guarantees (independent of averaging), improving existing ergodic or asymptotic results
  • Applicable to non-smooth objective functions

Theoretical Analysis

Key Lemma (Proposition 1)

Lipschitz Smoothness of Dual Function: gi(z1)gi(z2)lgz1z2\|\nabla g_i(z_1) - \nabla g_i(z_2)\| \leq l_g \|z_1 - z_2\|

where lg=2μf2(Bi2+lh2)max{Bi2,lh2}l_g = \sqrt{\frac{2}{\mu_f^2}(\|B_i\|^2 + l_h^2)} \cdot \max\{\|B_i\|^2, l_h^2\}

Proof Strategy:

  1. Utilize strong convexity of FiF_i and convexity of hih_i
  2. Obtain gradient expression via Danskin's theorem
  3. Establish inequality combining strong convexity and Lipschitz continuity

Main Theorem (Theorem 1)

Convergence Rate:

  1. Feasibility Error: i=1nBixi,N+1bi+[i=1nhi(xi,N+1)]+εc\left\| \sum_{i=1}^{n} B_i x_{i,N+1} - b_i \right\| + \left\| \left[ \sum_{i=1}^{n} h_i(x_{i,N+1}) \right]_+ \right\| \leq \varepsilon_c

where: εc=(2lgN(N+1)+ρN+1W)y1y2+1ρ(N+1)λ2(W)\varepsilon_c = \left( \frac{2l_g}{N(N+1)} + \frac{\rho}{N+1}\|W\| \right) \|y_1 - y^*\|^2 + \frac{1}{\rho(N+1)\lambda_2(W)}

  1. Optimality Error: εpf(xN+1)f(x)εˉp-\varepsilon_p \leq f(x_{N+1}) - f(x^*) \leq \bar{\varepsilon}_p

where εp\varepsilon_p and εˉp\bar{\varepsilon}_p have similar O(1/N²) + O(1/N) form.

Proof Key Steps:

  1. Energy Function Construction: Φk=G(y^k)G(y)λ,y^ky\Phi_k = G(\hat{y}_k) - G(y^*) - \langle \lambda, \hat{y}_k - y^* \rangle
  2. Recursive Inequality: Using convexity and smoothness: k(k+1)Φk+1k(k1)Φk2k[telescoping terms]k(k+1)\Phi_{k+1} - k(k-1)\Phi_k \leq 2k[\text{telescoping terms}]
  3. Summation Technique: Sum from k=1k=1 to NN, utilizing telescoping property
  4. Parameter Selection: Achieve acceleration through carefully designed αk,θk,βk,ηk\alpha_k, \theta_k, \beta_k, \eta_k

Conclusions and Discussion

Main Conclusions

  1. Algorithm Contribution: Proposes the first accelerated distributed algorithm for simultaneous affine equality and nonlinear inequality coupling constraints
  2. Theoretical Guarantee: Establishes non-ergodic O(1/N²) + O(1/N) convergence rate, significantly improving existing results
  3. Practicality: Each iteration involves simple computation (one local subproblem + one round of neighbor communication), suitable for large-scale deployment
  4. Experimental Validation: On representative test sets, the algorithm achieves higher feasibility and lower error under the same iteration budget

Limitations

  1. Strong Convexity Assumption: Algorithm and theoretical analysis depend on strong convexity of objective functions (Assumption 1), limiting applicability
  2. Slater Condition: Requires existence of strictly feasible points (Assumption 2), which may not hold in some practical problems
  3. Compact Set Assumption: Assumption 2 requires local constraint sets XiX_i to be compact, excluding unbounded constraints
  4. Parameter Tuning: While theoretical parameter settings are provided, practical applications may require problem-specific adjustments
  5. Communication Complexity: Does not explicitly analyze communication complexity, focusing only on iteration complexity
  6. Non-convex Extension: Theoretical and algorithmic framework does not cover non-convex optimization problems

Future Directions

  1. Relax Strong Convexity: Extend to general convex or even non-convex problems
  2. Stochastic/Online Versions: Develop stochastic gradient versions for large-scale data
  3. Asynchronous Communication: Study convergence under asynchronous communication protocols
  4. Time-Varying Networks: Extend to dynamically changing communication topologies
  5. Practical Applications: Validate in real systems such as smart grids and multi-agent coordination

In-Depth Evaluation

Strengths

  1. Strong Theoretical Innovation:
    • First to achieve O(1/N²) acceleration in distributed optimization with simultaneous equality and nonlinear inequality coupling constraints
    • Non-ergodic convergence guarantee superior to existing ergodic or asymptotic results
    • Rigorous mathematical proofs with clear logic
  2. Clever Algorithm Design:
    • Three-variable coordination mechanism (y~k,yk,y^k\tilde{y}_k, y_k, \hat{y}_k) effectively achieves acceleration
    • Adaptive parameter scheduling balances convergence speed and stability
    • Linearization strategy maintains computational separability
  3. Comprehensive Experiments:
    • Comparison with three state-of-the-art algorithms
    • Clear experimental results demonstrating acceleration effect
    • High-quality figures with clear conclusions
  4. High Practical Value:
    • Completely distributed algorithm suitable for large-scale deployment
    • Reasonable computational load per iteration
    • Applicable to non-smooth objective functions
  5. Clear Writing:
    • Well-structured with rigorous logic
    • Clear symbol definitions
    • Detailed and understandable proofs

Weaknesses

  1. Strong Assumptions:
    • Strong convexity assumption limits applicability (many practical problems are only convex or non-convex)
    • Compact set and Slater conditions difficult to verify in some applications
  2. Experimental Limitations:
    • Testing only on synthetic data, lacking real application scenario validation
    • No testing on large-scale networks (n=20 is relatively small)
    • No analysis of communication overhead and computation time
  3. Parameter Dependence:
    • Algorithm performance depends on problem parameters (μf,lh,Bi\mu_f, l_h, \|B_i\|, etc.)
    • These parameters may be unknown or difficult to estimate in practical applications
  4. Convergence Constants:
    • Constants in theoretical convergence rate may be large
    • No lower bounds or optimality analysis provided
  5. Missing Analysis:
    • No discussion of algorithm sensitivity to initialization
    • No analysis of parameter selection impact on convergence
    • Lacks discussion of failure cases or difficult scenarios

Impact

  1. Academic Value:
    • Provides new theoretical tools for distributed constrained optimization
    • Acceleration techniques may inspire other distributed algorithm designs
    • Expected high citation in optimization and control fields
  2. Practical Value:
    • Directly applicable to smart grid economic dispatch
    • Extensible to multi-robot coordination, sensor networks, etc.
    • Algorithm 1 provides clear implementation guidelines
  3. Reproducibility:
    • Detailed algorithm description, easy to implement
    • Clear experimental setup
    • Authors encouraged to open-source code for broader application

Applicable Scenarios

Strongly Recommended Scenarios:

  1. Smart grid economic dispatch (satisfies strong convexity and compact set assumptions)
  2. Resource allocation problems (convex cost functions)
  3. Distributed machine learning (strongly convex regularization)

Use with Caution Scenarios:

  1. Non-convex optimization problems (theory inapplicable)
  2. Unbounded constraint sets (violates compact set assumption)
  3. Real-time systems (may require many iterations)

Scenarios Requiring Improvement:

  1. Large-scale networks (scalability needs verification)
  2. Time-varying environments (algorithm extension needed)
  3. Communication-limited settings (communication efficiency consideration needed)

Key References

6 T.-H. Chang et al., "Multi-agent distributed optimization via inexact consensus ADMM," IEEE Trans. Signal Process., 2014.

14 S. Liang and G. Yin, "Distributed dual subgradient algorithms with iterate-averaging feedback," IEEE Trans. Cybernetics, 2019.

16 X. Wu et al., "Distributed optimization with coupling constraints," IEEE Trans. Automatic Control, 2022.

17 A. Falsone and M. Prandini, "Augmented Lagrangian tracking for distributed optimization," Automatica, 2023.

19 Y. Nesterov, "A method for unconstrained convex minimization problem with the rate of convergence O(1/k²)," Dokl. Akad. Nauk. SSSR, 1983.


Overall Assessment: This is a high-quality theoretical paper making important contributions to distributed optimization. The algorithm design is clever, theoretical analysis rigorous, and experimental results convincing. While certain assumption limitations exist, the algorithm demonstrates significant advantages within its applicable scope. Further validation in practical systems is recommended, along with exploration of relaxing the strong convexity assumption.