2025-11-26T03:25:17.925806

An Accelerated Distributed Algorithm with Equality and Inequality Coupling Constraints

Qiu, Qian, Lin et al.

This paper studies distributed convex optimization with both affine equality and nonlinear inequality coupling constraints through duality analysis. We first formulate the dual of the coupling-constraint problem and reformulate it as a consensus optimization problem over a connected network. To efficiently solve this dual problem, and hence the primal problem, we design an accelerated linearized algorithm that, at each round, combines a look-ahead linearization of the separable objective with a quadratic penalty on the Laplacian constraint, a proximal step, and an aggregation of iterates. On the theory side, we prove non-ergodic rates for both the primal optimality error and the feasibility error. Numerical experiments show a faster decrease of the optimality error and the feasibility residual than augmented-Lagrangian tracking and distributed subgradient baselines under the same communication budget.



Basic Information

Paper ID: 2511.19708
Title: An Accelerated Distributed Algorithm with Equality and Inequality Coupling Constraints
Authors: Chenyang Qiu, Yangyang Qian, Zongli Lin, Yacov A. Shamash
Affiliations: University of Virginia (Qiu, Qian, Lin), Stony Brook University (Shamash)
Classification: math.OC (Optimization and Control), cs.SY (Systems and Control), eess.SY (Systems and Control)
Submission Date: November 24, 2025
Paper Link: https://arxiv.org/abs/2511.19708

Abstract

This paper investigates distributed convex optimization problems with affine equality constraints and nonlinear inequality constraints. Through dual analysis, the coupled constraint problem is transformed into a consensus optimization problem over a connected network. To efficiently solve the dual problem and subsequently the primal problem, an accelerated linearized algorithm is designed that combines lookahead linearization of separable objective functions, quadratic penalty terms for Laplacian constraints, proximal steps, and iterative aggregation in each iteration. Theoretically, non-ergodic convergence rates for both optimality error and feasibility error of the primal problem are established. Numerical experiments demonstrate that under the same communication budget, the proposed algorithm outperforms state-of-the-art methods in terms of convergence speed for optimality error and feasibility residuals.

Research Background and Motivation

1. Problem Definition

Distributed optimization aims to minimize a global objective function in multi-agent systems through local computation and communication. This paper focuses on the Coupling-Constraint Problem (CCP), which is particularly challenging because agents must coordinate local decisions while satisfying global coupling constraints.

2. Problem Significance

Such problems are widely encountered in practical applications:

Smart Grids: In economic dispatch problems, global affine equality constraints represent power balance conditions (total generation meets total demand)
Resource Allocation: Requires simultaneously satisfying individual and global constraints
Emission and Capacity Constraints: Emission limits, network capacity limitations, and other physical constraints are modeled as coupling inequality constraints

3. Limitations of Existing Methods

Equality Constraint Handling: Existing methods such as ADMM, mirror methods, and gradient tracking primarily target equality constraints
Inequality Constraint Handling: Methods for affine inequality constraints are inapplicable to nonlinear constraints
Convergence Rate Issues: Algorithms addressing global coupling nonlinear inequality constraints face the following limitations:
- Asymptotic convergence [13, 17, 18]
- Ergodic convergence rates: O(ln N/√N) [14], O(1/√N) [15], O(1/N) [16]
- Lack of acceleration and non-ergodic convergence guarantees

4. Research Motivation

Most existing distributed algorithms for this setting do not exploit acceleration and therefore converge relatively slowly. This paper aims to develop a distributed algorithm with provable accelerated non-ergodic convergence rates, extending the guarantees of classical accelerated first-order methods to the CCP framework with strongly convex, possibly non-smooth, cost functions.

Core Contributions

Algorithm Innovation: Proposes a novel accelerated distributed optimization algorithm capable of simultaneously handling affine equality constraints and nonlinear inequality coupling constraints
Theoretical Breakthrough: Establishes non-ergodic convergence rates:
- Optimality error of primal problem: O(1/N²) + O(1/N)
- Constraint violation error: O(1/N²) + O(1/N)
- These bounds hold for the last iterate, improving on existing ergodic (averaged-iterate) or asymptotic guarantees
Dual Reformulation: Transforms CCP into a dual problem, leveraging separability to interpret it as a consensus optimization problem
Experimental Validation: Numerical experiments show that, under the same iteration budget, the algorithm reduces the optimality error and the feasibility residual faster than ALT, IPLUX, and the distributed subgradient algorithm

Method Details

Problem Formulation

Primal Problem (Problem 1): $\min_{x \in X} f(x) = \sum_{i=1}^{n} f_i(x_i)$

Subject to:

Equality coupling constraint: $\sum_{i=1}^{n} B_i x_i = \sum_{i=1}^{n} b_i$
Inequality coupling constraint: $\sum_{i=1}^{n} h_i(x_i) \leq 0$
Local constraint: $x_i \in X_i \subseteq \mathbb{R}^p$

Where:

$x = [x_1^T, x_2^T, \ldots, x_n^T]^T \in \mathbb{R}^{np}$
$B_i \in \mathbb{R}^{d \times p}$ , $b_i \in \mathbb{R}^d$
$h_i: \mathbb{R}^p \to \mathbb{R}^m$ is a possibly nonlinear function

Key Assumptions:

Assumption 1: $f_i$ is a proper $\mu_f$-strongly convex function; $h_i$ is convex and $l_h$-Lipschitz continuous
Assumption 2: $X_i$ is a compact convex set; Slater condition holds (a strictly feasible point exists)

Model Architecture

Step 1: Dual Problem Construction

Introduce Lagrange multipliers $\mu \in \mathbb{R}^d$ (equality constraints) and $\delta \in \mathbb{R}_+^m$ (inequality constraints). The Lagrangian function is:

$L(x, \mu, \delta) = \sum_{i=1}^{n} \left( F_i(x_i) + \langle \mu, B_i x_i - b_i \rangle + \langle \delta, h_i(x_i) \rangle \right)$

where $F_i = f_i + \mathbb{1}_{X_i}$ ( $\mathbb{1}_{X_i}$ is the indicator function).

Dual Problem: $\min_{\mu \in \mathbb{R}^d, \delta \in \mathbb{R}_+^m} \sum_{i=1}^{n} g_i(\mu, \delta)$

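where $L_i(x_i, \mu, \delta) = F_i(x_i) + \langle \mu, B_i x_i - b_i \rangle + \langle \delta, h_i(x_i) \rangle$ is the $i$-th summand of $L$ and $g_i(\mu, \delta) = -\min_{x_i} L_i(x_i, \mu, \delta)$.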
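To make the dual objective concrete, the sketch below evaluates $g_i$ and its gradient for one hypothetical agent. It assumes a diagonal quadratic $f_i$, a box $X_i$, $d = m = 1$, and an affine stand-in $h_i(x) = D x - e$ (the paper allows nonlinear $h_i$; the affine choice only keeps the inner minimization in closed form). By Danskin's theorem, $\nabla g_i(\mu, \delta) = -[B_i x_i - b_i;\ h_i(x_i)]$ at the inner minimizer, which can be checked against finite differences of $g_i$. All names and data are illustrative.

```python
# Hypothetical single agent: f_i(x) = 0.5*sum_j q_j x_j^2 + c.x (so mu_f = min q),
# X_i a box, d = m = 1, and an AFFINE stand-in h_i(x) = D.x - e for the inequality.

def local_argmin(y, q, c, B, b, D, e, lo, hi):
    """x_i(y) = argmin_{x in X_i} L_i(x, mu, delta).

    The objective is separable across coordinates, so each coordinate's
    unconstrained stationary point is simply clipped to the box."""
    mu, delta = y
    return [min(hi[j], max(lo[j], -(c[j] + mu * B[j] + delta * D[j]) / q[j]))
            for j in range(len(q))]

def couplings(x, B, b, D, e):
    """Return (B.x - b, h(x)) for the affine stand-in h(x) = D.x - e."""
    return (sum(Bj * xj for Bj, xj in zip(B, x)) - b,
            sum(Dj * xj for Dj, xj in zip(D, x)) - e)

def dual_value(y, q, c, B, b, D, e, lo, hi):
    """g_i(y) = -min_x L_i(x, y), evaluated at the closed-form minimizer."""
    x = local_argmin(y, q, c, B, b, D, e, lo, hi)
    eq, ineq = couplings(x, B, b, D, e)
    f = sum(0.5 * qj * xj ** 2 + cj * xj for qj, cj, xj in zip(q, c, x))
    return -(f + y[0] * eq + y[1] * ineq)

def dual_grad(y, q, c, B, b, D, e, lo, hi):
    """Danskin: grad g_i(y) = -[B.x_i - b; h(x_i)] at x_i = x_i(y)."""
    x = local_argmin(y, q, c, B, b, D, e, lo, hi)
    eq, ineq = couplings(x, B, b, D, e)
    return [-eq, -ineq]

data = dict(q=[2.0, 4.0], c=[1.0, -2.0], B=[1.0, 1.0], b=0.5,
            D=[1.0, -1.0], e=0.2, lo=[-1.0, -1.0], hi=[1.0, 1.0])
grad = dual_grad([0.1, 0.4], **data)   # both coordinates of x_i are interior here
```

At this point the minimizer is $x_i = (-0.75, 0.575)$, giving $\nabla g_i = (0.675, 1.525)$.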

Step 2: Consensus Optimization Reformulation

Each agent $i$ maintains a local copy of the dual variables, $y_i = [\mu_i^T, \delta_i^T]^T \in Y = \mathbb{R}^d \times \mathbb{R}_+^m$; the stacked vector $y = [y_1^T, \ldots, y_n^T]^T$ lies in $\mathcal{Y} = Y^n$. The dual problem is reformulated as:

$\min_{y \in \mathcal{Y}} G(y) = \sum_{i=1}^{n} g_i(y_i)$ $\text{s.t. } y_1 = y_2 = \cdots = y_n$

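Let $H$ be the Laplacian matrix of the communication graph and $W = H \otimes I_{d+m}$. Since the graph is connected, the null space of $H$ is spanned by the all-ones vector, so $W^{1/2}y = 0$ holds exactly when $y_1 = \cdots = y_n$. This yields the compact form (Problem 4):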

$\min_{y \in \mathcal{Y}} G(y) \quad \text{s.t. } W^{1/2}y = 0$
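A small numerical illustration of this equivalence, assuming an unweighted ring (helper names are hypothetical): the quadratic form $y^T W y = \|W^{1/2} y\|^2$ equals the sum of $\|y_i - y_j\|^2$ over edges, so it is zero for agreeing copies and positive otherwise.

```python
def ring_laplacian(n):
    """Laplacian H of the unweighted cycle 0-1-...-(n-1)-0 (assumes n >= 3)."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        H[i][j] -= 1.0
        H[j][i] -= 1.0
        H[i][i] += 1.0
        H[j][j] += 1.0
    return H

def consensus_gap(H, y):
    """||W^{1/2} y||^2 = y^T (H kron I) y for stacked local copies y[i].

    Equals the sum over edges of ||y_i - y_j||^2, so on a connected graph it
    vanishes exactly when all copies agree."""
    n = len(H)
    return sum(H[i][j] * sum(a * b for a, b in zip(y[i], y[j]))
               for i in range(n) for j in range(n))

H = ring_laplacian(4)
agree = [[1.0, 2.0]] * 4                                    # y_1 = ... = y_4
differ = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]    # second agent deviates
```

Here `consensus_gap(H, agree)` is 0 and `consensus_gap(H, differ)` is 2, one unit for each of the two edges touching the deviating agent.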

Step 3: Accelerated Linearized Multiplier Method

Augmented Lagrangian Function: $\mathcal{L}_\rho(y, v) = G(y) - \langle v, W^{1/2}y \rangle + \frac{\rho}{2} \|W^{1/2}y\|^2$
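Differentiating the penalty term of $\mathcal{L}_\rho$ gives $\rho W y$, whose $i$-th block is $\rho \sum_{j \in N_i}(y_i - y_j)$ for unit edge weights, so each agent needs only its neighbors' copies. The sketch below (unit-weight ring assumed, helper names hypothetical) checks this block formula against central finite differences of $\frac{\rho}{2}\|W^{1/2}y\|^2$.

```python
def ring_neighbors(n):
    """Neighbor lists of the n-node ring (n >= 3), unit edge weights assumed."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def penalty(y, nbrs, rho):
    """(rho/2) ||W^{1/2} y||^2 = (rho/2) * sum over edges of ||y_i - y_j||^2."""
    total = 0.0
    for i, js in nbrs.items():
        for j in js:
            if j > i:                      # count each undirected edge once
                total += sum((a - b) ** 2 for a, b in zip(y[i], y[j]))
    return 0.5 * rho * total

def penalty_grad_block(i, y, nbrs, rho):
    """Block i of rho * W y, i.e. rho * sum_{j in N_i} (y_i - y_j): neighbor data only."""
    return [rho * sum(y[i][k] - y[j][k] for j in nbrs[i]) for k in range(len(y[i]))]

nbrs = ring_neighbors(4)
y = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0], [-0.5, 3.0]]
rho = 2.0
```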

Algorithm Iteration (Algorithm 1):

Initialization: ŷ_{i,1} = y_{i,1} ∈ Y, λ_{i,1} = 0

For k = 1, 2, ..., N:
  1. Extrapolation step:
     ỹ_{i,k} = (1 - α_k)ŷ_{i,k} + α_k y_{i,k}
  
  2. Local optimization (gradient computation):
     x_{i,k} = argmin_x {F_i(x) + ⟨[B_i x - b_i; h_i(x)], ỹ_{i,k}⟩}
     ∇g_i(ỹ_{i,k}) = -[B_i x_{i,k} - b_i; h_i(x_{i,k})]
  
  3. Information exchange:
     t_{i,k} = Σ_{j∈N_i} H_{ij}(y_{i,k} - y_{j,k})
  
  4. Proximal update:
     y_{i,k+1} = P_Y{y_{i,k} - (1/η_k)(∇g_i(ỹ_{i,k}) - λ_{i,k} - θ_k t_{i,k})}
  
  5. Aggregation step:
     ŷ_{i,k+1} = (1 - α_k)ŷ_{i,k} + α_k y_{i,k+1}
  
  6. Dual variable update:
     λ_{i,k+1} = λ_{i,k} - β_k t_{i,k}
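A compact Python sketch of the six steps on a hypothetical 4-agent toy problem follows; it is not the paper's implementation. Assumptions: affine $h_i$ (so Step 2 is a closed-form clip), $l_g$ replaced by the standard bound $\|A_i\|_F^2/\mu_f$ for affine couplings with $A_i = [B_i; D_i]$, $\|W\| = 4$ for the unit-weight 4-node ring, and $t_{i,k} = \sum_{j \in N_i}(y_{i,k} - y_{j,k})$ with $\lambda$ in the role of $W^{1/2}v$. Signs are taken from differentiating $\mathcal{L}_\rho$ under that convention, so the penalty enters Step 4 as $+\theta_k t_{i,k}$ while the multiplier moves by $-\beta_k t_{i,k}$. Primal points are recovered at $\hat{y}_{N+1}$.

```python
# Hypothetical toy (not the paper's benchmark): 4 agents on a ring, x_i in [-2, 2]^2,
# f_i(x) = 0.5*sum_j Q[i][j]*x_j^2 + C[i].x, equality sum_i (B.x_i - b[i]) = 0, and an
# AFFINE stand-in for the inequality, sum_i (D.x_i - e) <= 0, so Step 2 is a closed-form clip.
Q = [[2.0, 3.0], [1.0, 2.0], [3.0, 1.0], [2.0, 2.0]]
C = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0], [0.0, -0.5]]
B, D = [1.0, 1.0], [1.0, -1.0]
b = [0.5, 0.0, -0.2, 0.3]
e, LO, HI = 0.1, -2.0, 2.0
n = len(Q)
NBRS = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}   # ring, unit weights

def local_x(i, y):
    """Step 2: x_i = argmin over the box of F_i(x) + mu*(B.x - b_i) + delta*(D.x - e)."""
    mu, delta = y
    return [min(HI, max(LO, -(C[i][j] + mu * B[j] + delta * D[j]) / Q[i][j])) for j in range(2)]

def grad_g(i, y):
    """Step 2 (Danskin): grad g_i(y) = -[B.x_i - b_i; D.x_i - e]."""
    x = local_x(i, y)
    return [-(B[0] * x[0] + B[1] * x[1] - b[i]), -(D[0] * x[0] + D[1] * x[1] - e)]

def residual(Y):
    """|sum_i (B.x_i - b_i)| + [sum_i (D.x_i - e)]_+ with each x_i = x_i(Y[i])."""
    g = [grad_g(i, Y[i]) for i in range(n)]
    return abs(sum(gi[0] for gi in g)) + max(0.0, -sum(gi[1] for gi in g))

def algorithm1(N=1000, rho=0.1):
    W_norm = 4.0                                   # ||W||: ring Laplacian eigenvalues 0, 2, 2, 4
    l_g = max(sum(v * v for v in B + D) / min(q) for q in Q)   # ||A_i||_F^2 / mu_f
    y = [[0.0, 0.0] for _ in range(n)]
    y_hat = [row[:] for row in y]
    lam = [[0.0, 0.0] for _ in range(n)]
    for k in range(1, N + 1):
        alpha, theta = 2.0 / (k + 1), rho * N / k
        beta, eta = rho * k / N, (2.0 * l_g + rho * N * W_norm) / k
        # Step 1: extrapolated (look-ahead) point
        y_tl = [[(1 - alpha) * y_hat[i][s] + alpha * y[i][s] for s in range(2)] for i in range(n)]
        # Step 3: one neighbor exchange, t_i = (W y)_i = sum_j (y_i - y_j)
        t = [[sum(y[i][s] - y[j][s] for j in NBRS[i]) for s in range(2)] for i in range(n)]
        # Step 4: linearized proximal step, then P_Y keeps delta >= 0
        y_new = []
        for i in range(n):
            g = grad_g(i, y_tl[i])
            z = [y[i][s] - (g[s] - lam[i][s] + theta * t[i][s]) / eta for s in range(2)]
            y_new.append([z[0], max(0.0, z[1])])
        # Step 5: aggregation; Step 6: multiplier update with the same t
        y_hat = [[(1 - alpha) * y_hat[i][s] + alpha * y_new[i][s] for s in range(2)] for i in range(n)]
        lam = [[lam[i][s] - beta * t[i][s] for s in range(2)] for i in range(n)]
        y = y_new
    return y_hat

y_hat = algorithm1()
```

Each iteration uses a single neighbor exchange (the list `t`), which the multiplier update reuses, matching the one-round communication pattern of Algorithm 1.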

Parameter Settings:

$\alpha_k = \frac{2}{k+1}$ (Nesterov acceleration parameter)
$\theta_k = \frac{\rho N}{k}$ (adaptive Laplacian penalty)
$\beta_k = \frac{\rho k}{N}$ (dual step size)
$\eta_k = \frac{2l_g + \rho N \|W\|}{k}$ (proximal parameter)

where $l_g = \sqrt{\frac{2}{\mu_f^2}(\|B_i\|^2 + l_h^2)} \cdot \max\{\|B_i\|^2, l_h^2\}$ is the Lipschitz constant of $g_i$ .
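The schedule can be tabulated directly; the constants used below ($N$, $\rho$, $l_g$, $\|W\|$) are hypothetical. A few consequences are easy to check: $\alpha_1 = 1$, so $\tilde{y}_{i,1} = y_{i,1}$; $\theta_N = \beta_N = \rho$; and $\theta_k / \eta_k = \rho N / (2 l_g + \rho N \|W\|)$ does not depend on $k$ and stays below $1/\|W\|$, so the linearized penalty step is uniformly bounded.

```python
def schedule(k, N, rho, l_g, W_norm):
    """Return (alpha_k, theta_k, beta_k, eta_k) as listed for Algorithm 1."""
    alpha = 2.0 / (k + 1)                        # Nesterov weight
    theta = rho * N / k                          # Laplacian penalty weight
    beta = rho * k / N                           # multiplier step size
    eta = (2.0 * l_g + rho * N * W_norm) / k     # proximal parameter
    return alpha, theta, beta, eta

N, rho, l_g, W_norm = 1200, 1.0, 10.0, 4.0       # hypothetical constants
ratios = [schedule(k, N, rho, l_g, W_norm)[1] / schedule(k, N, rho, l_g, W_norm)[3]
          for k in (1, 10, 600, 1200)]           # theta_k / eta_k
```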

Technical Innovations

Three-Variable Coordination Mechanism:
- $\tilde{y}_k$ : Extrapolated prediction point for gradient evaluation, introducing momentum effects
- $y_k$ : Proximal correction point ensuring stability
- $\hat{y}_k$ : Smoothed trajectory point enabling optimal convergence analysis
Adaptive Parameter Scheduling:
- $\theta_k$ and $\beta_k$ adaptively adjust with iteration count, balancing convergence speed and stability
- The parameter design yields the accelerated O(1/N²) component of the non-ergodic rate
Linearization Strategy:
- Linearizes the non-separable quadratic term $\frac{\rho}{2}\|W^{1/2}y\|^2$
- Evaluates the gradient at the look-ahead point, $\nabla G(\tilde{y}_k)$, rather than at the current iterate
Distributed Implementation:
- Each node only solves local subproblems (Equation 14)
- Requires only one round of neighbor information exchange per iteration (Step 3: $t_{i,k}$)
- No global coordinator needed

Experimental Setup

Dataset

Synthetic Optimization Problem: $\min_{x_i \in X_i} \sum_{i=1}^{n} \left( x_i^T A_i x_i + b_i^T x_i + \|x_i\|_1 \right)$

Subject to:

Equality: $\sum_{i=1}^{n} C_i x_i = 0_p$
Inequality: $\sum_{i=1}^{n} \|x_i - r_i\|_1 \leq \sum_{i=1}^{n} d_i$

Parameter Settings:

Number of agents: $n = 20$
Local dimension: $p = 5$
Box constraints: $x_i \in X_i = \{x \in \mathbb{R}^p \mid \underline{x}_i \leq x \leq \bar{x}_i\}$
- $\underline{x}_i \sim U[-10, -9]$ , $\bar{x}_i \sim U[9, 10]$
Matrix $A_i = U_i \Lambda_i U_i^T$:
- $U_i$ is a random orthogonal matrix
- Eigenvalues of $\Lambda_i$ linearly distributed in $[1, 100]$ (condition number $\kappa = 100$ )
$C_i, b_i \sim \mathcal{N}(0, I_p)$
$d_i \sim U(1, 6)$
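A minimal sketch of this data generation, using only Python's `random` module. Assumptions not fixed by the text: the eigenvalues of $\Lambda_i$ are evenly spaced, the box bounds are drawn per coordinate, $C_i$ is $p \times p$ with i.i.d. standard normal entries, and $r_i$ is omitted because its distribution is not specified. The helper names are hypothetical.

```python
import random

def random_orthogonal(p, rng):
    """Random orthonormal basis via modified Gram-Schmidt on Gaussian vectors (columns u_1..u_p)."""
    cols = []
    while len(cols) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for u in cols:
            proj = sum(a * b for a, b in zip(u, v))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:                      # retry in the (unlikely) degenerate case
            cols.append([a / norm for a in v])
    return cols

def make_agent(p, rng):
    """One agent's data following the listed distributions (assumptions above)."""
    U = random_orthogonal(p, rng)
    lam = [1.0 + 99.0 * k / (p - 1) for k in range(p)]      # eigenvalues evenly spaced in [1, 100]
    A = [[sum(lam[k] * U[k][r] * U[k][s] for k in range(p)) for s in range(p)]
         for r in range(p)]                                   # A_i = U_i diag(lam) U_i^T
    return {
        "A": A,
        "b": [rng.gauss(0.0, 1.0) for _ in range(p)],
        "C": [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(p)],
        "d": rng.uniform(1.0, 6.0),
        "lo": [rng.uniform(-10.0, -9.0) for _ in range(p)],
        "hi": [rng.uniform(9.0, 10.0) for _ in range(p)],
    }

rng = random.Random(0)
agents = [make_agent(5, rng) for _ in range(20)]   # n = 20, p = 5
```

Since $U_i$ is orthogonal, the trace of each $A_i$ equals the eigenvalue sum $1 + 25.75 + 50.5 + 75.25 + 100 = 252.5$, which gives a quick consistency check.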

Communication Network:

Connected undirected graph: each node connected to nearest and second-nearest neighbors
Edge set includes $(i, i+1)$ for $1 \leq i \leq 19$ and $(1, 20)$

Evaluation Metrics

Primal Problem Optimality Error: $\frac{|f(x_k) - f(x^*)|^2}{|f(x_1) - f(x^*)|^2}$
Constraint Violation Absolute Error: $\left\| \sum_{i=1}^{n} C_i x_{i,k} \right\| + \left[ \sum_{i=1}^{n} (\|x_{i,k} - r_i\|_1 - d_i) \right]_+$
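Both metrics can be written as small functions. The sketch assumes the Euclidean norm in the first term of the violation and uses hypothetical two-agent data for a quick check.

```python
def optimality_error(f_k, f_1, f_star):
    """Relative squared optimality error |f(x_k) - f(x*)|^2 / |f(x_1) - f(x*)|^2."""
    return (f_k - f_star) ** 2 / (f_1 - f_star) ** 2

def violation(C, x, r, d):
    """||sum_i C_i x_i||_2 + [sum_i (||x_i - r_i||_1 - d_i)]_+ over agents i."""
    p = len(x[0])
    s = [sum(C[i][row][col] * x[i][col] for i in range(len(x)) for col in range(p))
         for row in range(len(C[0]))]
    eq = sum(v * v for v in s) ** 0.5
    ineq = sum(sum(abs(a - b) for a, b in zip(x[i], r[i])) - d[i] for i in range(len(x)))
    return eq + max(0.0, ineq)

I2, negI2 = [[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]
x = [[1.0, 2.0], [1.0, 2.0]]     # C_1 x_1 + C_2 x_2 = 0, so only the inequality is violated
v = violation([I2, negI2], x, r=[[0.0, 0.0], [0.0, 0.0]], d=[1.0, 1.0])
```

Each agent contributes $\|x_i - r_i\|_1 - d_i = 3 - 1 = 2$, so `v` equals 4.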

Comparison Methods

Distributed Subgradient [14]: Distributed subgradient algorithm
ALT (Augmented Lagrangian Tracking) [17]: Augmented Lagrangian tracking algorithm
IPLUX (Integrated Primal-Dual Proximal) [16]: Integrated primal-dual proximal algorithm

Benchmark Solution: Optimal solution $x^*$ obtained using YALMIP with MOSEK solver

Implementation Details

All algorithms use identical initialization
Number of iterations: $N = 1200$
Proposed algorithm parameters set according to Theorem 1

Experimental Results

Main Results

Figure 1: Primal Problem Optimality Error

Proposed Algorithm: Achieves $10^{-6}$ precision at $k=1200$
ALT: Monotonic decrease but slower, ending at approximately $10^{-2}$
Distributed Subgradient: Slowest descent, remaining in $10^{-1}$ - $10^0$ range
IPLUX: Performance between ALT and proposed algorithm

Figure 2: Constraint Violation Absolute Error

Proposed Algorithm: Earliest to reach below $10^{-4}$
Other Algorithms: Significantly slower convergence

Experimental Findings

Convergence Speed: The proposed algorithm converges significantly faster than all comparison methods under the same iteration budget
Accuracy Advantage:
- Final optimality error about 4 orders of magnitude below ALT's (about $10^{-6}$ vs. $10^{-2}$)
- Final feasibility error about 2 orders of magnitude below the baselines
Clear Acceleration Effect: The observed last-iterate behavior is consistent with the predicted non-ergodic acceleration
Robustness: Algorithm performs stably with non-smooth objective functions (containing $\ell_1$ norm) and nonlinear constraints

Related Work

1. Equality Coupling Constraints

ADMM Methods [6, 7]: Alternating Direction Method of Multipliers
Mirror Methods [8]: Distributed algorithms based on mirror descent
Gradient Tracking [9]: Gradient tracking for dual problems

2. Inequality Coupling Constraints

Affine Inequality [10-12]: Distributed proximal algorithms, aggregated optimization
Nonlinear Inequality [13-18]:
- Dual subgradient method [13]
- Operator splitting primal-dual framework [14]
- Dynamic average consensus [15]
- Sparse/dense constraint handling [16]
- ALT algorithm [17]

3. Acceleration Methods

Nesterov Acceleration [19]: O(1/N²) rate for smooth unconstrained convex optimization
FISTA [20]: Fast Iterative Shrinkage-Thresholding Algorithm
Fast Lagrangian Methods [21, 22]: Accelerated Lagrangian methods for convex optimization
Distributed Acceleration [23, 24]: DCatalyst, energy conservation principle

Advantages of This Work

First to extend Nesterov acceleration to distributed CCP with simultaneous equality and nonlinear inequality coupling constraints
Provides non-ergodic (last-iterate) convergence guarantees that require no iterate averaging, improving on existing ergodic or asymptotic results
Applicable to non-smooth objective functions

Theoretical Analysis

Key Lemma (Proposition 1)

Lipschitz Smoothness of Dual Function: $\|\nabla g_i(z_1) - \nabla g_i(z_2)\| \leq l_g \|z_1 - z_2\|$

where $l_g = \sqrt{\frac{2}{\mu_f^2}(\|B_i\|^2 + l_h^2)} \cdot \max\{\|B_i\|^2, l_h^2\}$

Proof Strategy:

Utilize strong convexity of $F_i$ and convexity of $h_i$
Obtain gradient expression via Danskin's theorem
Establish inequality combining strong convexity and Lipschitz continuity
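The argument of Proposition 1 can be sanity-checked numerically in the affine special case. The sketch assumes affine $h_i$ and uses the standard bound $\|A_i\|^2/\mu_f$ with $A_i = [B_i; D_i]$ (not the paper's $l_g$ formula): because clipping to a box is 1-Lipschitz, the inner minimizer moves by at most $\|A_i\| \|z_1 - z_2\| / \mu_f$, and the gradient by at most $\|A_i\|^2 \|z_1 - z_2\| / \mu_f$. All data are hypothetical.

```python
import random

# Hypothetical agent: diagonal quadratic f_i (mu_f = min q = 1), box X_i = [-1, 1]^3,
# one equality row B_i and one AFFINE stand-in row for h_i, stacked as A_i.
q = [2.0, 4.0, 1.0]
c = [1.0, -2.0, 0.5]
A = [[1.0, 1.0, 0.0],    # B_i
     [1.0, -1.0, 2.0]]   # affine h_i (gradient row)
offset = [0.5, 0.2]      # b_i and the constant term of h_i

def x_of(y):
    """Inner minimizer: separable, so clip each coordinate's stationary point."""
    return [min(1.0, max(-1.0, -(c[j] + y[0] * A[0][j] + y[1] * A[1][j]) / q[j]))
            for j in range(3)]

def grad_g(y):
    """Danskin: grad g_i(y) = -(A_i x_i(y) - offset)."""
    x = x_of(y)
    return [-(sum(A[r][j] * x[j] for j in range(3)) - offset[r]) for r in range(2)]

# The Frobenius norm upper-bounds the spectral norm, so this is a valid Lipschitz bound.
bound = sum(a * a for row in A for a in row) / min(q)
rng = random.Random(1)
worst = 0.0
for _ in range(2000):
    z1 = [rng.uniform(-3.0, 3.0) for _ in range(2)]
    z2 = [rng.uniform(-3.0, 3.0) for _ in range(2)]
    num = sum((a - b) ** 2 for a, b in zip(grad_g(z1), grad_g(z2))) ** 0.5
    den = sum((a - b) ** 2 for a, b in zip(z1, z2)) ** 0.5
    worst = max(worst, num / den)
```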

Main Theorem (Theorem 1)

Convergence Rate:

Feasibility Error: $\left\| \sum_{i=1}^{n} (B_i x_{i,N+1} - b_i) \right\| + \left\| \left[ \sum_{i=1}^{n} h_i(x_{i,N+1}) \right]_+ \right\| \leq \varepsilon_c$

where $\varepsilon_c = \left( \frac{2l_g}{N(N+1)} + \frac{\rho}{N+1}\|W\| \right) \|y_1 - y^*\|^2 + \frac{1}{\rho(N+1)\lambda_2(W)}$, with $\lambda_2(W)$ the smallest nonzero eigenvalue of $W$.
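Evaluating $\varepsilon_c$ for hypothetical constants shows how the two regimes trade off: the $2l_g/(N(N+1))$ term can dominate for small $N$, while for large $N$ the $O(1/N)$ terms take over and doubling $N$ roughly halves the bound.

```python
def eps_c(N, l_g, rho, W_norm, lam2, dist_sq):
    """Theorem 1 feasibility bound:
    (2 l_g/(N(N+1)) + rho ||W||/(N+1)) ||y_1 - y*||^2 + 1/(rho (N+1) lambda_2(W))."""
    accel = 2.0 * l_g / (N * (N + 1)) * dist_sq        # O(1/N^2) term
    penalty = rho * W_norm / (N + 1) * dist_sq         # O(1/N) term
    consensus = 1.0 / (rho * (N + 1) * lam2)           # O(1/N) term
    return accel + penalty + consensus

params = dict(l_g=50.0, rho=1.0, W_norm=4.0, lam2=0.1, dist_sq=1.0)   # hypothetical constants
ratio = eps_c(20000, **params) / eps_c(10000, **params)              # close to 0.5
```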

Optimality Error: $-\varepsilon_p \leq f(x_{N+1}) - f(x^*) \leq \bar{\varepsilon}_p$

where $\varepsilon_p$ and $\bar{\varepsilon}_p$ have the same $O(1/N^2) + O(1/N)$ structure as $\varepsilon_c$.

Proof Key Steps:

Energy Function Construction: $\Phi_k = G(\hat{y}_k) - G(y^*) - \langle \lambda, \hat{y}_k - y^* \rangle$
Recursive Inequality: Using convexity and smoothness: $k(k+1)\Phi_{k+1} - k(k-1)\Phi_k \leq 2k[\text{telescoping terms}]$
Summation Technique: Sum from $k=1$ to $N$ , utilizing telescoping property
Parameter Selection: Achieve acceleration through carefully designed $\alpha_k, \theta_k, \beta_k, \eta_k$
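The summation step can be written out explicitly. Let $T_k$ denote the bracketed telescoping terms and set $a_k = k(k-1)\Phi_k$, so that $a_1 = 0$ and the recursive inequality reads $a_{k+1} - a_k \le 2k\,T_k$:

```latex
\sum_{k=1}^{N} (a_{k+1} - a_k) = a_{N+1} - a_1 = N(N+1)\,\Phi_{N+1}
\;\le\; \sum_{k=1}^{N} 2k\,T_k
\quad\Longrightarrow\quad
\Phi_{N+1} \le \frac{2}{N(N+1)} \sum_{k=1}^{N} k\,T_k .
```

Assuming the parameter choices keep $\sum_{k=1}^{N} k\,T_k \le c_1 + c_2 N$ for constants $c_1, c_2$, the bound becomes $\frac{2c_1}{N(N+1)} + \frac{2c_2}{N+1}$, which is the $O(1/N^2) + O(1/N)$ form.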

Conclusions and Discussion

Main Conclusions

Algorithm Contribution: Proposes the first accelerated distributed algorithm for simultaneous affine equality and nonlinear inequality coupling constraints
Theoretical Guarantee: Establishes non-ergodic O(1/N²) + O(1/N) convergence rate, significantly improving existing results
Practicality: Each iteration involves simple computation (one local subproblem + one round of neighbor communication), suitable for large-scale deployment
Experimental Validation: On the synthetic test problem, the algorithm achieves higher feasibility and lower error under the same iteration budget

Limitations

Strong Convexity Assumption: Algorithm and theoretical analysis depend on strong convexity of objective functions (Assumption 1), limiting applicability
Slater Condition: Requires existence of strictly feasible points (Assumption 2), which may not hold in some practical problems
Compact Set Assumption: Assumption 2 requires local constraint sets $X_i$ to be compact, excluding unbounded constraints
Parameter Tuning: While theoretical parameter settings are provided, practical applications may require problem-specific adjustments
Communication Complexity: Does not explicitly analyze communication complexity, focusing only on iteration complexity
Non-convex Extension: Theoretical and algorithmic framework does not cover non-convex optimization problems

Future Directions

Relax Strong Convexity: Extend to general convex or even non-convex problems
Stochastic/Online Versions: Develop stochastic gradient versions for large-scale data
Asynchronous Communication: Study convergence under asynchronous communication protocols
Time-Varying Networks: Extend to dynamically changing communication topologies
Practical Applications: Validate in real systems such as smart grids and multi-agent coordination

In-Depth Evaluation

Strengths

Strong Theoretical Innovation:
- First to achieve a non-ergodic rate with an accelerated O(1/N²) component in distributed optimization with simultaneous equality and nonlinear inequality coupling constraints
- Non-ergodic convergence guarantee superior to existing ergodic or asymptotic results
- Rigorous mathematical proofs with clear logic
Clever Algorithm Design:
- Three-variable coordination mechanism ( $\tilde{y}_k, y_k, \hat{y}_k$ ) effectively achieves acceleration
- Adaptive parameter scheduling balances convergence speed and stability
- Linearization strategy maintains computational separability
Comprehensive Experiments:
- Comparison with three state-of-the-art algorithms
- Clear experimental results demonstrating acceleration effect
- High-quality figures with clear conclusions
High Practical Value:
- Completely distributed algorithm suitable for large-scale deployment
- Reasonable computational load per iteration
- Applicable to non-smooth objective functions
Clear Writing:
- Well-structured with rigorous logic
- Clear symbol definitions
- Detailed and understandable proofs

Weaknesses

Strong Assumptions:
- Strong convexity assumption limits applicability (many practical problems are only convex or non-convex)
- Compact set and Slater conditions difficult to verify in some applications
Experimental Limitations:
- Testing only on synthetic data, lacking real application scenario validation
- No testing on large-scale networks (n=20 is relatively small)
- No analysis of communication overhead and computation time
Parameter Dependence:
- Algorithm performance depends on problem parameters ( $\mu_f, l_h, \|B_i\|$ , etc.)
- These parameters may be unknown or difficult to estimate in practical applications
Convergence Constants:
- Constants in theoretical convergence rate may be large
- No lower bounds or optimality analysis provided
Missing Analysis:
- No discussion of algorithm sensitivity to initialization
- No analysis of parameter selection impact on convergence
- Lacks discussion of failure cases or difficult scenarios

Impact

Academic Value:
- Provides new theoretical tools for distributed constrained optimization
- Acceleration techniques may inspire other distributed algorithm designs
- Expected high citation in optimization and control fields
Practical Value:
- Directly applicable to smart grid economic dispatch
- Extensible to multi-robot coordination, sensor networks, etc.
- Algorithm 1 provides clear implementation guidelines
Reproducibility:
- Detailed algorithm description, easy to implement
- Clear experimental setup
- Authors encouraged to open-source code for broader application

Applicable Scenarios

Strongly Recommended Scenarios:

Smart grid economic dispatch (typically satisfies the strong convexity and compact set assumptions, e.g., quadratic generation costs with bounded generator limits)
Resource allocation problems (convex cost functions)
Distributed machine learning (strongly convex regularization)

Use with Caution Scenarios:

Non-convex optimization problems (theory inapplicable)
Unbounded constraint sets (violates compact set assumption)
Real-time systems (may require many iterations)

Scenarios Requiring Improvement:

Large-scale networks (scalability needs verification)
Time-varying environments (algorithm extension needed)
Communication-limited settings (communication efficiency consideration needed)

Key References

[6] T.-H. Chang et al., "Multi-agent distributed optimization via inexact consensus ADMM," IEEE Trans. Signal Process., 2014.

[14] S. Liang and G. Yin, "Distributed dual subgradient algorithms with iterate-averaging feedback," IEEE Trans. Cybernetics, 2019.

[16] X. Wu et al., "Distributed optimization with coupling constraints," IEEE Trans. Automatic Control, 2022.

[17] A. Falsone and M. Prandini, "Augmented Lagrangian tracking for distributed optimization," Automatica, 2023.

[19] Y. Nesterov, "A method for unconstrained convex minimization problem with the rate of convergence O(1/k²)," Dokl. Akad. Nauk SSSR, 1983.

Overall Assessment: This is a high-quality theoretical paper making important contributions to distributed optimization. The algorithm design is clever, theoretical analysis rigorous, and experimental results convincing. While certain assumption limitations exist, the algorithm demonstrates significant advantages within its applicable scope. Further validation in practical systems is recommended, along with exploration of relaxing the strong convexity assumption.