2025-11-13T22:16:11.184529

Fast and Interpretable Mixed-Integer Linear Program Solving by Learning Model Reduction

Li, Chen, Li et al.

By exploiting the correlation between the structure and the solution of Mixed-Integer Linear Programming (MILP), Machine Learning (ML) has become a promising method for solving large-scale MILP problems. Existing ML-based MILP solvers mainly focus on end-to-end solution learning, which suffers from the scalability issue due to the high dimensionality of the solution space. Instead of directly learning the optimal solution, this paper aims to learn a reduced and equivalent model of the original MILP as an intermediate step. The reduced model often corresponds to interpretable operations and is much simpler, enabling us to solve large-scale MILP problems much faster than existing commercial solvers. However, current approaches rely only on the optimal reduced model, overlooking the significant preference information of all reduced models. To address this issue, this paper proposes a preference-based model reduction learning method, which considers the relative performance (i.e., objective cost and constraint feasibility) of all reduced models on each MILP instance as preferences. We also introduce an attention mechanism to capture and represent preference information, which helps improve the performance of model reduction learning tasks. Moreover, we propose a SetCover based pruning method to control the number of reduced models (i.e., labels), thereby simplifying the learning process. Evaluation on real-world MILP problems shows that 1) compared to the state-of-the-art model reduction ML methods, our method obtains nearly 20% improvement on solution accuracy, and 2) compared to the commercial solver Gurobi, two to four orders of magnitude speedups are achieved.

academic

Fast and Interpretable Mixed-Integer Linear Program Solving by Learning Model Reduction

Basic Information

Paper ID: 2501.00307
Title: Fast and Interpretable Mixed-Integer Linear Program Solving by Learning Model Reduction
Authors: Yixuan Li, Can Chen, Jiajun Li, Jiahui Duan, Xiongwei Han, Tao Zhong, Vincent Chau, Weiwei Wu, Wanyuan Wang
Classification: cs.LG cs.AI
Publication Date: December 31, 2024 (arXiv preprint)
Institutions: School of Computer Science and Engineering, Southeast University; Huawei Noah's Ark Laboratory
Paper Link: https://arxiv.org/abs/2501.00307

Abstract

This paper proposes a machine learning-based approach for solving large-scale mixed-integer linear programs (MILPs) by leveraging the correlation between MILP structure and solutions. Unlike existing end-to-end solution learning methods, this work learns a simplified equivalent model of the original MILP as an intermediate step. The paper introduces a preference-based model reduction learning method that considers the relative performance of all reduced models on each MILP instance as preference information and employs attention mechanisms to capture this preference information. Experimental results demonstrate approximately 20% improvement in solution accuracy compared to state-of-the-art model reduction ML methods, and achieve 2-4 orders of magnitude speedup compared to the commercial solver Gurobi.

Research Background and Motivation

Problem Definition

Mixed-integer linear programming (MILP) has widespread applications in critical domains including supply chain logistics, service scheduling, energy management, and transportation planning. In practical industrial applications, MILP instances typically involve hundreds of thousands of decision variables and constraints. Existing commercial solvers based on exact methods incur high computational costs and cannot meet real-time requirements.

Research Motivation

Scalability Issues: Existing ML methods directly learn high-dimensional solution space mappings, suffering from scalability problems
Lack of Interpretability: End-to-end approaches cannot provide interpretable solutions or intuitive understanding
Insufficient Utilization of Preference Information: Current model reduction methods focus only on optimal reduced models, neglecting valuable preference information from all reduced models

Limitations of Existing Methods

End-to-end Solution Prediction: Low prediction accuracy due to high dimensionality of solution space
Learning to Optimize: Limited efficiency due to decision horizon constraints
Existing Model Reduction Methods: Treat feasible reduced models as equally ideal labels, failing to fully exploit comparative information

Core Contributions

Proposes Preference-Based Model Reduction Learning: Converts reduced model performance into preference information, fully exploiting comparative information in the instance-strategy space
Designs Attention Mechanism Architecture: Captures correlations between instances and preference-reduced models, improving learning accuracy
Introduces SETCOVER Pruning Technique: Controls the number of reduced models (labels), generating minimal label sets feasible for all instances
Achieves Significant Performance Improvements: Validates on real MILP problems with 20% accuracy improvement over existing methods and 2-4 orders of magnitude speedup compared to Gurobi

Methodology Details

Task Definition

Given a parameterized MILP problem:

min f(c, x)
s.t. g(A, x) ≤ b
     xI ∈ Z^d, x_{-I} ∈ R^{n-d}

where θ = ⟨A, c, b⟩ represents MILP parameters.

Optimal Strategy Definition: s*(θ) = (T(θ), x*_I(θ)), including the set of tight constraints T(θ) and optimal values of integer variables.

Objective: Learn the mapping from parameters θ to optimal strategy s*(θ).

Model Architecture

1. Strategy Generation and Pruning Phase

Strategy Generation: Uses Good-Turing estimator to determine instance quantity until N₁/N falls below threshold
Strategy Pruning: Constructs instance-strategy bipartite graph G(V_θ, V_s, E), transforms pruning problem into SETCOVER problem
Greedy Algorithm: Iteratively selects strategy nodes covering the most uncovered instance nodes

2. Preference-Based Strategy Learning

Preference Computation:

r(θᵢ, sⱼ) = -log(p(θᵢ, sⱼ) + d(θᵢ, sⱼ))

where p(θᵢ, sⱼ) represents infeasibility and d(θᵢ, sⱼ) represents suboptimality.

Preference Definition:

(θᵢ, sⱼ) ≻ (θᵢ, sₖ) ⇔ r(θᵢ, sⱼ) > r(θᵢ, sₖ)

Attention Mechanism Encoding:

A([θᵢ, S^P]) = softmax(QK^T/√d)V

Treats each instance-strategy pair ⟨θᵢ, sⱼ⟩ as a token, extracting features through multi-layer attention mechanisms.

Technical Innovations

1. Preference Sampling Optimization

Leverages transitivity of preferences, ordering strategies by preference, requiring only M_P ordered preference samples instead of (M_P choose 2) pairwise comparisons, reducing sample complexity from quadratic to linear.

2. Dual Loss Function

Preference Loss:

L_p(φ) = -1/N ∑∑[μᵢ,ⱼ log(pᵢ,ⱼ) + (1-μᵢ,ⱼ) log(1-pᵢ,ⱼ)]

Reward Difference Loss:

L_d(φ) = 1/N ∑∑(r̂ᵢ,σ(j) - r̂ᵢ,σ(j+1) - δᵢ,ⱼ)²

Total Loss: L_total(φ) = λ₁L_p(φ) + λ₂L_d(φ)

3. Top-k Strategy Inference

Selects top-k output strategies as candidates during inference, evaluates reduced models and selects the strategy with lowest infeasibility.

Experimental Setup

Datasets

MIPLIB: Real MILP problems from 6 scenarios with varying scales and solving difficulty
Fuel Cell Energy Management Problem: Adjusts problem complexity by increasing time steps T
Inventory Management Problem: 5 large-scale real industrial problems, averaging approximately 100,000 constraints

Evaluation Metrics

Accuracy Metrics:

accuracy = 1/N |{θᵢ | p(θᵢ, ŝⱼ) ≤ ε₁ ∧ d(θᵢ, ŝⱼ) ≤ ε₂}|

where both infeasibility and suboptimality tolerances are set to 1×10⁻⁴.

Comparison Methods

Gurobi: Advanced commercial solver (with warm start enabled)
Gurobi Heuristic: Gurobi heuristic mode (1-second time limit)
MLOPT: Most applicable model reduction-based learning method
Predict and Search: Solution prediction-based method

Implementation Details

Optimizer: AdamW, learning rate 0.0001-0.001
Learning rate decay: Linear decay by 0.9 every 10 epochs, 100 epochs total
Batch size: 128
Coordination parameter λ₁: 0.8-0.9

Experimental Results

Main Results

MIPLIB Dataset

In most scenarios, the method achieves performance comparable to Gurobi in suboptimality and feasibility
Shows slight advantages over MLOPT in feasibility, with significant improvements in suboptimality
Heuristic methods perform poorly on multiple metrics due to time constraints

Fuel Cell Energy Management

Achieves nearly 39% accuracy improvement on the most complex problem
Strategy pruning shows significant effects, particularly as problem scale increases
Top-k performance: Model performance improves with increasing k, outperforming MLOPT at equivalent k values

Inventory Management Problems

Maintains feasibility across all instances with stable accuracy around 90%
Achieves approximately 30% improvement over MLOPT on the largest problem P5 (exceeding 270,000 constraints)

Computational Time Analysis

Achieves 3 orders of magnitude time improvement compared to Gurobi
Maintains relatively stable time growth as problem scale increases
Minimal differences compared to MLOPT (at equivalent k values)
Top-k computation can be parallelized for further speedup

Ablation Studies

Preference Learning vs. Reward Fitting

On the fuel cell dataset, the preference method achieves approximately 9% average accuracy improvement over direct reward fitting, with superior performance on infeasibility and suboptimality metrics.

Ordered Sampling vs. Traditional Sampling

The ordered method achieves approximately 8% average accuracy improvement over traditional preference pair sampling across various problem scales, while significantly reducing training costs.

Classification of ML-based MILP Solving Methods

End-to-end Solution Prediction: Directly learns mapping from MILP instances to solutions
Learning to Optimize: Learns to accelerate traditional exact/heuristic methods
Learning to Simplify MILP: Learns preprocessing or problem size reduction

Relationship to Existing Work

vs. End-to-end Methods: Avoids high-dimensional solution space learning issues
vs. Learning to Optimize: Not constrained by decision horizon limitations
vs. Existing Model Reduction: Fully exploits preference information rather than considering only optimal strategies

Conclusions and Discussion

Main Conclusions

Preference-based model reduction learning significantly improves MILP solving performance
SETCOVER pruning technique effectively controls strategy quantity
Attention mechanisms successfully capture instance-strategy correlations
Achieves significant speedup and accuracy improvements on real industrial problems

Limitations

Method applies to homogeneous MILPs with repetitive structure
Requires sufficient training data to learn effective preference models
Strategy generation phase still requires commercial solvers to obtain optimal solutions
Strategy space may remain large in large-scale problems

Future Directions

Extend to broader optimization problem types
Reduce dependence on training data
Further improve strategy generation efficiency
Explore unsupervised or semi-supervised learning approaches

In-Depth Evaluation

Strengths

Strong Innovation: First systematic introduction of preference learning to MILP model reduction
Solid Theoretical Foundation: Based on operations research theory of model reduction concepts
Comprehensive Experiments: Validation on multiple real datasets including industrial-scale problems
Significant Performance: Achieves substantial improvements over existing methods
Good Interpretability: Reduced models correspond to interpretable operational modes

Weaknesses

Limited Applicability: Primarily applies to parameterized MILP problems
Training Cost: Still requires substantial annotated data (optimal solutions)
Insufficient Theoretical Analysis: Lacks convergence and generalization ability guarantees
Limited Comparison Baselines: Primarily compares with one learning method (MLOPT)

Impact

Academic Contribution: Provides new research direction for ML4CO field
Practical Value: Direct application potential in industrial MILP solving
Reproducibility: Detailed method description and clear experimental setup
Inspirational Value: Preference learning ideas extensible to other optimization problems

Applicable Scenarios

Online Stochastic Planning: Requires fast solving of similar MILP instances
Supply Chain Optimization: Problems with fixed structure but varying parameters
Real-time Scheduling: Applications with strict solving time requirements
Large-scale Industrial Optimization: Problems difficult for commercial solvers

References

The paper cites important works in the field, including:

Bengio, Lodi, and Prouvost (2021): ML for combinatorial optimization survey
Bertsimas and Stellato (2022): Online mixed-integer optimization
Misra, Roald, and Ng (2022): Learning for constrained optimization
Numerous related works on end-to-end learning and learning to optimize methods

Overall Assessment: This is a high-quality research paper that proposes innovative preference learning methods in the ML4CO field. Despite certain limitations, its theoretical contributions and practical value are prominent, providing a new effective pathway for large-scale MILP solving.