ICML 2026

Diffusion-based Learning Framework for Constrained Nonconvex Optimization with Weighted Bootstrapped Refinement

Shutong Ding Yimiao Zhou Ke Hu Xi Yao Junchi Yan Xiaoying Tang Ye Shi
ShanghaiTech University · MoE Key Laboratory of Intelligent Perception and Human Machine Collaboration · Shanghai Jiao Tong University · China Mobile Research Institute · The Chinese University of Hong Kong, Shenzhen

TL;DR

Supervised diffusion solvers can learn diverse candidate solutions, but their generated distribution may barely overlap the feasible set in constrained high-dimensional problems. DiOpt adds a self-bootstrapped refinement stage, reweighting generated candidates by feasibility and optimality so the solver learns where feasible, high-quality solutions live.

Abstract

Recent advances in diffusion models show promising potential to accelerate nonconvex problem solving by leveraging multimodal generation. However, most diffusion-based optimization methods rely on supervised learning and lack a mechanism to enforce constraint satisfaction, which is essential in real-world applications. As a result, the distribution they learn can be misaligned with the feasible region, so many generated candidates violate constraints. DiOpt addresses this distributional misalignment with a two-phase framework: a supervised warm-start followed by weighted bootstrapped refinement. During refinement, sampled candidates are scored by constraint violation and objective quality, stabilized with a look-up table of historically strong solutions, and used to iteratively update the diffusion solver. At inference time, DiOpt samples multiple candidates in parallel and selects the best solution according to the same optimization-aware score.

Why Feasibility Is Hard

Learning-based optimizers predict solutions directly, but a good objective value is not enough when constraints are strict. In high-dimensional constrained spaces, even a distribution centered near an optimum can place little probability mass inside the feasible region, causing low feasibility despite apparently strong objective performance.

Geometric illustration of distributional misalignment between generated samples and feasible regions.

The feasible region and model-induced distribution can have limited overlap, especially as the number of constraints grows.
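
The shrinking overlap can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical setup where samples come from an isotropic Gaussian and the feasible set is an intersection of random half-spaces, each with its boundary at a fixed distance (`margin`) from the Gaussian mean. All names and parameter values are illustrative.

```python
import numpy as np

def feasible_fraction(num_constraints, dim=50, sigma=0.1, margin=0.1,
                      n_samples=20000, seed=0):
    """Estimate the share of Gaussian samples that satisfy every constraint.

    Assumed toy setup: constraints are a_i^T x <= margin with random unit
    normals a_i, and samples are drawn from N(0, sigma^2 I).
    """
    rng = np.random.default_rng(seed)
    # Random unit-norm constraint normals (hypothetical feasible region).
    A = rng.standard_normal((num_constraints, dim))
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    # Samples from a model distribution centered at the origin.
    x = sigma * rng.standard_normal((n_samples, dim))
    # A sample is feasible only if it satisfies all constraints at once.
    satisfied = (x @ A.T) <= margin
    return float(np.mean(np.all(satisfied, axis=1)))

if __name__ == "__main__":
    for m in (1, 8, 32):
        print(m, feasible_fraction(m))
```

Each individual constraint is satisfied by most samples in this toy setup, yet the fraction satisfying all of them at once drops quickly as constraints are added, which is the effect the figure describes.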

Method

1. Supervised Warm-start

Train the diffusion solver from available optimal or high-quality solutions so it starts near useful regions of the search space.

2. Weighted Bootstrapping

Sample candidates from the current solver and assign weights that reward feasibility and objective quality instead of copying labels blindly.

3. Solution Selection

Draw multiple candidates at inference and select the best one, preserving the parallel sampling advantage of diffusion models.
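
The selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: constraints are taken to be linear inequalities `A x <= b`, and the rule "prefer the lowest-objective feasible candidate, otherwise the least-violating one" is one plausible reading of an optimization-aware score. The helper names `violation` and `select_best` are hypothetical.

```python
import numpy as np

def violation(x, A, b):
    """Total violation of the assumed constraints A x <= b (0 if feasible)."""
    return float(np.maximum(A @ x - b, 0.0).sum())

def select_best(candidates, objective, A, b, tol=1e-6):
    """Return one candidate from a batch.

    Assumed rule: among candidates whose violation is within `tol`, take the
    one with the lowest objective; if none is feasible, take the candidate
    with the smallest violation.
    """
    candidates = np.asarray(candidates, dtype=float)
    viol = np.array([violation(x, A, b) for x in candidates])
    obj = np.array([objective(x) for x in candidates])
    feasible = viol <= tol
    if feasible.any():
        idx = np.flatnonzero(feasible)
        return candidates[idx[np.argmin(obj[idx])]]
    return candidates[np.argmin(viol)]
```

Because every candidate is scored independently, the scoring can run in parallel over the whole batch drawn from the diffusion model.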

DiOpt pipeline showing supervised learning, bootstrapped refinement, lookup table update, and solution selection.

DiOpt alternates generation, optimization-aware weighting, and self-training to move the learned distribution toward feasible near-optimal solutions.
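
One way to picture the weighting and the look-up table is the sketch below. The exact weighting function used by DiOpt is not given in this summary, so the penalized score, the softmax-style weights, and the parameters `penalty` and `temperature` are all assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def bootstrap_weights(objectives, violations, penalty=10.0, temperature=1.0):
    """Turn per-candidate objective and violation values into training weights.

    Assumed scheme: score = objective + penalty * violation (lower is better),
    then weights are a softmax over negative scores, so feasible,
    low-objective candidates receive most of the weight.
    """
    obj = np.asarray(objectives, dtype=float)
    viol = np.asarray(violations, dtype=float)
    score = obj + penalty * viol
    # Subtract the minimum score before exponentiating for numerical stability.
    w = np.exp(-(score - score.min()) / temperature)
    return w / w.sum()

def update_lookup(table, key, solution, score):
    """Keep the best-scoring solution seen so far for each problem instance.

    `table` is a plain dict standing in for the look-up table of historically
    strong solutions; `key` identifies the problem instance.
    """
    best = table.get(key)
    if best is None or score < best[1]:
        table[key] = (solution, score)
    return table[key]
```

In a refinement loop of this kind, the weights would scale each candidate's contribution to the diffusion training loss, while the table keeps the training targets from regressing when a later batch of samples is worse than an earlier one.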

Toy Example

A supervised diffusion solver may concentrate around low-objective regions that violate constraints. After DiOpt refinement, generated samples shift toward the feasible region while retaining solution diversity.

Supervised Diffusion

Toy problem results from supervised diffusion.

DiOpt

Toy problem results from DiOpt.

Training Dynamics

When bootstrapping starts, DiOpt sharply reduces constraint violations. Objective-related metrics can briefly rise while the sampler moves inward to the feasible region, then settle back toward near-optimal values.

Training dynamics for DiOpt on ACOPF118 and QPSR tasks.

Constraint satisfaction improves rapidly after the bootstrapping stage begins, while optimality recovers through subsequent refinement.

Results

81.87% QPSR feasibility with DiOpt
69.95% CQP feasibility where RectFlow and DC3 fail
100% Motion retargeting feasibility
84.33% ACOPF118 feasibility
| Task | DiOpt Feasibility | DiOpt Gap | Takeaway |
| --- | --- | --- | --- |
| QPSR | 81.87% | 2.48% | Strong feasibility recovery over supervised RectFlow. |
| CQP | 69.95% | 7.04% | Maintains valid solutions on a difficult nonconvex benchmark. |
| Retargeting | 100.00% | 0.65% | Matches full feasibility with lower gap than baselines. |
| ACOPF57 | 93.33% | 0.24% | Competitive with strong feasibility and near-solver objective. |
| ACOPF118 | 84.33% | 2.26% | Substantially improves feasibility on a harder power-grid task. |
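
For readers who want to compute comparable numbers on their own problems, the sketch below shows one common way to define the two reported quantities. The page does not state the exact definitions, so both are assumptions: feasibility is taken as the share of test instances whose constraint violation is within a tolerance, and the gap as the mean relative difference between the predicted objective and a reference solver's objective. Function names and the tolerance are hypothetical.

```python
import numpy as np

def feasibility_rate(violations, tol=1e-4):
    """Share of instances whose total constraint violation is within `tol`."""
    return float(np.mean(np.asarray(violations, dtype=float) <= tol))

def mean_optimality_gap(pred_objectives, ref_objectives):
    """Mean relative gap |f_pred - f_ref| / |f_ref| across instances.

    Assumes every reference objective is nonzero.
    """
    pred = np.asarray(pred_objectives, dtype=float)
    ref = np.asarray(ref_objectives, dtype=float)
    return float(np.mean(np.abs(pred - ref) / np.abs(ref)))
```

Multiplying either value by 100 gives a percentage in the same form as the table.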

Additional Comparisons

Comparison figures for DiOpt and baseline methods.

Across benchmark families, DiOpt is designed to balance objective quality with constraint satisfaction, rather than optimizing one at the expense of the other.

BibTeX

@inproceedings{ding2026diopt,
  title     = {Diffusion-based Learning Framework for Constrained Nonconvex Optimization with Weighted Bootstrapped Refinement},
  author    = {Ding, Shutong and Zhou, Yimiao and Hu, Ke and Yao, Xi and Yan, Junchi and Tang, Xiaoying and Shi, Ye},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}