19 CREST Papers Accepted at NeurIPS 2025


Once again this year, CREST researchers have made a strong mark at one of the world’s most prestigious conferences in artificial intelligence and machine learning: NeurIPS 2025.

A total of 19 papers by CREST members have been accepted, reflecting the lab’s growing influence in the global research community and the vitality of its teams in statistics, finance-insurance, and data science.

The accepted papers showcase the breadth and depth of the work carried out at CREST.

Several contributions advance the fast-evolving field of diffusion and generative models, while others deepen the theoretical understanding of optimal transport and Wasserstein-based methods, approaches that connect CREST’s expertise in statistics and quantitative finance.

Other works explore kernel and inference methods, bandit algorithms, and privacy-preserving learning, illustrating CREST’s commitment to developing both fundamental and responsible AI.

Beyond their diversity, these projects share a common spirit: a blend of mathematical rigor, computational innovation, and interdisciplinary collaboration. Together, they embody CREST’s ongoing mission to push the boundaries of what data-driven research can achieve, from theory to real-world impact.

Authors: Louis Allain, Sébastien Da Veiga, Brian Staber

Abstract: Conformal Prediction (CP) is a popular framework for constructing prediction bands with valid coverage in finite samples, while being free of any distributional assumption. A well-known limitation of conformal prediction is its lack of adaptivity, although several works have introduced practically efficient alternative procedures. In this work, we build upon recent ideas that recast the CP problem as a statistical learning problem directly targeting coverage and adaptivity. This statistical learning problem is based on reproducing kernel Hilbert spaces (RKHS) and kernel sum-of-squares (SoS) methods. First, we extend previous results with a general representer theorem and exhibit the dual formulation of the learning problem. Crucially, this dual formulation can be solved efficiently by accelerated gradient methods with several hundreds or thousands of samples, unlike previous strategies based on off-the-shelf semidefinite programming algorithms. Second, we introduce a new hyperparameter tuning strategy tailored specifically to target adaptivity through bounds on test-conditional coverage. This strategy, based on the Hilbert-Schmidt Independence Criterion (HSIC), is introduced here to tune kernel lengthscales in our framework, but has broader applicability since it could be used in any CP algorithm where the score function is learned. Finally, extensive experiments are conducted to show how our method compares to related work. All figures can be reproduced with the accompanying code.

Link: https://doi.org/10.48550/arXiv.2505.21039
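
For readers less familiar with conformal prediction, here is a minimal split-conformal sketch on a hypothetical toy regression problem (plain numpy; not the kernel sum-of-squares method of the paper). It produces a band of constant width, which is precisely the lack of adaptivity the authors address.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heteroscedastic regression data (hypothetical example)
X = rng.uniform(-2, 2, size=600)
y = X**2 + rng.normal(scale=0.2 + 0.4 * np.abs(X))

# Split: proper training set / calibration set
X_fit, y_fit, X_cal, y_cal = X[:300], y[:300], X[300:], y[300:]

# Any point predictor works; here a quadratic least-squares fit
coef = np.polyfit(X_fit, y_fit, deg=2)
predict = lambda x: np.polyval(coef, x)

# Conformity scores = absolute residuals on the calibration set
scores = np.abs(y_cal - predict(X_cal))
alpha, n = 0.1, len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Marginal 90% prediction band for a new input: constant width 2q
x_new = 1.5
print(f"[{predict(x_new) - q:.2f}, {predict(x_new) + q:.2f}]")
```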

Authors: Marguerite Petit–Talamon, Marc Lambert, Anna Korba

Abstract: Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. In this paper, we focus on the following parametric family: mixtures of isotropic Gaussians (i.e., with diagonal covariance matrices proportional to the identity) and uniform weights. We develop a variational framework and provide efficient algorithms suited to this family. In contrast with mixtures of Gaussians with generic covariance matrices, this choice strikes a balance between accurately approximating multimodal Bayesian posteriors and remaining memory- and computationally efficient. Our algorithms implement gradient descent on the locations of the mixture components (the modes of the Gaussians), and either (an entropic) mirror descent or Bures descent on their variance parameters. We illustrate the performance of our algorithms in numerical experiments.

Link: https://doi.org/10.48550/arXiv.2506.13613
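
Spelled out, the variational family and objective described in the abstract above are (notation mine):

```latex
q_{\mu,\sigma}(x) \;=\; \frac{1}{K}\sum_{k=1}^{K}\mathcal{N}\!\left(x;\,\mu_k,\,\sigma_k^{2} I_d\right),
\qquad
\min_{\mu_{1:K},\,\sigma_{1:K}} \;\mathrm{KL}\!\left(q_{\mu,\sigma}\,\|\,\pi\right),
```

where π is the target posterior; the locations μ_k are updated by gradient descent and the variance parameters σ_k² by entropic mirror or Bures descent.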

Authors: Touqeer Ahmad, Mohammadreza Mousavi Kalan, François Portier, Gilles Stupfler

Abstract: Oversampling synthetic minority examples using SMOTE and its variants is a leading strategy for addressing imbalanced classification problems. Despite the success of this approach in practice, its theoretical foundations remain underexplored. We develop a theoretical framework to analyze the behavior of SMOTE and related methods when classifiers are trained on synthetic data. First, we establish an exponential inequality that characterizes the gap between the empirical risk computed on synthetic samples and the true population risk on the minority class. Second, we show that a kernel-based classification rule trained on synthetic data can achieve the minimax rate of convergence. This leads to practical guidelines for better parameter tuning of both SMOTE and the downstream learning algorithm. Numerical experiments are provided to illustrate and support the theoretical findings.

Link: https://neurips.cc/virtual/2025/poster/117246
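
For context, the core SMOTE recipe referenced here is a simple interpolation scheme; a minimal numpy sketch of the generic method (not the paper’s kernel classifier or tuning guidelines):

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples: pick a minority point,
    pick one of its k nearest minority neighbours, and interpolate."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]                 # neighbours within the minority class
    base = rng.integers(0, n, size=n_new)              # seed points
    neigh = nn[base, rng.integers(0, k, size=n_new)]   # one random neighbour each
    t = rng.random((n_new, 1))                         # interpolation weights in [0, 1]
    return X_min[base] + t * (X_min[neigh] - X_min[base])

X_minority = np.random.default_rng(1).normal(size=(30, 2))
print(smote(X_minority, n_new=100).shape)              # (100, 2)
```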

Authors: Vahan Arsenyan, Elen Vardanyan, Arnak Dalalyan

Abstract: Generative modeling aims to produce new random examples from an unknown target distribution, given access to a finite collection of examples. Among the leading approaches, denoising diffusion probabilistic models (DDPMs) construct such examples by mapping a Brownian motion via a diffusion process driven by an estimated score function. In this work, we first provide empirical evidence that DDPMs are robust to constant-variance noise in the score evaluations. We then establish finite-sample guarantees in Wasserstein-2 distance that exhibit two key features: (i) they characterize and quantify the robustness of DDPMs to noisy score estimates, and (ii) they achieve faster convergence rates than previously known results. Furthermore, we observe that the obtained rates match those known in the Gaussian case, implying their optimality.

Link: https://doi.org/10.48550/arXiv.2506.09681
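
To make the setting concrete, here is a hypothetical toy 1D score-based sampler in the spirit of the abstract: an Ornstein–Uhlenbeck forward process, its exact score (available in closed form because the toy data is Gaussian), and an Euler–Maruyama reverse-time sampler to which constant-variance noise can be added at every score evaluation. This only illustrates the robustness phenomenon; it is not the paper’s analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 2.0, 0.5   # toy data distribution N(m, s^2)

def score(x, t):
    """Exact score of the OU-noised marginal p_t (closed form for Gaussian data)."""
    var_t = s**2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return -(x - m * np.exp(-t)) / var_t

def sample(n, score_noise=0.0, T=5.0, steps=500):
    """Euler-Maruyama discretisation of the reverse-time SDE, run from t = T down to 0,
    with optional zero-mean constant-variance noise on every score evaluation."""
    dt = T / steps
    x = rng.normal(size=n)                         # p_T is approximately N(0, 1)
    for k in range(steps):
        t = T - k * dt
        s_hat = score(x, t) + score_noise * rng.normal(size=n)
        x = x + dt * (x + 2.0 * s_hat) + np.sqrt(2.0 * dt) * rng.normal(size=n)
    return x

for noise in (0.0, 0.5):
    xs = sample(20_000, score_noise=noise)
    print(f"score noise {noise}: mean {xs.mean():.2f} (target {m}), std {xs.std():.2f} (target {s})")
```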

Authors: Marius Potfer, Vianney Perchet

Abstract: Repeated multi-unit auctions, where a seller allocates multiple identical items over many rounds, are common mechanisms in electricity markets and treasury auctions. We compare the two predominant formats, uniform-price and discriminatory auctions, focusing on the perspective of a single bidder learning to bid against stochastic adversaries. We characterize the learning difficulty in each format, showing that the worst-case regret scales at the same rate for both auction formats under full-information and under bandit feedback. However, analysis beyond worst-case regret reveals structural differences: uniform-price auctions may admit faster learning rates in settings where discriminatory auctions do not. Finally, we provide a specific analysis for auctions in which the other participants are symmetric and have unit demand, and show that a similar separation in regret rates appears in these instances.

Link: https://neurips.cc/virtual/2025/poster/115489
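
For readers unfamiliar with the two formats, here is a hypothetical toy allocation-and-payment rule for a single bidder, using the common textbook convention that the uniform clearing price is the highest losing bid (real market rules vary, and this is not the paper’s model):

```python
def allocate_and_pay(my_bids, other_bids, K, fmt="uniform"):
    """K identical items; the K highest bids win.
    uniform        -> every winning unit pays the market-clearing price
    discriminatory -> every winning unit pays its own bid (pay-as-bid)."""
    bids = sorted([(b, True) for b in my_bids] + [(b, False) for b in other_bids],
                  key=lambda p: -p[0])
    winners, losers = bids[:K], bids[K:]
    clearing = losers[0][0] if losers else 0.0            # highest losing bid
    my_wins = [b for b, mine in winners if mine]
    payment = len(my_wins) * clearing if fmt == "uniform" else sum(my_wins)
    return len(my_wins), payment

# Bidding 0.9 and 0.6 against competitors bidding 0.8, 0.5, 0.4 for K = 3 items:
print(allocate_and_pay([0.9, 0.6], [0.8, 0.5, 0.4], K=3, fmt="uniform"))         # (2, 1.0)
print(allocate_and_pay([0.9, 0.6], [0.8, 0.5, 0.4], K=3, fmt="discriminatory"))  # (2, 1.5)
```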

Authors: Georgios Gavrilopoulos, Guillaume Lecué, Zong Shang

Abstract: We obtain upper bounds on the estimation error of Kernel Ridge Regression (KRR) for all non-negative regularization parameters, offering a geometric perspective on various phenomena in KRR. As applications: 1. We address the multiple descent problem, unifying the proofs of arXiv:1908.10292 and arXiv:1904.12191 for polynomial kernels, and we establish multiple descent for the upper bound on the estimation error of KRR under sub-Gaussian design in the non-asymptotic regime. 2. For a sub-Gaussian design vector, and again in the non-asymptotic regime, we prove a one-sided isomorphic version of the Gaussian Equivalent Conjecture. 3. We offer a novel perspective on the linearization of kernel matrices of non-linear kernels, extending it to the power regime for polynomial kernels. 4. Our theory is applicable to data-dependent kernels, providing a convenient and accurate tool for the feature-learning regime in deep learning theory. 5. Our theory extends the results of arXiv:2009.14286 under weak moment assumptions.
Our proof is based on three mathematical tools developed in this paper that can be of independent interest: 1. Dvoretzky-Milman theorem for ellipsoids under (very) weak moment assumptions. 2. Restricted Isomorphic Property in Reproducing Kernel Hilbert Spaces with embedding index conditions. 3. A concentration inequality for finite-degree polynomial kernel functions.

Link: https://doi.org/10.48550/arXiv.2404.07709
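
As a reference point, the kernel ridge regression estimator studied here takes, in its standard form, f̂(x) = k(x, X)(K + nλI)⁻¹y; a minimal numpy sketch with a Gaussian kernel (a generic textbook implementation, not the paper’s analysis):

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale**2))

def krr_fit(X, y, lam, lengthscale=1.0):
    # Solve (K + n*lam*I) alpha = y, then predict with k(x, X) @ alpha
    n = len(X)
    alpha = np.linalg.solve(rbf(X, X, lengthscale) + n * lam * np.eye(n), y)
    return lambda X_new: rbf(X_new, X, lengthscale) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
f_hat = krr_fit(X, y, lam=1e-3)
print(f_hat(np.array([[0.5]])), "vs", np.sin(0.5))
```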

Authors: Yvann Le Fay, Nicolas Chopin, Simon Barthelmé

Abstract: Variational inference consists in finding the best approximation of a target distribution within a certain family, where ‘best’ typically means smallest Kullback-Leibler divergence. We show that, when the approximation family is exponential, the best approximation is the solution of a fixed-point equation. We introduce LSVI (Least-Squares Variational Inference), a Monte Carlo variant of the corresponding fixed-point recursion, where each iteration boils down to an ordinary least squares regression and does not require computing gradients. We show that LSVI is equivalent to stochastic mirror descent; we use this insight to derive convergence guarantees. We introduce various ideas to improve LSVI further when the approximation family is Gaussian, leading to an O(d³) complexity in the dimension d of the target in the full-covariance case, and an O(d) complexity in the mean-field case. We show that LSVI outperforms state-of-the-art methods in a range of examples, while remaining gradient-free, that is, without requiring the computation of gradients.

Link: https://doi.org/10.48550/arXiv.2502.18475
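
As a very rough illustration of the “each iteration is an ordinary least-squares regression” idea (my reading of the abstract, on a hypothetical 1D Gaussian family; not the authors’ LSVI implementation): for an exponential family q_θ(x) ∝ exp(θᵀT(x)), a stationary point of the reverse KL satisfies θ = slope of the least-squares regression of log π(x) on the sufficient statistics T(x), with samples drawn from q_θ, which suggests the following fixed-point recursion.

```python
import numpy as np

rng = np.random.default_rng(0)
log_pi = lambda x: -(x - 3.0) ** 2 / 8.0      # unnormalised log-target, here N(3, 4)

# Natural parameters of q(x) ∝ exp(th1*x + th2*x^2), i.e. a 1D Gaussian (th2 < 0)
th1, th2 = 0.0, -0.5                          # start from N(0, 1)

for _ in range(5):
    var = -1.0 / (2.0 * th2)
    mu = th1 * var
    x = rng.normal(mu, np.sqrt(var), size=4000)          # sample from the current q
    # One iteration = ordinary least squares of log_pi(x) on (1, x, x^2)
    design = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(design, log_pi(x), rcond=None)
    th1, th2 = coef[1], coef[2]

print("mean:", th1 * (-1 / (2 * th2)), "variance:", -1 / (2 * th2))   # ~3 and ~4
```

For this Gaussian target the regression is exact, so the recursion settles in one step; the paper’s contribution lies in the general framework, its link to mirror descent, and the efficient full-covariance and mean-field Gaussian variants.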

Authors: Luca Arnaboldi, Bruno Loureiro, Ludovic Stephan, Florent Krzakala, Lenka Zdeborova

Abstract: We study the dynamics of stochastic gradient descent (SGD) for a class of sequence models termed Sequence Single-Index (SSI) models, where the target depends on a single direction in input space applied to a sequence of tokens. This setting generalizes classical single-index models to the sequential domain, encompassing simplified one-layer attention architectures. We derive a closed-form expression for the population loss in terms of a pair of sufficient statistics capturing semantic and positional alignment, and characterize the induced high-dimensional SGD dynamics for these coordinates. Our analysis reveals two distinct training phases: escape from uninformative initialization and alignment with the target subspace, and demonstrates how the sequence length and positional encoding influence convergence speed and learning trajectories. These results provide a rigorous and interpretable foundation for understanding how sequential structure in data can be beneficial for learning with attention-based models.

Link: https://doi.org/10.48550/arXiv.2506.02651

Authors: Oussama Zekri, Nicolas Boullé

Abstract: Discrete diffusion models have recently gained significant attention due to their ability to process complex discrete structures for language modeling. However, fine-tuning these models with policy gradient methods, as is commonly done in Reinforcement Learning from Human Feedback (RLHF), remains a challenging task. We propose an efficient, broadly applicable, and theoretically justified policy gradient algorithm, called Score Entropy Policy Optimization (SEPO), for fine-tuning discrete diffusion models over non-differentiable rewards. Our numerical experiments across several discrete generative tasks demonstrate the scalability and efficiency of our method. Our code is publicly available.

Link: https://doi.org/10.48550/arXiv.2502.01384

Authors: Stanislas Strasman, Sobihan Surendran, Claire Boyer, Sylvain Le Corff, Vincent Lemaire, Antonio Ocello

Abstract: Score-based Generative Models (SGMs) have achieved impressive performance in data generation across a wide range of applications and benefit from strong theoretical guarantees. Recently, methods inspired by statistical mechanics, in particular Hamiltonian dynamics, have introduced Critically-damped Langevin Diffusions (CLD), which define diffusion processes in extended spaces by coupling the data with auxiliary variables. These approaches, along with their associated score-matching and sampling procedures, have been shown to outperform standard diffusion-based samplers numerically. In this paper, we propose an upper bound on the sampling error for CLD-based generative models in the Wasserstein metric. To better exploit the extended space, we also propose a modified dynamic that introduces an additional hyperparameter controlling the noise applied to the data coordinates. This hyperparameter influences the smoothness of sample paths, and our discretization error analysis offers practical guidance for tuning, leading to improved sampling performance.

Link: https://neurips.cc/virtual/2025/poster/117301

Authors: Nina Vesseron, Louis Béthune, Marco Cuturi

Abstract: The canonical approach in generative modeling is to split model fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We explore in this work an alternative route that ties sampling and mapping. We find inspiration in moment measures, a result that states that for any measure ρ, there exists a unique convex potential u such that ρ = (∇u)♯ e^{−u}, i.e., ρ is the pushforward of the log-concave distribution e^{−u} by the map ∇u. While this does seem to tie effectively sampling (from the log-concave distribution e^{−u}) and action (pushing particles through ∇u), we observe on simple examples (e.g., Gaussians or 1D distributions) that this choice is ill-suited for practical tasks. We study an alternative factorization, where ρ is factorized as (∇w*)♯ e^{−w}, where w* is the convex conjugate of a convex potential w. We call this approach conjugate moment measures, and show far more intuitive results on these examples. Because ∇w* is the Monge map between the log-concave distribution e^{−w} and ρ, we rely on optimal transport solvers to propose an algorithm to recover w from samples of ρ, and parameterize w as an input-convex neural network. We also address the common sampling scenario in which the density of ρ is known only up to a normalizing constant, and propose an algorithm to learn w in this setting.

Link: https://doi.org/10.48550/arXiv.2503.10576

Author: Imad Aouali

Abstract: Efficient exploration is a key challenge in contextual bandits due to the large size of their action space, where uninformed exploration can result in computational and statistical inefficiencies. Fortunately, the rewards of actions are often correlated and this can be leveraged to explore them efficiently. In this work, we capture such correlations using pre-trained diffusion models; upon which we design diffusion Thompson sampling (dTS). Both theoretical and algorithmic foundations are developed for dTS, and empirical evaluation also shows its favorable performance.

Link: https://doi.org/10.48550/arXiv.2402.10028
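
For orientation, the Thompson sampling loop that dTS builds on looks as follows in its simplest Beta-Bernoulli form (a standard textbook sketch; dTS replaces this conjugate prior with a pre-trained diffusion model that captures correlations between actions):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])        # unknown Bernoulli arm means
K = len(true_means)
a_post, b_post = np.ones(K), np.ones(K)       # Beta(1, 1) prior per arm

for _ in range(5000):
    theta = rng.beta(a_post, b_post)          # one posterior sample per arm
    arm = int(np.argmax(theta))               # act greedily on the sample
    reward = rng.random() < true_means[arm]
    a_post[arm] += reward                     # conjugate posterior update
    b_post[arm] += 1 - reward

print("posterior means:", np.round(a_post / (a_post + b_post), 2))
```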

Authors: Jung-Hun Kim, Milan Vojnovic, Min-hwan Oh

Abstract: We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For worst-case linear rewards, our algorithms achieve tight regret bounds while using only a small number of oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.

Link: https://neurips.cc/virtual/2025/poster/119751

Authors: Rémi Castera, Felipe Garrido, Patrick Loiseau, Simon Mauras, Mathieu Molina, Vianney Perchet

Abstract: We consider matroid allocation problems under opportunity-fairness constraints: resources need to be allocated to a set of agents under matroid constraints (which include classical problems such as bipartite matching). Agents are divided into C groups according to a sensitive attribute, and an allocation is opportunity-fair if every group receives the same share, proportional to the maximum feasible allocation it could achieve in isolation. We study the Price of Fairness (PoF), i.e., the ratio between maximum-size allocations and maximum-size opportunity-fair allocations. We first provide a characterization of the PoF leveraging the underlying polymatroid structure of the allocation problem. Based on this characterization, we prove bounds on the PoF in various settings, from fully adversarial (worst-case) to fully random. Notably, one of our main results considers an arbitrary matroid structure with agents randomly divided into groups. In this setting, we prove a PoF bound as a function of the size of the largest group. Our result implies that, as long as there is no dominant group (i.e., the largest group is not too large), opportunity-fairness constraints do not induce any loss of social welfare (defined as the allocation size). Overall, our results give insights into which aspects of the problem’s structure affect the trade-off between opportunity fairness and social welfare.

Link: https://arxiv.org/pdf/2403.00397v2

Authors: Shiyun Lin, Simon Mauras, Nadav Merlis, Vianney Perchet

Abstract: We study the problem of matching markets with ties, where one side of the market does not necessarily have strict preferences over the members of the other side. For example, workers do not always have strict preferences over jobs, students can give the same ranking to different schools, and so on. In particular, assume w.l.o.g. that workers’ preferences are determined by their utility from being matched to each job, which might admit ties. Notably, in contrast to classical two-sided markets with strict preferences, there is no longer a single stable matching that simultaneously maximizes the utility of all workers.
We aim to guarantee each worker the largest possible share of the utility of her best possible stable matching. We call the ratio between a worker’s best possible stable utility and her assigned utility the Optimal Stable Share (OSS) ratio. We first prove that distributions over stable matchings cannot guarantee an OSS ratio that is sublinear in the number of workers. Instead, randomizing over possibly non-stable matchings, we show how to achieve a tight logarithmic OSS ratio. Then, we analyze the case where the real utility is not necessarily known and can only be approximated. In particular, we provide an algorithm that guarantees a similar fraction of the utility compared to the best possible utility. Finally, we move to a bandit setting, where we select a matching at each round and only observe the utilities for the matches we perform. We show how to utilize our results for approximate utilities to gracefully interpolate between problems without ties and problems with statistical ties (small suboptimality gaps).

Link: https://doi.org/10.48550/arXiv.2411.03270

Authors: Ziyad Benomar, Romain Cosson, Alexander Lindermayr, Jens Schöter

Abstract: In non-clairvoyant scheduling, the goal is to minimize the total job completion time without prior knowledge of individual job processing times. This classical online optimization problem has recently gained attention through the framework of learning-augmented algorithms. We introduce a natural setting in which the scheduler receives continuous feedback in the form of progress bars: estimates of the fraction of each job completed over time. We design new algorithms for both adversarial and stochastic progress bars and prove strong competitive bounds. Our results in the adversarial case surprisingly induce improved guarantees for learning-augmented scheduling with job size predictions. We also introduce a general method for combining scheduling algorithms, yielding further insights in scheduling with predictions. Finally, we propose a stochastic model of progress bars as a more optimistic alternative to conventional worst-case models, and present an asymptotically optimal scheduling algorithm in this setting.

Link: https://doi.org/10.48550/arXiv.2509.19662
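
As background for the non-clairvoyant setting, the classical baseline is round-robin, where all unfinished jobs share the machine equally and no knowledge of job sizes is needed; a toy comparison against the clairvoyant shortest-first optimum (a generic illustration, not the paper’s progress-bar algorithms):

```python
import numpy as np

def total_completion_spt(sizes):
    """Clairvoyant optimum: run jobs shortest-first."""
    return np.cumsum(np.sort(np.asarray(sizes, dtype=float))).sum()

def total_completion_round_robin(sizes):
    """Non-clairvoyant baseline: all unfinished jobs share the machine equally."""
    p = np.sort(np.asarray(sizes, dtype=float))
    n, t, total, done = len(p), 0.0, 0.0, 0.0
    for i, size in enumerate(p):
        t += (n - i) * (size - done)   # (n - i) jobs still alive share the machine
        total += t                     # the i-th smallest job finishes at time t
        done = size
    return total

jobs = [3, 1, 4, 1, 5]
print(total_completion_round_robin(jobs), "vs optimal", total_completion_spt(jobs))  # 48.0 vs 31.0
```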

Authors: Achraf Azize, Yulian Wu, Junya Honda, Francesco Orabona, Shinji Ito, Debabrota Basu

Abstract: As sequential learning algorithms are increasingly applied to real-life problems, ensuring data privacy while maintaining their utility emerges as a timely question. In this context, regret minimisation in stochastic bandits under ϵ-global Differential Privacy (DP) has been widely studied. Unlike bandits without DP, there is a significant gap between the best-known regret lower and upper bounds in this setting, though they “match” in order. Thus, we revisit the regret lower and upper bounds of ϵ-global DP algorithms for Bernoulli bandits and improve both. First, we prove a tighter regret lower bound involving a novel information-theoretic quantity characterising the hardness of ϵ-global DP in stochastic bandits. Our lower bound strictly improves on the existing ones across all ϵ values. Then, we propose DP versions of two asymptotically optimal bandit algorithms, DP-KLUCB and DP-IMED, built from a unified blueprint: (a) running in arm-dependent phases, and (b) adding Laplace noise to achieve privacy. For Bernoulli bandits, we analyse the regrets of these algorithms and show that they asymptotically match our lower bound up to a constant arbitrarily close to 1. This refutes the conjecture that forgetting past rewards is necessary to design optimal bandit algorithms under global DP. At the core of our algorithms lies a new concentration inequality for sums of Bernoulli variables under the Laplace mechanism, which is a new DP version of the Chernoff bound. This result is universally useful, as the DP literature commonly treats the concentration of Laplace noise and of the random variables separately, while we couple them to yield a tighter bound.

Link: https://doi.org/10.48550/arXiv.2505.05613
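
The “adding Laplace noise to achieve privacy” ingredient is the standard Laplace mechanism; a minimal sketch for one arm’s Bernoulli rewards (a generic DP recipe, not the paper’s phased algorithms or their concentration analysis):

```python
import numpy as np

rng = np.random.default_rng(0)

def private_mean(rewards, eps, rng):
    """epsilon-DP estimate of an arm's mean: rewards lie in [0, 1], so changing one
    reward changes their sum by at most 1, and Laplace(1/eps) noise on the sum suffices."""
    return (rewards.sum() + rng.laplace(scale=1.0 / eps)) / len(rewards)

rewards = (rng.random(500) < 0.6).astype(float)    # 500 pulls of an arm with mean 0.6
for eps in (0.1, 1.0, 10.0):
    print(f"eps = {eps:>4}: private mean = {private_mean(rewards, eps, rng):.3f}")
```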

Authors: Marc Jourdan, Achraf Azize

Abstract: Best Arm Identification (BAI) algorithms are deployed in data-sensitive applications, such as adaptive clinical trials or user studies. Driven by the privacy concerns of these applications, we study the problem of fixed-confidence BAI under global Differential Privacy (DP) for Bernoulli distributions. While numerous asymptotically optimal BAI algorithms exist in the non-private setting, a significant gap remains between the best lower and upper bounds in the global DP setting. This work reduces this gap to a small multiplicative constant, for any privacy budget ϵ. First, we provide a tighter lower bound on the expected sample complexity of any δ-correct and ϵ-global DP strategy. Our lower bound replaces the Kullback–Leibler (KL) divergence in the transportation cost used by the non-private characteristic time with a new information-theoretic quantity that optimally trades off between the KL divergence and the Total Variation distance scaled by ϵ. Second, we introduce a stopping rule based on these transportation costs and a private estimator of the means computed using an arm-dependent geometric batching. En route to proving the correctness of our stopping rule, we derive concentration results of independent interest for the Laplace distribution and for the sum of Bernoulli and Laplace distributions. Third, we propose a Top Two sampling rule based on these transportation costs. For any budget ϵ, we show an asymptotic upper bound on its expected sample complexity that matches our lower bound up to a multiplicative constant smaller than 8. Our algorithm outperforms existing δ-correct and ϵ-global DP BAI algorithms for different values of ϵ.

Link: https://openreview.net/forum?id=IFso8G8gwJ

Authors: Adrien Vacher, Omar Chehab, Anna Korba

Abstract: Sampling from multi-modal distributions is challenging, even in low dimensions. We provide the first sampling algorithm for a broad class of distributions — including all Gaussian mixtures — with a query complexity that is polynomial in the parameters governing multi-modality, assuming fixed dimension. Our sampling algorithm simulates a time-reversed diffusion process, using a specific Monte Carlo estimator of the intermediate score functions. Unlike previous works, it avoids metastability, requires no prior knowledge of the mode locations, and does not rely on restrictive smoothness assumptions that exclude general Gaussian mixtures. We illustrate this result on a low-dimensional but challenging multi-modal sampling task, where our algorithm significantly outperforms existing approaches.

Link: https://neurips.cc/virtual/2025/poster/119085