SAGMOS – Statistical Analysis of Generative Models

Generative modeling, the automatic generation of examples such as texts, images, music, and molecules that resemble those in a given dataset, is a central task in artificial intelligence. Mathematically, this task is framed as the problem of sampling from an unknown distribution that is accessible only through a limited set of examples drawn from it. The size and quality of this set can vary greatly depending on the application. The algorithms that have propelled generative modeling to fame are known for requiring vast amounts of data and computation to achieve state-of-the-art performance.

The goal of this project is to investigate the mathematical properties of generative modeling algorithms to better understand their strengths and weaknesses, enhance their efficiency, and design new methods. The mathematical challenge in generative modeling lies in successfully integrating techniques from various areas of mathematical statistics and probability theory: dimension reduction, nonparametric estimation, manifold learning, sampling, optimal transport, stochastic calculus, etc. Investigating the mathematical properties of the resulting pipeline requires a deep analysis of these methods and of how they interact to solve the overarching problem. Such analysis is key to exploring multiple facets of generative modeling algorithms, including precision, robustness, creativity, and computational tractability.

Our focus will be on obtaining interpretable statistical guarantees that highlight the impact of sample size, intrinsic and ambient dimensions, noise level, and contamination rate on precision, creativity, and running time. These guarantees are essential in AI to ensure the reliability of the resulting algorithms and enhance their trustworthiness, explainability, and frugality. We will pay special attention to stability and robustness properties, particularly against model misspecification, noise, and outliers.

Funded by the European Union (ERC, SAGMOS, 101201229). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

Contact

Arnak Dalalyan – Principal investigator

Arnak Dalalyan is a Researcher in Statistics at CREST-Groupe ENSAE-ENSAI and Professor at ENSAE Paris.

His work lies at the crossroads of statistics, machine learning, and optimization, with a strong emphasis on the theoretical and methodological foundations of data science.

Personal website
