Researcher portrait: Anna Korba, assistant professor at CREST-ENSAE Paris.
What is your career path?
I pursued a three-year program in Math/Data Science at ENSAE, concurrently completing a specialized Master’s in Machine Learning at ENS Cachan. My academic journey continued with a Ph.D. in Machine Learning at Télécom ParisTech under the supervision of Stephan Clémençon.
Afterward, I gained valuable experience as a postdoctoral researcher at the Gatsby Computational Neuroscience Unit, University College London, collaborating with Arthur Gretton.
In 2020, I returned to ENSAE, joining the Statistics Department as an Assistant Professor. This trajectory has equipped me with a strong foundation in both Machine Learning and Statistics.
Did you have a statistician who particularly inspired you? If so, what were their research topics?
While no single statistician profoundly influenced me, I draw inspiration from the excellent mathematics courses taught at ENSAE by Arnak Dalalyan, Nicolas Chopin, Cristina Butucea, and others. I also remember very well my first international conference in Machine Learning, ICML 2015 in Lille. Attending talks from the Deep Learning community, though somewhat distant from my research focus at the time, left a lasting impression: witnessing the rapid and substantial advances, particularly in areas like question answering, fascinated me. The conferences I attended also exposed me to influential figures, from esteemed senior professors to brilliant Ph.D. students, enriching my perspective on many statistics and machine learning subjects.
How did you get into statistics and Machine Learning in particular?
As a student I liked mathematics and coding. At ENSAE, I had the choice between quantitative finance and machine learning. With quantitative finance hiring slowing down, I embraced the rising tide of machine learning, drawn to its dynamic nature and innovative potential.
What are your research topics?
One of my primary research focuses is sampling: approximating a target probability distribution when only partial information about it is available, such as its unnormalized density or a set of samples drawn from it. This versatile problem has applications in many areas of machine learning.
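To make the setting concrete, here is a minimal sketch (an editorial illustration, not code from the interview) of one classical such method, the unadjusted Langevin algorithm, which needs only the gradient of the target's unnormalized log-density; the Gaussian target and step size are arbitrary toy choices:

```python
import numpy as np

def ula(grad_log_density, x0, step=1e-2, n_steps=5_000, rng=None):
    """Unadjusted Langevin Algorithm: approximate samples from a target
    known only through grad log p (no normalizing constant required)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        # Gradient step on log p, plus Gaussian noise of variance 2*step.
        x = (x + step * grad_log_density(x)
             + np.sqrt(2.0 * step) * rng.standard_normal(x.shape))
        samples[t] = x
    return samples

# Toy target: standard 2-D Gaussian, for which grad log p(x) = -x.
s = ula(lambda x: -x, x0=np.zeros(2))
print(s.mean(axis=0), s.var(axis=0))  # roughly zero mean, unit variance
```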
In Bayesian inference, the target is the posterior distribution over model parameters, arising for instance in supervised learning when determining the weights of linear or neural-network regressors. Additionally, in generative modeling, my work involves learning the underlying distribution from a set of samples, such as real images of celebrity faces, with the goal of generating new ones.
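As a concrete illustration of the Bayesian case, consider the standard conjugate textbook example (not taken from her own work) of a linear regressor with Gaussian prior N(0, 1/alpha I) and Gaussian noise of precision beta, whose posterior over the weights is available in closed form:

```python
import numpy as np

# Hypothetical toy data: 50 points, 2 features, known true weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
w_true = np.array([1.5, -0.7])
y = X @ w_true + 0.1 * rng.standard_normal(50)

alpha, beta = 1.0, 100.0  # prior precision, noise precision
# Conjugate posterior over the weights (Bishop, PRML, eqs. 3.53-3.54).
S = np.linalg.inv(alpha * np.eye(2) + beta * X.T @ X)  # posterior covariance
m = beta * S @ X.T @ y                                 # posterior mean
print("posterior mean:", m)  # close to w_true
print("posterior cov:", S)   # quantifies remaining uncertainty on the weights
```

For neural-network regressors no such closed form exists, which is precisely where sampling methods come in.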
Beyond sampling, I’ve contributed to research in preference learning, structured prediction, and causality.
The framework of your field of research is fairly recent and brings together different communities. Could you name them and explain how this collaborative effervescence has enabled great advances?
My research intersects various communities, including experts in MCMC (Markov Chain Monte Carlo) methods, partial differential equations, dynamical systems, optimal transport (OT), and machine learning. In recent years, these traditionally independent fields have converged, fostering collaborative efforts.
A significant milestone in this convergence was a semester at Berkeley, organized by Philippe Rigollet, Simone Di Marino, Katy Craig, and Ashia Wilson, which brought together researchers from these diverse areas. Since then, the boundaries between these communities have become more fluid, sparking heightened interest and collaboration.
For example, I co-presented a tutorial on Wasserstein gradient flows with Adil Salim at ICML 2022, while Marco Cuturi and Charlotte Bunne presented a tutorial on OT, control, and dynamical systems at ICML 2023. These tutorials aim to introduce promising research directions and tools, providing a comprehensive panorama to a broad audience of machine learning researchers.
This collaborative effervescence has resulted in exciting progress on both theoretical and computational fronts. Researchers with expertise in multiple domains are leveraging their backgrounds to overcome challenges, offering convergence guarantees for numerical schemes and addressing practical limitations in sampling schemes, such as convergence time and local minima.
There are still many unsolved problems in the various applications. What would you like to solve or advance in your future research?
While significant strides have been made in sampling techniques inspired by the optimization literature, numerous aspects remain unexplored. My current research focuses on incorporating constraints into sampling methods. For instance, I am exploring ways to ensure fairness in predictive models by constraining the posterior distribution so that predictions are independent of sensitive attributes like gender. In generative modeling, it is also interesting to incorporate constraints or rewards, e.g. to generate images that satisfy some criterion such as a target brightness.
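As a loose illustration of the reward idea (an editorial sketch of a generic soft-constraint trick, not her method), one can tilt the target density by a reward r and run a Langevin sampler on the tilted log-density log p(x) + lam * r(x):

```python
import numpy as np

def tilted_langevin(grad_log_p, grad_reward, lam, x0, step=1e-2, n_steps=5_000):
    """Langevin sampling from the tilted target p(x) * exp(lam * r(x)):
    a crude way to bias samples toward high reward (e.g., bright images)."""
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float).copy()
    out = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        # Drift combines the base log-density gradient and the reward gradient.
        drift = grad_log_p(x) + lam * grad_reward(x)
        x = x + step * drift + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        out[t] = x
    return out

# Toy: base target N(0, 1) and reward r(x) = x, so the tilted target is N(2, 1).
s = tilted_langevin(lambda x: -x, lambda x: np.ones_like(x), lam=2.0, x0=np.zeros(1))
print(s[1000:].mean())  # roughly 2 after discarding burn-in
```

Enforcing hard constraints, such as exact independence from a sensitive attribute, is considerably more delicate than this soft tilting.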
How is the intersection of fair analysis methods and Bayesian statistical methods an important advance for Machine Learning?
Bayesian inference, by providing a posterior distribution over the parameters of a model, allows for predictions with quantified uncertainty. This is pivotal in applications where users need uncertainty estimates alongside predictions, as the distribution over predictions provides a more comprehensive picture than pointwise predictions alone. Moreover, incorporating fairness constraints into Bayesian methods has important applications, ensuring that predictions are not influenced by sensitive attributes. This intersection enhances both the interpretability and the ethical soundness of machine learning models.
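To illustrate the point about distributions over predictions (continuing the hypothetical conjugate example above, not a method from the interview), Bayesian linear regression yields a closed-form predictive distribution whose variance combines observation noise with parameter uncertainty:

```python
import numpy as np

# Same hypothetical conjugate Bayesian linear regression setup as before.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = X @ np.array([1.5, -0.7]) + 0.1 * rng.standard_normal(50)
alpha, beta = 1.0, 100.0
S = np.linalg.inv(alpha * np.eye(2) + beta * X.T @ X)
m = beta * S @ X.T @ y

x_new = np.array([0.3, -1.2])
pred_mean = m @ x_new
# Predictive variance = noise variance + weight uncertainty (Bishop, PRML, eq. 3.59).
pred_var = 1.0 / beta + x_new @ S @ x_new
print(f"prediction: {pred_mean:.2f} +/- {np.sqrt(pred_var):.2f}")
```

A pointwise prediction would report only pred_mean; the posterior additionally tells us how much to trust it.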