Boris MUZELLEC (INRIA – SIERRA) – "Imputing missing data using regularized optimal transport"
Time: 14:00 – 15:00
Date: 6th of January 2021
Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Indeed, even with a small fixed proportion of missing values, discarding data points that contain missing values quickly becomes impractical as the dimension increases. It is therefore necessary to devise strategies that replace missing values with reasonable guesses.
In this talk, we show how optimal transport (OT) tools can be used to impute data in a distribution-preserving way. We start with an introduction to the missing data problem and to regularized OT. We then show how OT can be used to turn a simple assumption – two batches extracted randomly from the same dataset should share the same distribution – into a loss function for imputing missing values. Finally, we present and demonstrate practical methods to minimize this loss, which may or may not exploit parametric assumptions on the underlying distribution of values.
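To make the batch-based loss concrete, here is a minimal sketch in plain Python. It is not the speaker's implementation. It assumes the regularized OT loss is a debiased Sinkhorn divergence with squared Euclidean cost and uniform weights, and every name in it (`entropic_ot`, `sinkhorn_divergence`, `fill`, `eps`, `n_iter`) and the toy data are hypothetical choices made for illustration. Instead of minimizing the loss by gradient descent, it only compares two candidate guesses for a single missing entry.

```python
import math

def sq_dist(x, y):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def logsumexp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def entropic_ot(xs, ys, eps=0.5, n_iter=200):
    # Entropy-regularized OT cost between two uniform empirical measures,
    # computed with log-domain Sinkhorn iterations on the dual potentials.
    # eps and n_iter are illustrative assumptions, not tuned values.
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        f = [-eps * logsumexp([log_b + (g[j] - cost[i][j]) / eps
                               for j in range(m)]) for i in range(n)]
        g = [-eps * logsumexp([log_a + (f[i] - cost[i][j]) / eps
                               for i in range(n)]) for j in range(m)]
    # Dual objective value after the last update of g.
    return sum(f) / n + sum(g) / m

def sinkhorn_divergence(xs, ys, eps=0.5):
    # Debiased loss: vanishes when both batches are identical.
    return (entropic_ot(xs, ys, eps)
            - 0.5 * entropic_ot(xs, xs, eps)
            - 0.5 * entropic_ot(ys, ys, eps))

def fill(batch, guess):
    # Replace every missing entry (None) with the candidate value.
    return [tuple(guess if v is None else v for v in row) for row in batch]

# Two toy batches assumed to come from the same dataset;
# batch_a has one missing coordinate.
batch_a = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, None)]
batch_b = [(0.0, 0.1), (1.0, 0.9), (0.1, 1.0), (0.9, 0.0)]

loss_near = sinkhorn_divergence(fill(batch_a, 0.0), batch_b)
loss_far = sinkhorn_divergence(fill(batch_a, 10.0), batch_b)
print(loss_near < loss_far)
```

In this toy setting, the guess that keeps the two batches similarly distributed should give the smaller loss, and the outlying guess the larger one. A practical method would instead treat the imputed entries as free parameters and lower the loss by gradient steps over many randomly drawn batch pairs.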