Giovanni RIZZI (Toulouse School of Economics) “Opening the Black Box: A Statistical Theory of the Value of Data”
Time: 12:15-13:30
Date: 27th January 2026
Room 3001
Abstract: This paper develops a theory of the value of data for prediction. An agent chooses a sample of individuals and a subset of their observable characteristics (covariates) to estimate the parameters of a data-generating process and predict an outcome for a target individual based on her characteristics. I distinguish between covariates collected on the sample (training data) and covariates collected on the target individual (prediction data). The main findings are: (i) training covariates exhibit economies of scope, as the value of one covariate is higher when others are also observed; (ii) the value of an additional training covariate is inverted-U-shaped in the sample size, so training covariates and observations are complements when data are scarce but become substitutes when data are abundant; and (iii) the value of a prediction covariate for the target individual is strictly increasing in the sample size and the number of training covariates. These findings have three policy implications. Mergers between firms holding different covariates can be privately profitable yet reduce welfare, especially when data are scarce (e.g., under strict privacy rules). Allowing firms to sell covariate bundles is always procompetitive because it removes double marginalization, whereas bundling observations can be anticompetitive when data are abundant. Finally, a data seller may profitably exclude one of several competing prediction providers even when this lowers total welfare.