The French National Research Agency (ANR)
The French National Research Agency (ANR) is a public administrative institution under the authority of the French Ministry of Higher Education, Research and Innovation. The agency funds project-based research carried out by public operators cooperating with each other or with private companies.
The Agency’s missions, defined in the decree of 1 August 2006 amended on 24 March 2014, are:
- To fund and promote the development of basic and targeted research, technological innovation, technology transfer and public-private partnerships
- To implement the Work Programme approved by the French Minister of Research, following consultation with the supervisory ministers of France’s research bodies and public higher education institutions
- To manage major government investment programmes in the fields of higher education and research and to oversee their implementation
- To strengthen scientific cooperation across Europe and worldwide by aligning its Work Programme with European and international initiatives
- To analyse trends in research supply and assess the impact of the funding it allocates on scientific output in France
ANR Funding
Explore the flourishing realm of research at the CREST laboratory, propelled by the crucial financial support of the National Research Agency (ANR). As a major player in the French research landscape, the ANR plays a pivotal role in funding a variety of innovative projects within CREST. These projects, a testament to the creativity and insight of our researchers, push the boundaries of knowledge in diverse fields. Dive into our directory of ANR-supported projects and discover how strategic funding can catalyze scientific breakthroughs that enrich our understanding of the world around us.
List of CREST’s ongoing ANR projects
Author(s)
Vianney Perchet
Year
2024/2028
Submission summary
Future markets will be online, broad (even international), and the pairing between supply and demand will be facilitated, organized, and regulated by platforms that learn, in an automated fashion, from gathered data, past errors, and successes. Artificial intelligence will certainly be their core technique. Formally, the objective is to match supply and demand agents sequentially (even if some delay is sometimes admissible, irrevocable decisions must be made), typically at some cost that depends on both sides. On the other hand, matching generates utilities for both the supply and the demand agent, and some revenue for the platform. Reconciling these conflicting objectives is difficult and challenging; it relies on bargaining, auctions, or other mechanisms. Finally, since these markets may involve real people, with their own interests, whose lives can be strongly affected by AI decisions, it is crucial to take this aspect into account, both for obvious ethical reasons and because the data-generating process is intricate, being the outcome of many multi-agent interactions.
The global, far-reaching objective of project DOOM is to design new theoretical AI approaches for admissible online matching market platforms, i.e., platforms that are efficient yet remain respectful of their participating agents. We will develop and analyze algorithms that answer the following questions. Who should be matched with whom, and when? What are the best mechanisms for incentivizing agents and/or pricing transactions? What data are going to be used, and how?
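As a point of reference only (this illustration is ours, not part of the submission summary), the simplest baseline for such sequential decisions is a greedy rule that irrevocably assigns each arriving demand agent to the best still-available supplier. The sketch below, with hypothetical identifiers and a made-up score function, makes the online matching setting concrete.

```python
# Toy illustration of the online matching setting (not the project's algorithms):
# demand agents arrive one at a time and must be irrevocably matched to an
# available supply agent. A greedy rule picks the available supplier with the
# highest net value (utility minus cost).

def greedy_online_matching(suppliers, demand_stream, score):
    """suppliers: list of supplier ids; demand_stream: iterable of demand ids;
    score(d, s): net value of matching demand d with supplier s."""
    available = set(suppliers)
    matching = {}
    for d in demand_stream:
        if not available:
            break
        best = max(available, key=lambda s: score(d, s))
        if score(d, best) > 0:       # only match when the pair creates value
            matching[d] = best
            available.remove(best)   # irrevocable decision
    return matching

if __name__ == "__main__":
    value = {("d1", "s1"): 3.0, ("d1", "s2"): 1.0,
             ("d2", "s1"): 2.5, ("d2", "s2"): 2.0}
    print(greedy_online_matching(["s1", "s2"], ["d1", "d2"],
                                 lambda d, s: value[(d, s)]))
```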
Author(s)
Paola Tubaro
Year
2024/2028
Submission summary
Digital labour platforms use data and algorithms to match clients with workers, construed as independent contractors, for one-off ‘gigs’ without any long-term commitment. Building on scattered but growing evidence that gender, race and other gaps persist in these settings, the proposed project addresses unresolved questions in both the digital inequalities literature and the digital labour literature. To do so, VOLI innovatively combines hypotheses and methods from sociology and large-scale corpus linguistics, and relies on speech technology and artificial intelligence to tackle the emergent economic and societal risks that coalesce around the nexus between online platform labour and social inequalities. At the same time, the methods that will be developed within this highly interdisciplinary project will advance the linguistic study of the factors driving speech variation, augmenting language corpora with rich sets of metadata from sociological surveys, while also building and testing new and improved tools for automated transcription.
Author(s)
Philippe Choné
Year
2023/2028
Submission summary
The project is fourfold: it intends to (i) build a platform that can manage massive health data and make them usable for researchers; (ii) use the tools of graph theory to describe the healthcare system in a systematic and quantitative way; (iii) develop new machine learning tools to understand the shape of the graphs and predict their consequences on health outcomes; (iv) shed light on those policy issues that affect the efficiency of the French healthcare system.
We have access to all records of consultations and other medical procedures, drug prescriptions, and hospital admissions for the entire population living in France. The data cover the years 2008 to 2018. We will represent this unique data set as a series of time-evolving, geolocated, bipartite graphs. Such graphs have two types of nodes: patients and a category of providers (e.g., generalist doctors). A patient and a provider are connected if they have met at least once during the current year. The projection of these bipartite graphs onto the set of providers reveals patient-sharing and referral networks.
We will develop econometric and machine learning methods to explain and/or predict the matching between patients and providers based on patients’ and providers’ characteristics (location, health condition, physician specialty, etc.). Our two main goals are to understand the formation of the graphs and to use these graphs to estimate the causal impact of the healthcare system on utilization and health outcomes (drug prescriptions, emergency hospital visits, mortality, etc.). We will examine whether certain local configurations are more effective at delivering better outcomes for patients. Particular attention will be paid to the geographic distribution of healthcare supply. We will build indicators of potential access at the local level, characterize potential low-density areas, the so-called “medical deserts”, and quantify their effects on patients’ outcomes.
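To make the graph construction concrete (a toy sketch with made-up identifiers, not the project’s data or code), the provider projection can be computed with networkx’s bipartite helpers:

```python
# Minimal sketch of the bipartite patient-provider graph described above and of
# its projection onto providers, which links two providers whenever they share
# at least one patient during the year.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
patients = ["p1", "p2", "p3"]
providers = ["gp_A", "gp_B"]
B.add_nodes_from(patients, bipartite=0)
B.add_nodes_from(providers, bipartite=1)
# An edge means the patient saw the provider at least once during the current year.
B.add_edges_from([("p1", "gp_A"), ("p2", "gp_A"), ("p2", "gp_B"), ("p3", "gp_B")])

# Provider projection: gp_A and gp_B are connected because they share patient p2.
provider_graph = bipartite.weighted_projected_graph(B, providers)
print(provider_graph.edges(data=True))  # [('gp_A', 'gp_B', {'weight': 1})]
```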
Author(s)
Julien Prat
Year
2023/2025
Submission summary
Public blockchains, such as Bitcoin and Ethereum, are publicly accessible by design, but their data cannot be easily accessed and analysed without proper structure and indexing. The objective of this project is to develop a publicly accessible infrastructure that enables easy access and searchability of blockchain data in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable) of open science. This will promote complete transparency and reproducibility of scientific analysis results in the blockchain field – something that does not exist today – facilitating the growth of new and existing applications and collaborations.
At present, structured analyses are typically performed using proprietary solutions and databases, which make reproducibility and sharing of data within the scientific community challenging and expensive. Additionally, even though scientific studies often perform common operations to collect data systematically, the tools and libraries developed and used are rarely shared with the broader scientific community. As a result, similar research carried out by different institutions and groups often requires the re-implementation of the same software tools, leading to wasted resources and an inability to reproduce and compare results.
As part of this project, we plan to provide the scientific community with:
(a) Publicly accessible and expandable datasets and infrastructure that include structured, daily updated blockchain transaction data. Researchers will be able to access raw transaction data and community-maintained, enriched datasets in a uniform and open manner, promoting the availability and reuse of these complex data.
(b) An open-source software framework and standardized data access APIs that enable effective querying, annotating, and referencing of data, and building well-described reusable workflows and pipelines that will facilitate the exchange and replication of scientific results according to the FAIR principles of open science.
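For illustration only (the project’s own APIs are not specified in this summary), raw transaction data of the kind mentioned in (a) can already be pulled block by block from a node’s standard JSON-RPC interface; the structured, FAIR-compliant datasets above would sit on top of this kind of access. The endpoint URL below is a placeholder.

```python
# Hypothetical sketch: fetching one raw Ethereum block via the standard
# JSON-RPC method eth_getBlockByNumber. The node URL is a placeholder, not a
# service provided by the project.
import requests

NODE_URL = "https://example-ethereum-node.invalid"  # placeholder endpoint

payload = {
    "jsonrpc": "2.0",
    "method": "eth_getBlockByNumber",
    "params": ["latest", True],  # True = include full transaction objects
    "id": 1,
}
resp = requests.post(NODE_URL, json=payload, timeout=30)
block = resp.json()["result"]
print(block["number"], len(block["transactions"]))
```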
This project also aims to provide an effective solution and tool for the European Commission to certify blockchain transactions as required by the European tax rules voted on 8 December 2022, also known as the eighth Directive on Administrative Cooperation (DAC8), for which a public and generally accepted solution does not yet exist.
According to Google Scholar, over 547,000 scientific articles have been published using the keyword “blockchain” and 13,000 using the combination “blockchain data” since 2013, and these numbers have been growing rapidly in recent years. However, only around 200 publicly accessible datasets associated with these studies have been identified, and the quality and reliability of these reference data collections are generally unknown. This pattern highlights the pressing need for an open and reusable research reference database and software solution in the rapidly growing field of blockchain networks and data analytics. In other research fields such as healthcare, bioscience, particle physics, geoscience, and astrophysics, there are publicly accessible and maintained methods, open-source software tools, and databases (e.g., UK Biobank, UniProt, CERN Open Data, ESA Gaia Archive, NASA Planetary Data System) where researchers can collect and share their data. However, this is not currently the case in the field of blockchain data: the objective of this project is therefore to fill this gap and provide an effective, public, and widely accepted solution to this pressing need.
Author(s)
Bertrand Garbinti
Year
2023/2026
Submission summary
While wealth and income inequality have been on the rise in many countries over the past decades, European welfare states have come under pressure in a context of persistent economic difficulties and increasing globalization. By adopting a Franco-German perspective, EQUITAX aims to address this issue by shedding light on the most effective tax and transfer instruments for fighting income and wealth inequality. To achieve this aim, EQUITAX will pioneer a comprehensive analysis of the link between taxation and inequality, combining macroeconomic, theoretical, and applied microeconometric approaches.
The first work package will carry out an in-depth comparative analysis of the dynamics of income and wealth inequalities in France and Germany. We will start by constructing original long-term series of pre-tax and post-tax income inequality for Germany. We will then combine these series with the existing French series to investigate how the respective national tax-and-transfer systems have succeeded in reducing inequality, and how common and country-specific public policies—such as political, social, and fiscal institutions—affect the historical evolution of income and wealth inequality.
The second work package will examine the extent to which rich households modify their behavior in response to tax reforms. In particular, we will exploit the combination of excellent administrative tax data from France and Germany and sharp policy changes, which constitute genuine “natural experiments”, in the taxation of wealth in France and the taxation of bequests in Germany. This will allow us to obtain reliable estimates of key inputs for the scientific community and policymakers: What is the impact of wealth taxation on the decision to save, work, emigrate, or use optimization strategies minimizing taxable capital income? To what extent do top wealth holders adjust the timing of inter vivos gifts in order to avoid bequest taxation?
The third work package will explore the main inefficiencies of current tax policies and what “optimal” systems might look like, by combining the inputs from the previous work packages with taxation theory, from both micro and macro perspectives. We will first compute the marginal deadweight loss of taxation along the income distribution of each country in order to identify the parts of the distribution where current tax systems are inefficient. We will then use a macro-simulation model to carry out a welfare analysis of the impact of different tax reforms in France and in Germany. This analysis will provide a guide for the discussion of tax policy reforms.
By identifying the various channels and mechanisms linking inequality, taxation, and redistribution, EQUITAX will help build better governance. These new insights will provide a reliable source of information for the public debate and help policymakers design evidence-based policies for a fair, united, and peaceful Europe.
Author(s)
Xavier D’Haultfoeuille
Year
2023/2028
Submission summary
Empirical research in social sciences is plagued by significant uncertainty. Accounting for it correctly, in particular through appropriate confidence intervals, is key to avoiding, e.g., false discoveries. Very often, these intervals are constructed by assuming independence between observations. However, this assumption is unrealistic with multi-indexed data, as in international trade: exports from China to the United States are probably correlated with exports from Germany to China, for instance, because China is common to both variables. Similarly, the wages of two workers belonging to the same sector or the same geographical area are likely correlated, because of sectoral or area-specific shocks to labour demand. The Ricode project aims to improve existing inference practices for such so-called exchangeable data, where current methods have important shortcomings. We expect to establish new theoretical results, develop corresponding R and Stata programs, and further investigate key fields of application.
On the theoretical side, we will propose an analytical inference method that is robust to so-called “degenerate” situations, where the limiting distribution is no longer Gaussian in general. We will also study the properties of bootstrap methods adapted to this type of dependence. Finally, we will develop methods that are potentially valid in finite samples, using permutation tests.
On the application side, we will consider international trade models, which include high-dimensional fixed effects and for which no inference theory has been developed yet. We will also revisit the estimation of demand in empirical industrial organization, where data are exchangeable in some dimensions (such as goods or producers) but not in the time dimension, which is also a key aspect of dependence. Finally, we will revisit the empirical literature to quantify the importance of using appropriate inference methods.
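For concreteness (our illustration, not quoted from the project), multi-indexed data of this kind are often represented through latent shocks attached to each index; two observations are then correlated exactly when they share an index, as in the China example above.

```latex
% Illustrative representation of exchangeable (dyadic) data: the outcome for
% the pair (i, j), e.g. exports from country i to country j, combines latent
% shocks attached to each country with an idiosyncratic term. Observations
% sharing a country are correlated through that country's shock.
Y_{ij} = f(\xi_i, \xi_j, \varepsilon_{ij}),
\qquad \xi_1, \xi_2, \ldots \text{ and } \varepsilon_{ij} \text{ mutually independent.}
```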
Author(s)
Elia Lapenta
Year
2023/2026
Submission summary
Regressions with Instrumental Variables (IVs) play a central role in applied econometrics. They are employed to recover causal effects and to estimate structural models obtained from economic theory.
However, the reliability of estimates from IV regressions is often limited by the strong parametric restrictions imposed on the functions of interest, e.g. linearity assumptions. While these restrictions simplify the estimation procedure, they can rarely be justified from an economic perspective. Hence, they bring along the risk of misspecification: when the true regression function of interest does not follow the parametric model, the estimates are biased and the counterfactual analysis obtained from such models is misleading.
More flexible (nonparametric) estimation methods for IV regressions have been proposed in the literature, but they are often difficult to implement, as they require running multiple nonparametric regressions and selecting multiple regularization parameters. Furthermore, they are computationally prohibitive in the presence of large datasets. In an era of big data, it is of utmost importance to rely on easy-to-implement econometric tools which are designed to handle large datasets while coming with strong theoretical guarantees.
In this project, we rely on the theory of Reproducing Kernel Hilbert Spaces, popular in machine learning, to develop estimation techniques for IV nonparametric regressions that (i) are easy to implement and compute, (ii) do not require selecting multiple regularization parameters, and (iii) avoid running multiple nonparametric regressions.
We have three specific objectives. The first is to derive the rates of convergence of the proposed estimators. Our second aim is to develop valid inference procedures for the nonparametric regression function as well as for functionals of it. Finally, we want to develop R packages implementing our proposed techniques to facilitate their use in empirical research.
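For reference (a textbook formulation, not specific to this project), the nonparametric IV regression replaces the linear specification with a conditional moment restriction on an unknown function g, assumed only to lie in a flexible class such as an RKHS:

```latex
% Linear IV specification vs. the nonparametric conditional moment restriction.
\text{Linear model: } Y = X'\beta + \varepsilon, \ \ \mathbb{E}[\varepsilon \mid Z] = 0
\qquad\text{vs.}\qquad
\text{Nonparametric model: } \mathbb{E}\big[\, Y - g(X) \mid Z \,\big] = 0 .
```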
Author(s)
Etienne Ollion
Year
2023/2026
Submission summary
The Pantagruel project is an ambitious initiative that aims to develop and evaluate multimodal (written, spoken, pictograms) and inclusive linguistic models for French. The project draws on the expertise of researchers from different disciplines, including computer science, signal processing, sociology, and linguistics, to ensure diversity of perspectives, as well as the reliability and relevance of results. The main contributions of the project are the development of freely available self-supervised models for French, covering one to three of the modalities, for the general and clinical domains. The project will not only produce models but also design test benches to evaluate the generalization of such models, building on the experience gained in the FlauBERT and LeBenchmark projects.
Part of the project will be devoted to the biases and stereotypes conveyed in the training corpora and in the downstream models. An ethics committee will help limit the amplification of bias within the training corpora, in particular by working on the demographic characteristics of the speakers (for audio or transcribed speech) and of the authors (for part of the written data). We will thus be able to compare models learned on training corpora with varying proportions of these characteristics, such as gender. This study will quantify the extent to which the predictions of the language models reliably reflect the upstream corpora, and will help better control the way in which they can be used as tools for social-science research.
The project will also develop software components that will facilitate the integration of language models into various applications and allow the development of innovative solutions that exploit the power of multimodal French language models. These tools are particularly intended for non-computer scientists such as the members of the consortium (sociologists, linguists, doctors, speech therapists), researchers from other fields, or artists. The Pantagruel project thus has the potential to significantly advance the state of the art in multimodal language models and to disseminate their use in a wide range of applied fields, ranging from healthcare to the humanities and the social sciences.
Author(s)
Geoffrey Barrows
Year
2022/2026
Submission summary
A large body of research indicates that air pollution affects human health and productivity (see Graff Zivin & Neidell (2013) for a review). If these pollution-induced health shocks adversely affect labor productivity, then standard microeconomic theory suggests they should increase costs and prices, and lower gross output, revenues, profits, wages, consumer surplus, and total social welfare, depending on structural elements of supply and demand. While there exists a nascent literature that aims to connect pollution-induced productivity shocks to economic outcomes (see, for example, Fu et al., 2017 and Dechezleprêtre et al., 2019), there is still very little work on the effect of these shocks on the operations of firms. To what extent do firms’ costs depend on local air pollution concentrations? Do firms pass the cost of pollution shocks on to consumers and workers? And if so, do these costs cross national boundaries? The goal of PRODPOLU is to link geo-localized data on French manufacturing plants together with detailed information on plant- and firm-level outcomes and high-spatial-resolution pollution data to study, for the first time, the productivity, price, wage, and output effects of air pollution in a unified empirical framework.
Author(s)
Anna Korba
Year
2022/2025
Submission summary
An important problem in machine learning and computational statistics is to sample from an intractable target distribution. In Bayesian inference for instance, the latter corresponds to the posterior distribution of the parameters, which is known only up to an intractable normalisation constant, and is needed for predictive inference. In deep learning, optimizing the parameters of a big neural network can be seen as the search for an optimal distribution over the parameters of the network.
This sampling problem can be cast as the optimization of a dissimilarity functional (the loss) over the space of probability measures. As in optimization, a natural idea is to start from an initial distribution and apply a descent scheme to this problem. In particular, one can leverage the geometry of optimal transport and consider Wasserstein gradient flows, which define continuous paths of probability distributions that decrease the loss functional. Different algorithms to approximate the target distribution result from the choice of a loss functional and of a time and space discretization, and lead in practice to the simulation of interacting particle systems. This optimization point of view has recently led to new algorithms for sampling, but has also shed light on the analysis of existing schemes in Bayesian inference or neural network optimization.
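As a standard example (ours, not quoted from the project), taking the loss to be the Kullback-Leibler divergence to the target π yields the Fokker-Planck equation as the Wasserstein gradient flow, and its time discretization with particles is the unadjusted Langevin algorithm:

```latex
% Sampling as optimization: minimize F over probability measures; its
% Wasserstein-2 gradient flow is the Fokker-Planck equation; a time
% discretization with particles gives the unadjusted Langevin algorithm.
F(\mu) = \mathrm{KL}(\mu \,\|\, \pi), \qquad
\partial_t \mu_t = \nabla \!\cdot\! \Big( \mu_t \, \nabla \log \tfrac{\mu_t}{\pi} \Big),
\qquad
x_{k+1} = x_k + \gamma \nabla \log \pi(x_k) + \sqrt{2\gamma}\, \xi_k, \quad \xi_k \sim \mathcal{N}(0, I).
```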
However, many theoretical and practical aspects of these approaches remain unclear. First, their non-asymptotic properties, which quantify the quality of the approximate distribution at a finite time and for a finite number of particles, are not well understood. Second, neither is their convergence in the case where the target is not log-concave (the analogue of the non-convex setting in optimization). Motivated by the machine learning applications mentioned above, the goal of this project is to investigate these questions by leveraging recent techniques from the optimization, optimal transport, and partial differential equations literature.
Author(s)
Etienne Ollion
Year
2021/2025
Submission summary
In a context of destabilization of the public space linked to digital technology, the MEDIALEX project aims to renew the understanding of the dynamics of influence between parliamentary, media and public agendas. Its main objective is to better understand how parliamentarians, the media and the public influence each other in the definition of priority topics for public debate. To achieve this, the project intends to develop new computational methods to track statements in different layers of the public space. By bringing together sociologists, political scientists, computer scientists and computational linguistics researchers, this interdisciplinary project aims to (1) understand the dynamics of influence between the agendas of parliamentarians, the media and the public, (2) develop original methods to identify media events and reported utterances in large heterogeneous corpora, and (3) study the effects of the digitization of the public space on the legislator’s ability to impose the topics of public discussion.
The project mobilizes methods from the computational social sciences, taking advantage of new analytical frameworks that reconcile social science approaches with new computational tools. The methodological challenge of the project consists in developing natural language processing methods that allow the identification of the themes running through public debate in a more refined way and on a larger scale. These methods concern, on the one hand, the identification of events on Twitter and, on the other, the identification of reported speech in voluminous and heterogeneous corpora (newspapers, television, radio, parliamentary questions and debates, Twitter and Facebook).
The scientific program of MEDIALEX is divided into four work packages. The first gathers the tasks of coordination, corpus management and dissemination (WP1). The three other work packages explore the influence of parliamentary, media and public agendas in three complementary ways. WP2 considers influence in a structural way, aiming to identify, over the long term, which broad category of actors (parliamentarians, media, public) manages to impose priority topics of attention on the others. WP3 considers influence at a finer scale, by studying the mechanisms through which discourses circulate between parliamentary, media and public spaces. Finally, WP4 focuses on the interpretations that the media and the public produce from parliamentary work.
MEDIALEX is primarily a scientific project, but given its object and the techniques it implements, it also aims to contribute to public debate. Through “datasprints” and “workshops”, the project intends to involve actors and experts around new ways of representing the public space and political activity. MEDIALEX’s approach is thus in line with the major concerns of our societies regarding the role of Parliament, the production of public policies, the role of the media and the renewal of the forms of democracy.
Author(s)
Anna Simoni
Year
2021/2025
Submission summary
Economic models for policy evaluation and labor markets often imply restrictions on observable and unobservable quantities, and on a structural parameter, that are written in the form of moment conditions (MCs). The structural parameter of the MC model has a causal interpretation, and the social planner needs to know its value in order to decide which policy to undertake. This project develops Bayesian causal inference and prediction for this type of conditional and unconditional MC model while making minimal assumptions on the data distribution. Our procedure is based on the Exponentially Tilted Empirical Likelihood, and we will show that it is valid for both Bayesian and frequentist inference. Estimating causal effects is important in socio-economic situations of scarce resources, in order to know the best treatment to administer to achieve a given goal. In addition to theoretical econometric tools, we will provide the computational tools to easily implement our procedure.
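As a point of reference (a standard formulation of an unconditional MC model and of the usual exponentially tilted empirical likelihood construction, not a statement of the project’s results), the moment condition and the implied probabilities read:

```latex
% The structural parameter theta_0 solves the moment condition below; ETEL
% tilts the empirical distribution so that the condition holds exactly in the
% sample, via the exponential weights p_i(theta).
\mathbb{E}[\, g(X, \theta_0) \,] = 0, \qquad
p_i(\theta) = \frac{\exp\{\lambda(\theta)' g(x_i, \theta)\}}
                   {\sum_{j=1}^{n} \exp\{\lambda(\theta)' g(x_j, \theta)\}},
\qquad
\lambda(\theta) = \arg\min_{\lambda} \frac{1}{n} \sum_{i=1}^{n} \exp\{\lambda' g(x_i, \theta)\}.
```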
Author(s)
Caroline Hillairet
Year
2021/2025
Submission summary
In the presence of abrupt changes (financial crises or epidemics) or long-term changes (environmental or demographic), one needs dynamic tools to detect such changes from observable data and to re-estimate models and risk-quantification parameters, based on a dynamic and long-term view. Classical decision theory relies on a backward approach with given deterministic utility criteria. The drawbacks are twofold: first, it does not incorporate changes in the agents’ preferences, or any uncertain evolution of the environment variables; furthermore, it leads to time-inconsistency and to optimal choices that depend on a fixed time horizon tied to the optimization problem. The framework of dynamic utilities is adapted to solving these issues, by taking into account various risks and by proposing long-term, time-coherent policies. Dynamic utilities allow us to define adaptive strategies adjusted to the information flow, in a non-stationary and uncertain environment. The dynamic preferences framework therefore provides a general and flexible setting in which to evaluate the impacts of short- and long-term changes and to combine various risk parameters. Members of the team have worked for several years on this notion of dynamic utilities and are now recognized as experts in this field.
In a complex and random environment, decision rules cannot be based on overly simple criteria; some economic approximations lead to optimal choices based on linear, or at best quadratic, cost-benefit analysis over time, which can result in an underestimation of extreme risks. A general stochastic formulation and numerical estimation are useful for questioning the robustness of the theory.
The aim of this research project consists in proposing efficient numerical methods based on this theoretical framework. The main objectives are optimal detection of tendency changes in the environment, and optimization of economic actors’ decisions using dynamic preference criteria.
1) First, we aim at simulating dynamic utilities, which leads to various numerical challenges. They are related to non-linear forward second order HJB-Stochastic Partial Differential Equations, for which the standard numerical schemes are complex and unstable. We propose different methods for simulating these SPDEs, based on the stochastic characteristic method and neural networks.
2) We detect the transition point and study extreme scenarios in a context of multivariate risks. One possibility to overcome the short-term view of insurance and financial regulations is to consider hitting probabilities over a long-term or infinite horizon. In a multivariate setting, one quickly faces problems of estimating the dependence structure between risks, as well as heavy computation times. It is non-trivial to detect changes in the risk processes as quickly as possible in the presence of multiple sensors. We develop computing algorithms for hitting probabilities and other risk measures in a multivariate setting and in the presence of changes in the parameters. We also derive optimal risk-mitigation techniques, using numerical methods.
3) We aim at calibrating dynamic utilities, which should be adapted to the evolving environment characterized by a multivariate source of risks. This consists in learning the decision maker’s preferences, and predicting her behavior, based on an observed sequence of decisions. At the same time, one also needs to implement advanced statistical tools to calibrate the multivariate stochastic processes governing the environment.
4) We develop robust decision-making tools for better handling model uncertainty in the worst case, including uncertainties on volatilities and correlations as well as jumps and moral hazard. We aim to study theoretical and numerical aspects of dynamic utilities under model uncertainty. This addresses the issues of moral hazard and ambiguity in model specification as well as in the specification of preferences and of the investment horizon.
Author(s)
Nicolas Chopin
Year
2021/2025
Submission summary
EPIVASCAGE (EPIdemiology of VASCular AGEing) is a 4-year PRC project aiming to examine the association of baseline vascular ageing, and of its progression, with incident cardiovascular disease (CVD) and mortality in the community. To this end, EPIVASCAGE will conduct deep, non-invasive phenotyping of the vascular ageing of large (carotid artery) and small-to-medium-sized arteries (radial artery) and will examine radiomics features in these arterial segments. EPIVASCAGE will rely on the Paris Prospective Study III, an ongoing French community-based prospective study following n=10,157 men and women aged 50-75 years since 2008. A total of 773 CVD events and 473 deaths are expected by the end of EPIVASCAGE in 2025. A budget of 666 k€ is requested from the ANR.
EPIVASCAGE will include 6 work packages (WPs). WP1 will be dedicated to the coordination of EPIVASCAGE. In WP2, we will examine the predictive value of already existing and usable structural and functional carotid ageing biomarkers measured at baseline for incident CVD events (n=498 as of June 2020) (manuscript 1). This WP will also be dedicated to the validation of new CVD events, and access to the national health data hub as a complementary source of information is expected to be obtained by month 3. In WP3, we will perform a radiomics analysis on the raw, stored baseline carotid echo-tracking data, which contain images as well as spectral data. The main steps will include data segmentation, extraction of image features (texture, shape and gray scale) and spectral features using pre-defined matrices, and then data reduction (clustering methods). We will then examine radiomics signatures and their association with incident CVD events (manuscript 2), together with the joint association of structural/functional carotid ageing biomarkers and radiomics signatures with incident CVD (manuscript 3). WP4 will be dedicated to the second PPS3 physical examination (Examination 2, January 2022 to December 2024, 7,000 participants expected, 75% participation rate expected) and data quality assessment. Carotid echo-tracking will be performed as at the baseline assessment, and an ultrasound of the radial artery will be newly added to assess vascular ageing of small-to-medium-sized arteries. WP5 will be dedicated to carotid ageing progression, using carotid ultrasound data measured at baseline and at Examination 2. We will then identify actionable determinants of carotid ageing progression for the structural/functional biomarkers (manuscript 4) and for the radiomics features (delta radiomics, manuscript 5). WP6 will be dedicated to the vascular ageing of the small-to-medium-sized radial artery, using data collected at Examination 2. Structural and functional biomarkers together with radiomics features will be extracted. Actionable determinants of the structural/functional biomarkers (manuscript 6) and of the radiomics signatures (manuscript 7) will then be determined.
EPIVASCAGE will be led by JP Empana, INSERM Research Director, leader of Team 4 (Integrative Epidemiology of Cardiovascular Diseases, INSERM U970) and PI of the Paris Prospective Study III, and his team. EPIVASCAGE involves a multidisciplinary team of experts in CVD epidemiology (Partner 1, P1, Empana’s team), arterial wall mechanics (P2, P Boutouyrie, RM Bruno and F Poli, INSERM U970, Team 7), high-dimensional statistics (P3, N Chopin and Y Youssfi, Centre for Research in Economics and Statistics, CREST) and ultrasound imaging signal processing (P4, E Bianchini and F Faita, Institute of Clinical Physiology, University of Pisa, Italy). Strong and established collaborative relationships already exist between team members.
The findings from EPIVASCAGE may support a paradigm shift in the primary prevention of CVD by suggesting that large and small-to-medium-sized arteries may be new and complementary targets for prevention.
Author(s)
Jean-Michel Zakoian
Year
2021/2025
Submission summary
The growing use of artificial intelligence and Machine Learning (ML) by banks and Fintech companies is one of the most significant technological changes in the financial industry over the past decades. These new technologies hold great promise for the future of financial services, but also raise new challenges. In this context, the MLEforRisk project aims to provide a better understanding of the usefulness of combining econometrics and ML for financial risk measurement, through a rigorous study of the benefits and limitations of these two approaches in risk management, the core business of the financial industry. MLEforRisk is a multidisciplinary project in the fields of finance and financial econometrics which brings together junior and senior researchers in management, economics, applied mathematics, and data science.
The project has five methodological objectives related to credit, market, and liquidity risks. In the context of credit risk, ML methods are known to provide good classification performance. However, these methods are often black boxes, which is particularly problematic for both clients and regulators. Our objective is therefore to develop hybrid approaches to credit risk modeling, combining econometrics and ML to overcome the trade-off between interpretability and predictive performance. At the same time, the use of ML in the field of credit risk has led to a debate on the potential discrimination biases that these algorithms could generate. Here, our objective is to develop statistical methods to test the algorithmic fairness of credit risk models and to mitigate these biases.
In the area of market risk, the project aims to combine ML techniques and advanced econometric modeling to improve the estimation of conditional risk measures associated with portfolio returns. Our objective is to propose new hybrid approaches for modeling the conditional variance matrix of returns or its inverse, called the precision matrix. Since these risk measures are the key input of trading strategies, the accuracy of their estimation is essential for the asset management industry. These estimation methods will be designed with large portfolios in mind, for which the number of assets can far exceed the number of time observations available to estimate the moments. A second objective is to take into account the asymmetry of the conditional distribution of returns when modeling conditional risk with ML methods.
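As a purely illustrative baseline (not the project’s hybrid estimators), the precision matrix of returns can be estimated with an off-the-shelf, sparsity-inducing ML method such as the graphical lasso, one standard way to cope with many assets and few observations. The data below are simulated placeholders.

```python
# Illustration only: estimating the precision matrix (inverse covariance) of
# asset returns with the graphical lasso from scikit-learn.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n_obs, n_assets = 250, 10                         # e.g. one year of daily returns
returns = rng.standard_normal((n_obs, n_assets))  # placeholder return data

model = GraphicalLasso(alpha=0.05).fit(returns)
precision = model.precision_                      # sparse-ish inverse covariance
covariance = model.covariance_
print(precision.shape)                            # (10, 10)
```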
Concerning liquidity risk, we observe that the development of alternative market indices and factor investing significantly modifies the dynamics of traded volumes on the markets by increasing dependencies and network effects. Our objective is to take these effects into account when measuring liquidity risk, while reducing the dimension of the parameter set used in the network with ML methods.
The MLEforRisk project aims to create a doctoral training network for young researchers specialized in financial econometrics. It also aims to promote reproducible research: all code and data produced within the project will be archived on RunMyCode, and the reproducibility of the numerical results will be certified by cascad, the first certification agency for scientific code and data.
Author(s)
Edouard Challe
Year
2021/2025
Submission summary
Labor income risks, namely unemployment risk and wage risk, are a major concern for many workers, essentially because they are imperfectly insured (that is, insurance markets against idiosyncratic labor-income shocks are “incomplete”). As a result, those risks generate significant ex post inequalities across agents as well as an inefficient precautionary motive for saving, whose instability over the business cycle may greatly amplify economic crises. This source of inequality and aggregate instability is a recurrent phenomenon, and one that is dramatically illustrated by the ongoing worldwide economic collapse. The purpose of the project is to (i) quantify how aggregate shocks are amplified under incomplete markets; (ii) clarify the transmission channels of alternative economic policies in these circumstances; and (iii) design macroeconomic policies (monetary policy, fiscal policy, labor-market policies, etc.) capable of optimally stabilizing economic crises in the presence of uninsured labor-income risk.
The project will be composed of two main parts: one focused on understanding the transmission mechanisms of aggregate shocks and policies under incomplete markets, and another analyzing the optimality of macroeconomic policies (i.e., monetary, fiscal, tax, and labor-market policies) in response to aggregate shocks. The focus will be on the way different types of aggregate shocks alter the amount of idiosyncratic risk and the rise in inequality faced by households. Given these propagation mechanisms, we will investigate the transmission and the optimality of alternative macro and insurance policies following sharp and brutal declines in economic activity, such as those triggered worldwide by the 2008 financial crisis or the current Covid-19 crisis. Both the positive and the normative aspects of the study, which will require the development of new models and methods, will be divided into several subprojects involving members of the research team and possibly outside co-authors.
To sum up, the purpose of the overall project is to revisit the transmission channel and optimality of a variety of policy instruments, under the assumption that individual risks are uninsured and households are heterogeneous. These policy tools include:
• conventional monetary policy (i.e., changes in nominal interest rates by the central bank);
• unconventional monetary policy (i.e., forward guidance about future policy rates; large-scale asset purchases; money-financed fiscal stimulus; etc.);
• transitory expansions in government spending or reductions in taxes;
• public debt policies (i.e., optimal public debt in the presence of liquidity demand);
• changes in the level, cyclicality and duration of unemployment benefit payments and short-time work arrangements;
• changes in the level, cyclicality and persistence of tariffs on traded goods.
This is a thriving area of macroeconomics in which several teams are currently competing worldwide. We aim to be one of these teams and would like to rely on the support of the ANR to achieve this. We stress that we will pay special attention to the euro area, which is currently facing a number of macroeconomic policy challenges. Indeed, in the euro area monetary policy is centralized but constrained (by the zero lower bound on nominal interest rates), while fiscal policy is decentralized and, overall, non-cooperative. Unemployment insurance is also decentralized, hence with no cross-country risk sharing. Our project will thus help better understand how monetary and fiscal policies should be designed in a context where the institutional features of the euro area may aggravate the lack of insurance across households.
Author(s)
Ivaylo Petev
Year
2020/2024
Submission summary
The motherhood penalty on wages and employment is a major source of gender inequality in the labour market, whose reduction is a stated aim of the European Parliament and the Council on the implementation of the principle of equal opportunities and equal treatment for men and women. We propose studying its causes, using large administrative data for France and Germany that allow us to link employers to employees and look at micro mechanisms in a comparative setup. Specifically, we aim to jointly study the role of firms, human capital depreciation and gender norms in shaping the labour market effects of children in different institutional and policy contexts.
Our research project has two main objectives: (1) synchronizing and harmonizing high-quality administrative data that exist in relatively similar forms in Germany and France, and preparing replication tools for the scientific community; (2) using the resulting database to compare both countries with regard to family-related employment interruptions and subsequent maternal career and income developments.
As there are almost no registry datasets prepared for comparative cross-national research, the resulting data will be of high value to the research community. Comparatively analysing the drivers of the motherhood wage penalty in France and Germany illustrates the potential of this data and meaningfully contributes to the literature on gender inequality in the labour market.
Registry data allow us to be the first to look at how mothers sort into firms in different countries, and thus to directly compare whether the labour-market-specific mechanisms through which childbirth affects economic gender inequalities differ across national contexts.
France and Germany represent a compelling case study, as the two countries followed different paths in how fast they integrated women into the labour force, in implementing family policies and in supporting dual-earner couples. France provides extensive all-day childcare services, enabling women to re-enter the labour market quickly, and has a considerably lower motherhood wage and employment penalty than Germany.
To understand the mechanisms creating the motherhood penalty, we use linked employer-employee data to estimate exact employment and wage penalties for children on a year-by-year basis after childbirth. We aim to examine the extent to which wage and employment reductions are the result of mothers sorting into more low-wage, part-time-oriented and gender-segregated firms. We expect that firm effects and maternity-induced gender-workplace segregation matter more in producing a high motherhood penalty in Germany, where long detachment from the labour market and part-time work are more common. Finally, we aim to look at how local differences in gender norms affect the wage penalty, expecting to find a greater influence on careers in Germany, whose institutional setting makes returning to full-time work after birth less of a societal norm.
Author(s)
Pierre Boyer
Year
2020/2024
Submission summary
Tax reforms: Finding the balance between efficiency and political feasibility
Questions linked to the design and implementation of redistributive tax policies have occupied a growing position on the public agenda over recent years. Moreover, the fiscal pressures brought upon by the current coronavirus crisis will ensure that these issues maintain considerable political significance for years to come.
New design of redistributive tax policies
The design of redistributive tax policies is a perennial topic of public discourse. Research on these questions has led to a well-developed “theory of optimal taxation”, with seminal contributions by Mirrlees (1971), Piketty (1997), Diamond (1998) and Saez (2001; 2002). These contributions have in common that they characterize an optimal tax system that takes account of the behavioral responses of taxpayers and the public sector’s budget constraint. This theory is institution-free, which is both a strength and a weakness. It is a strength because it delivers clarity on how incentive effects shape welfare-maximizing taxes. It is a weakness because incentive effects are not the only forces relevant to the design of tax policies.
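To fix ideas (a classic result from this literature, recalled by us rather than claimed by the project), the revenue-maximizing top marginal tax rate balances the behavioral response of top earners against the thinness of the upper tail of the income distribution:

```latex
% Revenue-maximizing top marginal rate (Saez, 2001): e is the elasticity of top
% taxable income with respect to the net-of-tax rate, and a is the Pareto
% parameter of the upper tail of the income distribution.
\tau^{*} = \frac{1}{1 + a\, e}.
```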
The research in this proposal develops a conceptual framework to analyze “Tax reforms and revolts in democracies”. It delivers a theory that takes account of an essential constraint that emerges in a democracy: tax policies have to find sufficient political support (i.e., be politically feasible), and this has implications for the design of tax systems (see, e.g., Martin and Gabay, 2018; Passarelli and Tabellini, 2017). The gilets jaunes demonstrations are a reminder that large mobilizations can lead to the cancellation of announced tax reforms (see Boyer et al. (2020a) for a descriptive and geographical analysis of the determinants of the movement). The current coronavirus crisis will put unprecedented pressure on public finances. Raising revenue will be a priority once the virus recedes, and political feasibility and fairness issues will be crucial. Indeed, tax systems have been redesigned after major events such as the World Wars, and our ability to take these constraints into account will be severely tested (see Scheve and Stasavage, 2016). Revolts could occur if tax reforms are not perceived to satisfy fairness and political constraints (on fiscal revolts after World War I see, e.g., Delalande, 2009).
The approach in “Tax reforms and revolts in democracies” will open new directions to scholars in the social sciences, both theoretically and empirically. It makes it possible to identify reforms that are appealing from a social welfare perspective and, moreover, politically feasible.
Author(s)
Victor-Emmanuel Brunel
Year
2019/2025
Submission summary
ADDS: Algorithms for Multi-Dimensional Data via Sketches
Massive data presents particularly vexing challenges for algorithmic processing: not only are most of the commonly encountered problems NP-hard, but one also cannot afford to spend too much running time or space. Even worse, for approximation algorithms, the dependency on the dimension is often exponential. Overcoming these challenges requires the development of a new generation of algorithms and techniques.
Effectively addressing challenges of processing massive data through the notion of sketches: computing a constant-sized subset of the input data that captures key aspects of the entire data.
One key to effectively addressing these challenges is the notion of sketches: extracting a small subset—ideally a constant-sized subset—of the input data that captures, approximately with respect to a given parameter epsilon, key aspects of the entire data. Given a family of optimization problems, the goal is to construct sketches whose size is independent of the size of the input data, while minimizing the dependence on the dimension and on the approximation parameter epsilon.
Sketches are related to more general succinct approximations of data, such as epsilon-nets and epsilon-approximations. One example of sketches is coresets: sketches such that solving the given problem on the coreset gives an approximate solution to the problem on the entire data.
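For concreteness (a standard definition, not quoted from the project), a weighted subset S of the data set P is an ε-coreset for a given cost function when it approximates the cost of every candidate solution:

```latex
% epsilon-coreset: the (weighted) cost computed on the small subset S
% approximates the cost on the full data set P for every candidate solution C.
(1-\varepsilon)\,\mathrm{cost}(P, C) \ \le\ \mathrm{cost}(S, C) \ \le\ (1+\varepsilon)\,\mathrm{cost}(P, C)
\qquad \text{for all candidate solutions } C.
```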
While great progress has been achieved for non-geometric problems, for many fundamental problems on geometric data the construction and existence of near-optimal sketches remain open. Our research is divided into three parts, requiring expertise in statistics, computational geometry, learning, combinatorics, and algorithms. First, we consider the combinatorial properties of geometric data that are relevant to building compact sketches. Second, we consider the time and space complexities of constructing accurate sketches of data in high dimensions, based on this combinatorial and geometric understanding. Finally, we show how to use the small sketches to improve the accuracy and running time of optimization algorithms.
Author(s)
Pierre Boyer
Year
2020/2024
Submission summary
Middle classes, taxation and democracy in a globalized world
This research project mobilizes the instruments of political economy and optimal tax theory to shed light on the link between inequalities, migration and democracy.
Redistribution, inequalities and populism
Part 1 explores the methods of redistribution of wealth, with a particular focus on the tax burden weighing on the middle classes. Part 2 aims to estimate the impact of the evolution of the socio-fiscal system, both actual and as relayed by the media, on the extreme-right vote. This analysis will allow us to shed light on the recent rise of the extreme right among the middle classes in the Western world.