
Lucas GIRARD (CREST) – “Measuring Speech Polarization: Identification and Estimation”

December 17, 2019 @ 12:15 pm - 1:15 pm
The Microeconometrics Seminar: Every Tuesday at 12:15 pm.
Time: 11:00 am – 12:00 pm
Date: 17th of December 2019
Place: Room 3001.
Lucas GIRARD (CREST) – “Measuring Speech Polarization: Identification and Estimation”, joint work with Xavier D’Haultfoeuille (CREST) and Roland Rathelot (University of Warwick and CEPR)


Political divisiveness appears to have increased recently in various democracies (Trump, populism, extreme right-wing parties in Europe, etc.). Language, as a basic determinant of group identity, might be part of that story. “Witch hunt” versus “impeachment hearing”; “undocumented workers” compared to “illegal aliens”; “death tax” or “progressive wealth tax”: such partisan expressions name the same object but carry different connotations. They diffuse into media coverage and can induce framing effects on public opinion. Hence the interest in measuring the speech polarization of political leaders and comparing its evolution over time or across countries.

One way to do so is through linguistics and literary exegesis. Another is statistical analysis. Although quite rough (the data are essentially word counts), it enables the comprehensive study of a large corpus of texts without relying on partisan expressions chosen ex ante, and it provides a measure of the extent to which distinct groups (e.g. Democrats and Republicans in the US Congress between 1874 and 2016) use different words.
Gentzkow, Shapiro, and Taddy (Econometrica 2019) address this issue with a (huge) discrete choice model and a machine-learning-type penalization. We provide an alternative method with the following advantages: (i) a formal partial identification result for the parameter of interest (the speech partisanship index) within a testable statistical model; (ii) simple and computationally light estimators for the bounds and confidence intervals; (iii) only “aggregated data” are required.
As a consequence, our methodology can easily be applied to other settings that share the same problem: quantifying differences in the choices made by individuals split into two groups in a “high-dimensional” context, meaning that the number of distinct options is large relative to the number of choices observed in the data. In our application, these are the words spoken by Republican and Democrat speakers, but they could equally be the residential locations chosen by natives and immigrants when investigating segregation, the products chosen by distinct groups of consumers in empirical industrial organization, etc.
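
To illustrate why the “high-dimensional” context matters, here is a minimal simulation sketch. It is not the method presented in the talk. It assumes a naive plug-in index of the form 0.5 · Σ q_R(w) ρ(w) + 0.5 · Σ q_D(w) (1 − ρ(w)), with ρ(w) = q_R(w) / (q_R(w) + q_D(w)), which equals 0.5 when both groups use words identically. The function names, vocabulary sizes and sample sizes are hypothetical choices made for the illustration only.

```python
import random
from collections import Counter


def plug_in_partisanship(counts_r, counts_d):
    """Naive plug-in index computed from two word-count tables.

    Assumed form (an illustration, not the speakers' estimator):
    0.5 * sum_w q_r(w) * rho(w) + 0.5 * sum_w q_d(w) * (1 - rho(w)),
    where rho(w) = q_r(w) / (q_r(w) + q_d(w)).
    """
    n_r = sum(counts_r.values())
    n_d = sum(counts_d.values())
    total = 0.0
    # Only words observed at least once are visited, so q_r + q_d > 0.
    for word in set(counts_r) | set(counts_d):
        q_r = counts_r.get(word, 0) / n_r
        q_d = counts_d.get(word, 0) / n_d
        rho = q_r / (q_r + q_d)
        total += 0.5 * q_r * rho + 0.5 * q_d * (1.0 - rho)
    return total


def simulate(vocab_size, n_words, seed):
    """Both groups draw words from the SAME uniform distribution."""
    rng = random.Random(seed)

    def draw():
        return Counter(rng.randrange(vocab_size) for _ in range(n_words))

    return plug_in_partisanship(draw(), draw())


# Few options, many observed choices: the index stays close to 0.5.
low_dim = simulate(vocab_size=50, n_words=20000, seed=0)
# Many options, few observed choices: the index is pushed well above 0.5
# even though the two groups speak identically by construction.
high_dim = simulate(vocab_size=5000, n_words=2000, seed=0)

print(f"low-dimensional:  {low_dim:.3f}")
print(f"high-dimensional: {high_dim:.3f}")
```

In this sketch the true value is 0.5 in both cases, so any excess in the second number is pure small-sample noise; this is the kind of problem that motivates dedicated identification and estimation results.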

Xavier D’Haultfoeuille (CREST), Benoit Schmutz (CREST), Thomas Delemotte (CREST) & Léa Bou Sleiman (CREST)
Lunch: food provided, no registration required.