Time: 11:00 am – 12:00 pm
Date: 1st of April 2021
Etienne OLLION – “The Augmented Social Scientist. How to Analyze Millions of Texts Qualitatively”
The last decade witnessed a spectacular rise in the volume of digital textual data, whether natively digital or digitized from other sources. With this new abundance came the question of how to analyze it. In addition to established methods, more recently developed techniques have started to be applied, unsupervised machine learning first among them. But in contrast to other domains in which it has proven highly successful, supervised machine learning (the automatic annotation of text after brief training by a human) has not yet been fully adopted. This is, we argue, regrettable, since the approach holds great promise for the humanities and the social sciences: it allows the researcher to craft her own indicators and to outsource the bulk of the annotation to an algorithm, whose performance is later assessed. The paper reviews the current obstacles that explain this limited adoption, then presents a simple strategy for applying supervised machine learning to text analysis. We demonstrate it with an experiment on a classic question in the sociology of journalism, the rise of strategic news coverage. The results show that, within a few hours, we can train an effective classifier that replicates a human researcher's qualitative annotation on millions of articles.
Nicolas Robette (Pôle de Sociologie du CREST)