
Natural Language Processing, Julien Boelaert (CERAPS, Université de Lille)

November 13 @ 8:00 am - November 23 @ 5:00 pm | Organizer: Etienne Ollion

13th November 2023 and 20th November 2023: from 13:00 to 16:15, Room 2033

16th November 2023 and 23rd November 2023: from 13:00 to 16:15, Room 2033

Aims and objectives

The aim of this course is to provide an introduction to the main contemporary methods for natural language processing, and to illustrate them with recent uses of text as data in the social sciences.

Natural language processing has made great strides over the last decade, as illustrated in 2023 by the resounding popularity of ChatGPT. In addition, text corpora have become increasingly available to social scientists, whether through the digitization of sources originally on paper (e.g. transcripts of parliamentary sessions, printed newspapers, books, historical sources, …), through automatic transcription of audio sources, or through the advent of natively digital sources (social media, online newspapers, …).

The course will start with the standard (i.e. pre-neural) methods of the late 20th century, based on large document-feature matrices. We will then cover more recent developments: word embeddings (for improved NLP, or for studying bias in text corpora), topic modeling with Latent Dirichlet Allocation (unsupervised detection of topics), and Transformer models (the current state of the art, including BERT- and GPT-like models). Each session will comprise a theoretical lecture and applied examples in R or Python.
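
For readers unfamiliar with the term, a document-feature matrix simply records how often each vocabulary term occurs in each document. The following is a minimal sketch in plain Python, not course material: the toy corpus, the whitespace tokenizer, and the function names `tokenize` and `document_feature_matrix` are all assumptions made for illustration, and real analyses would typically rely on a dedicated library in R or Python.

```python
from collections import Counter

# Toy corpus: hypothetical sentences invented for this illustration.
docs = [
    "the parliament voted on the budget",
    "the newspaper reported the vote",
    "social media users discussed the budget",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def document_feature_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(d)) for d in documents]
    # The vocabulary is the sorted set of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for terms absent from a document.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, dfm = document_feature_matrix(docs)

# Print one row per document, with columns in vocabulary order.
print(vocabulary)
for row in dfm:
    print(row)
```

Each row is one document and each column one term. With a realistic corpus the matrix becomes very large and mostly zeros, which is why such matrices are usually stored in a sparse format.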