Natural Language Processing, Julien Boelaert (CERAPS, Université de Lille)
Schedule
Monday 13th November 2023 and Monday 20th November 2023, from 13:00 to 16:15, Room 2033
Thursday 16th November 2023 and Thursday 23rd November 2023, from 13:00 to 16:15, Room 2033
Aims and objectives
The aim of this course is to provide an introduction to the main contemporary methods for natural language processing, and to illustrate them with recent uses of text as data in the social sciences.
Natural language processing has made great strides over the last decade, as illustrated in 2023 by the resounding popularity of ChatGPT. In addition, text corpora have become increasingly available to social scientists, whether through the digitization of paper sources (e.g. transcripts of parliamentary sessions, printed newspapers, books, historical sources, …), the automatic transcription of audio sources, or the advent of natively digital sources (social media, online newspapers, …).
The course will start with the standard (i.e. pre-neural) methods of the late 20th century, based on large document-feature matrices. We will then cover more recent developments: word embeddings (for improved NLP, or for studying bias in text corpora), topic modeling with Latent Dirichlet Allocation (unsupervised detection of topics), and Transformer models (the current state of the art, including BERT- and GPT-like models). Each session will comprise a theoretical lecture and applied examples in R or Python.
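To give a flavour of the starting point, here is a minimal sketch of a document-feature matrix in plain Python. It is not taken from the course materials: the function name `document_feature_matrix`, the two example sentences, and the whitespace tokenization are all assumptions made for illustration, and real analyses would typically rely on a dedicated library.

```python
from collections import Counter


def document_feature_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    # Naive tokenization (an assumption for this sketch): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    # The features are the distinct terms across all documents, in alphabetical order.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document, one column per vocabulary term.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical two-document corpus.
docs = [
    "the parliament debated the budget",
    "the newspaper covered the debate",
]
vocabulary, matrix = document_feature_matrix(docs)

for term, column in zip(vocabulary, zip(*matrix)):
    print(term, list(column))
```

Each row of `matrix` describes one document and each column one term. With a realistic corpus the matrix becomes very large and mostly zero, which is why such matrices are usually stored in sparse form.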