Research and Projects

I am currently conducting PhD research on the summarization of multimodal presentations at CentraleSupélec / Université Paris-Saclay, under the supervision of Frédéric Dufaux and Camille Guinaudeau.

Tools

During my PhD, I developed multiple tools to process and study the summarization of multimodal presentations by leveraging their structure. I packaged some of those tools in a GitHub repository, including methods for slide extraction from presentation recordings, speech recognition, and the construction of a structured multimodal representation that serves as a cost-effective multimodal input for vision-language models (VLMs).
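As an illustration of the slide-extraction idea, one simple approach is threshold-based frame differencing: a new slide is flagged whenever consecutive video frames differ by more than a chosen amount. This is a minimal sketch on toy grayscale frames, not necessarily the method implemented in the repository; the `threshold` value and the flat pixel-list representation are assumptions for the example.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def detect_slide_changes(frames, threshold=10.0):
    """Return indices where a new slide likely starts.

    frames: list of flat grayscale pixel lists, all the same length.
    A slide change is flagged when two consecutive frames differ by
    more than `threshold` on average (an assumed, tunable value).
    """
    changes = [0]  # the first frame always starts a slide
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            changes.append(i)
    return changes

# Toy example: three identical frames, then a sudden change.
frames = [[0] * 16, [0] * 16, [0] * 16, [200] * 16, [200] * 16]
print(detect_slide_changes(frames))  # → [0, 3]
```

In practice one would decode frames from the video (e.g. with OpenCV) and may add debouncing so gradual animations are not flagged as new slides.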

Publications

  • Théo Gigant, Camille Guinaudeau, Frédéric Dufaux. Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure. Preprint, 2025, [PDF]
  • Théo Gigant, Camille Guinaudeau, Marc Décombas, Frédéric Dufaux. Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Miami, 2024, [PDF] [blog] [code]
  • Théo Gigant, Frédéric Dufaux, Camille Guinaudeau, Marc Décombas. TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records. In Proceedings of the 20th International Conference on Content-based Multimedia Indexing, Orléans, 2023, [PDF] [blog]

In 2022 I participated in the preprocessing of several datasets as part of the BigScience Research Workshop, which resulted in two publications:

  • Teven Le Scao, et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. 2022, [PDF]
  • Jason Fries, et al. BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022, [PDF]

Other Projects

  • My blog: where I share insights from my own research, along with reflections and opinions on Natural Language Processing, Computer Vision, and Artificial Intelligence.
  • Whisper Medium Romanian (December 2022): State-of-the-art speech recognition model for Romanian, trained for the Whisper Fine-Tuning Event. Ranked first on the leaderboard for Romanian ASR.
  • Old Book Illustrations Dataset (July 2022): Collection of 4172 public domain illustrations scanned from old books, collected from the Old Book Illustrations website with their agreement.
  • WikiArt diffusion mini (May 2022): As part of HuggingFace’s HugGAN challenge, I worked on a distilled latent diffusion model for text-conditioned image generation, trained on the WikiArt dataset of visual artworks.
  • Romanian Wav2Vec2 (February 2022): Speech recognition model for Romanian, trained for the Robust Speech Recognition Challenge. Ranked first on the leaderboard for Romanian ASR.
  • T5-VAE (July 2021): Project for the Flax/JAX community week, combining a T5 transformer model with a variational autoencoder to learn smooth latent spaces for text.