Research

I am currently conducting research on the summarization of multimodal presentations for my PhD at CentraleSupélec / Université Paris-Saclay, under the supervision of Frédéric Dufaux and Camille Guinaudeau.

Tools

During my PhD, I developed several tools to process multimodal presentations and study their summarization by leveraging their structure. I packaged some of these tools in a GitHub repository, including methods for slide extraction from presentation records, speech recognition, and the construction of a structured multimodal representation that serves as a cost-effective multimodal input for VLMs.

Publications

  • Théo Gigant, Camille Guinaudeau, Frédéric Dufaux. Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure. Preprint, 2025, [PDF]
  • Théo Gigant, Camille Guinaudeau, Marc Décombas, Frédéric Dufaux. Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Miami, 2024, [PDF] [blog] [code]
  • Théo Gigant, Frédéric Dufaux, Camille Guinaudeau, Marc Décombas. TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records. In Proceedings of the 20th International Conference on Content-based Multimedia Indexing, Orléans, 2023, [PDF] [blog]

In 2022, I participated in the preprocessing of several datasets as part of the BigScience Research Workshop, which resulted in two publications:

  • Teven Le Scao et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. 2022, [PDF]
  • Jason Fries et al. BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022, [PDF]