
Image Seminar: "Toward Frugal Multimodal Models: Leveraging Prior Knowledge for Efficient Learning", Bilal Faye
July 3 / 14:00 - 15:00
We will have the pleasure of hearing Bilal Faye, a PhD student at LIPN (Laboratoire d’Informatique de Paris Nord), Université Sorbonne Paris Nord.
He will give an IMAGE seminar on Thursday, July 3, 2025 at 14:00 in seminar room F-200.
Title: "Toward Frugal Multimodal Models: Leveraging Prior Knowledge for Efficient Learning"
Abstract:
Incorporating prior knowledge can significantly reduce the need for large-scale training data and heavy parameterization, while preserving high performance. Within a multimodal framework, this principle enables the design of lightweight and efficient models. Two key examples illustrate this idea: OneEncoder, a frugal multimodal encoder based on contrastive learning that aligns diverse modalities (text, image, audio, video) without relying on massive paired datasets; and LightMDETR, a streamlined adaptation of the MDETR model for open-vocabulary object detection, which enables efficient generalization to unseen categories with reduced computational demands.