SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
Quentin Guimard*, Federico Bartsch*, Simone Caldarella, Rahaf Aljundi, Elisa Ricci, Massimiliano Mancini
Currently in Denver: Presenting our work on post-hoc feature modulation (SEM) at the CVPR 2026 Findings track with my colleagues from Trento!
I am a postdoctoral research fellow at the University of Trento (DISI), where my research focuses on Trustworthy AI. I work within the Multimedia and Human Understanding Group (MHUG) in collaboration with Prof. Elisa Ricci.
During my PhD, supervised by Prof. Lucile Sassatelli, I explored the links between visual content, human attention, and emotions in immersive 360° environments, designing adaptive systems (such as DVMS) to ensure equitable streaming quality across diverse users.
This focus on fairness and reliability led to my current work on the robustness and capabilities of multi-modal foundation models. For example, my ongoing projects involve building frameworks to track and semantically map complex physical object transformations in video, and developing benchmarks to audit the perceptual and reasoning limits of video LLMs in safety-critical scenarios.
Open Science & Reproducibility
I strongly believe that trustworthy AI requires transparent science. I prioritize reproducibility and open-source engineering, an ethos reflected in the public release of frameworks like C2B and SEM, as well as the ACM reproducibility badges awarded to my doctoral research.
Quentin Guimard*, Federico Bartsch*, Simone Caldarella, Rahaf Aljundi, Elisa Ricci, Massimiliano Mancini
Quentin Guimard, Moreno D'Incà, Massimiliano Mancini, Elisa Ricci
Quentin Guimard*, Florent Robert*, Camille Bauce, Aldric Ducreux, Lucile Sassatelli, Hui-Yin Wu, Marco Winckler, Auriane Gros
Quentin Guimard, Lucile Sassatelli, Francesco Marchetti, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo


A post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space to pinpoint and modulate bias-relevant neurons in Vision-Language Models.
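As a rough illustration of the general idea, and not the released SEM implementation, the sketch below encodes an embedding with a toy sparse autoencoder, rescales a chosen set of latent units, and maps only that change back to embedding space. Everything here is an assumption made for illustration: the ReLU encoder, the weight shapes, the names `sae_encode` and `modulate`, the `scale` parameter, and the hand-picked `bias_idx` (how SEM actually identifies bias-relevant neurons is not described here).

```python
from typing import List, Sequence, Set

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Toy SAE encoder (assumed form): z = ReLU(W_enc x + b_enc)."""
    pre = matvec(w_enc, x)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def modulate(
    x: Vector,
    w_enc: Matrix,
    b_enc: Vector,
    w_dec: Matrix,
    bias_idx: Set[int],
    scale: float = 0.0,
) -> Vector:
    """Rescale the selected latent units and apply only the resulting change.

    Adding W_dec (z' - z) to x, rather than fully decoding z', leaves the
    part of x that the SAE does not reconstruct untouched. This is one
    common intervention pattern, assumed here for illustration.
    """
    z = sae_encode(x, w_enc, b_enc)
    delta = [
        (scale - 1.0) * z_i if i in bias_idx else 0.0
        for i, z_i in enumerate(z)
    ]
    shift = matvec(w_dec, delta)
    return [x_i + s for x_i, s in zip(x, shift)]


# Hypothetical toy weights: 2-dimensional embedding, 3 latent units.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, -1.0]
W_DEC = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

embedding = [2.0, 1.0]
# Pretend latent unit 2 was flagged as bias-relevant and switch it off.
debiased = modulate(embedding, W_ENC, B_ENC, W_DEC, bias_idx={2}, scale=0.0)
```

With `scale=1.0` the embedding is returned unchanged, and values between 0 and 1 attenuate the selected units instead of removing them. Because the base model's weights are never touched, an intervention of this kind is post-hoc and requires no retraining.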