Reasoning with Fewer Eyes: Efficient Visual Token Withdrawal for Multimodal Reasoning
Published in NeurIPS 2025 Workshop on Efficient Reasoning, 2025
We propose M-step Vision Withdrawal (MVW), a training-free method that accelerates multimodal reasoning by removing visual tokens once the model has transitioned from perception to abstract reasoning, achieving a speedup of up to 56%.
Recommended citation: Andrea Ramazzina, Tobias Haab, David Fitzek, Stefano Gasperini, Jonas Uhrig, and Mario Bijelic. (2025). "Reasoning with Fewer Eyes: Efficient Visual Token Withdrawal for Multimodal Reasoning." NeurIPS 2025 Workshop on Efficient Reasoning.
