Benlin Liu

I am a Ph.D. student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Ranjay Krishna. I am affiliated with the UW Graphics and Imaging Laboratory (GRAIL).

I received my master's degree from the Department of Computer Science, UCLA, where I was a research assistant under the supervision of Prof. Cho-Jui Hsieh. I also collaborated with Prof. Xiaolong Wang at UCSD. I obtained my B.Eng. degree from the Department of Electronic Engineering, Tsinghua University, where I worked with Prof. Jiwen Lu of the Department of Automation. During my undergraduate studies, I visited the GRASP Lab at the University of Pennsylvania and worked with Prof. Jianbo Shi.

My research aims to build multimodal intelligence that can perceive, reason about, and simulate the dynamic visual world we live in. I am particularly interested in complex video understanding, perception-centric reasoning, and training large multimodal models whose “thinking” is grounded in what they see and remember over time.

Email / CV / Google Scholar / GitHub / Twitter

News

2025-09: One paper got accepted by NeurIPS 2025!
2025-07: One paper got accepted by COLM 2025!
2025-06: I will be a research scientist intern at Meta this summer.
2025-02: One paper got accepted by CVPR 2025!
2025-01: Two papers got accepted by ICLR 2025! One of them is a spotlight presentation.
2024-07: One paper got accepted by ECCV 2024.
2023-07: Two papers got accepted by ICCV 2023.
2023-05: I'll be a student researcher at Google DeepMind this summer.
2022-06: I am greatly honored to be chosen as a UW Reality Lab–Meta Fellow.
2021-09: DynamicViT is accepted to NeurIPS 2021.
2021-07: Two papers are accepted to ICCV 2021.
2020-12: One paper on image classification is accepted to AAAI 2021.
2020-07: One paper on knowledge distillation is accepted by ECCV 2020.


Publications

* indicates equal contribution

Prioritizing Perception Improves Complex Video Reasoning
Benlin Liu, Arka Sadhu, Hyo Jin Kim, Kejie Li, Yifan Wang, Yuning Chai, Ranjay Krishna, Yuliang Li
Under review

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
Shaoxuan Li, Zhixuan Zhao, Hanze Deng, Zirun Ma, Shulin Tian, Zuyan Liu, Yushi Hu, Haoning Wu, Yuhao Dong, Benlin Liu*, Ziwei Liu†, Ranjay Krishna†
Under review (* Project co-lead, † Equal advising) [Paper]

Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation
Yang Fei, George Stoica, Jingyuan Liu, Qifeng Chen, Ranjay Krishna, Xiaojuan Wang, Benlin Liu
Under review [Paper]

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
Xia Su, Ruiqi Chen, Benlin Liu, Jingwei Ma, Zonglin Di, Ranjay Krishna, Jon E. Froehlich
Under review [Paper]

Seeking and Updating with Live Visual Knowledge
Mingyang Fu, Yuyang Peng, Dongping Chen, Zetong Zhou, Benlin Liu, Yao Wan, Zhou Zhao, Philip S. Yu, Ranjay Krishna
Advances in Neural Information Processing Systems (NeurIPS), 2025 [Paper]

Visual Representations inside the Language Model
Benlin Liu, Amita Kamath, Madeleine Grunde-McLaughlin, Winson Han, Ranjay Krishna
Conference on Language Modeling (COLM), 2025 [Paper]

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang, Zixian Ma, Yansong Tang, Luming Tang, Yongming Rao, Wei-Chiu Ma, Ranjay Krishna
Conference on Computer Vision and Pattern Recognition (CVPR), 2025 [Paper][Project Page]

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment
Dongping Chen, Ruoxi Chen, Shu Pu, Zhaoyi Liu, Yanru Wu, Caixi Chen, Benlin Liu, Yue Huang, Yao Wan, Pan Zhou, Ranjay Krishna
International Conference on Learning Representations (ICLR), 2025 (Spotlight) [Paper][Project Page]

GMValuator: Similarity-based Data Valuation for Generative Models
Jiaxi Yang, Wenlong Deng, Benlin Liu, Yangsibo Huang, James Zou, Xiaoxiao Li
International Conference on Learning Representations (ICLR), 2025 [Paper]

Efficient Inference of Vision and Language Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu
European Conference on Computer Vision (ECCV), 2024 [Paper]

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith
IEEE/CVF International Conference on Computer Vision (ICCV), 2023 [Paper][Project Page][Code]

Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu
IEEE/CVF International Conference on Computer Vision (ICCV), 2023 [Paper][Project Page][Code]

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh
Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), 2021 [Paper][Project Page][Code][Video]

RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
Benlin Liu, Yongming Rao, Yi Wei, Jiwen Lu, Cho-Jui Hsieh, Jie Zhou
IEEE/CVF International Conference on Computer Vision (ICCV), 2021 [Paper]

Robust Object Detection via Instance-Level Temporal Cycle Confusion
Xin Wang, Benlin Liu, Thomas E. Huang, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell
IEEE/CVF International Conference on Computer Vision (ICCV), 2021 [Paper][Project Page]

Multi-Proxy Wasserstein Classifier for Image Classification
Benlin Liu, Yongming Rao, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh
35th AAAI Conference on Artificial Intelligence (AAAI), 2021 [Paper]

MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
Benlin Liu, Yongming Rao, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh
16th European Conference on Computer Vision (ECCV), 2020 [Paper]

Academic Services

Conference Reviewer: CVPR, ICCV, ECCV, ICLR, NeurIPS, ICML, WACV
