👨🎓 About Me
I am a second-year Ph.D. student at Peking University, advised by Prof. Xuejun Yang and Prof. Wenjing Yang. I earned my B.S. degree from China University of Geosciences in 2023.
My primary research interest is Foundation Models for Multimodal Learning. I am also interested in Causal Inference and Reinforcement Learning. My overarching research goal is to build reliable and generalizable multimodal intelligence, with a focus on developing principled methods that integrate vision, language, and structured reasoning under real-world conditions.
Currently, I am working on Efficient Pretraining and Fine-tuning of Multimodal Large Language Models and Unified Models (Any-to-Any).
If you are interested in partnering on research projects, offering internship opportunities or exchange programs, I would be thrilled to connect with you. 😄
📝 Publications
* Equal Contribution, † Corresponding Author, ‡ Project Lead, # Core Contributor
- MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
Yang Shi#, Huanqian Wang#, Wulin Xie#, Huanyao Zhang#, Lijie Zhao#, YiFan Zhang#†, Xinfeng Li, Chaoyou Fu, Zhuoer Wen, Wenting Liu, Zhuoran Zhang, Xinlong Chen, Bohan Zeng, Sihan Yang, Yuanxing Zhang‡, Pengfei Wan, Haotian Wang†, Wenjing Yang†
- Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi*, Jiaheng Liu*, Yushuo Guan*, Zhenhua Wu, Yuanxing Zhang†, Zihao Wang, Weihong Lin, Jingyun Hua, Zekun Wang, Xinlong Chen, Bohan Zeng, Wentao Zhang, Fuzheng Zhang, Wenjing Yang, Di Zhang
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Wulin Xie*, Yi-Fan Zhang*‡, Chaoyou Fu, Yang Shi, Bingyan Nie, Hongkai Chen, Zhang Zhang, Liang Wang, Tieniu Tan
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Yi-Fan Zhang‡, Tao Yu, Haochen Tian, Chaoyou Fu†, Peiyan Li, Jianshu Zeng, Wulin Xie, Yang Shi, Huanyu Zhang, Junkang Wu, Xue Wang, Yibo Hu, Bin Wen†, Fan Yang, Zhang Zhang†, Tingting Gao, Di Zhang, Liang Wang, Rong Jin, Tieniu Tan
- EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents
Zhili Cheng‡, Yuge Tu#, Ran Li#, Shiqi Dai#, Jinyi Hu#‡, Shengding Hu, Jiahao Li, Yang Shi, Tianyu Yu, Weize Chen, Lei Shi, Maosong Sun†
- Debiasing Multimodal Large Language Models via Penalization of Language Priors
YiFan Zhang*, Yang Shi*, Weichen Yu, Qingsong Wen†, Xue Wang, Wenjing Yang, Zhang Zhang, Liang Wang, Rong Jin
👨💻 Work Experience
- Research Intern at Kling, Kuaishou Technology, 2025.05 - Present
- Research Intern at KwaiYii, Kuaishou Technology, 2025.02 - 2025.05
- Research Intern at THUNLP, Tsinghua University, 2023.11 - 2025.02
📚 Education
- Ph.D. School of Computer Science, Peking University, 2023 - Present
- B.S. School of Computer Science, China University of Geosciences, 2019 - 2023
🌟 Honors & Awards
- Ruiming Alumni Scholarship, 1‰, 2021
- China National Scholarship, 0.2%, 2020