Mingkun Chang
M.E. Student
Sun Yat-sen University
Shenzhen, China
About Me

I am an incoming master's student at HCP Lab, Sun Yat-sen University, advised by Prof. Guanbin Li. Previously, I received my B.Eng. in Artificial Intelligence from Xidian University, and spent a memorable three months at Westlake University, where I was fortunate to work with Prof. Yandong Wen.

My research interests lie in video generation and world models. I am especially interested in:

  • Generative reasoning: understanding the internal reasoning mechanisms of video models and large language models, and exploring how such abilities may generalize.
  • Physical intelligence: exploring how models can learn physical dynamics, spatial-temporal interactions, and actionable representations from real-world data and simulation.

I am still exploring these fields and always happy to learn from different perspectives. Please feel free to reach out if you would like to exchange ideas, discuss research, or explore possible collaborations.

News
Graduated from Xidian University 🎓. Grateful for everyone I met along the way, and looking forward to the journey ahead.
Released RealityBridge, a framework for bridging editable 3D Gaussian Splatting driving simulations and real-world videos.
Passed my undergraduate thesis defense at Xidian University.
Our paper CoF-T2I was accepted by ICML 2026 🎉.
The OpenWorldLib technical report was released. Glad to contribute to this exciting project.
Released CoF-T2I, the first text-to-image model that leverages video chain-of-frame reasoning for test-time scaling.
Selected Publications
* Equal Contribution
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Chengzhuo Tong*, Mingkun Chang*, Shenglong Zhang, Yuran Wang, Cheng Liang, Zhizheng Zhao, Ruichuan An, Bohan Zeng, Yang Shi, Yifan Dai, Ziming Zhao, Guanbin Li, Pengfei Wan, Yuanxing Zhang, Wentao Zhang

ICML 2026

CoF-T2I studies how video models can perform frame-by-frame visual reasoning for text-to-image generation, using intermediate frames as explicit reasoning steps toward the final image.

RealityBridge: Bridging Editable 3D Gaussian Splatting Driving Simulations and Real-World Videos

Zhenhua Wu*, Yun Pang*, Mingkun Chang*, Yuwei Ning, Liangzhi Wang, Yi Xiao, Guanbin Li

arXiv preprint

RealityBridge is a structure-preserving and asset-aware Sim-to-Real framework for improving edited 3DGS driving videos, targeting artifact removal, realism, and temporal consistency.

Selected Projects
OpenWorldLib

Core Contributor · 2026 · 828 stars

World Models, Multimodal Generation, Unified Inference Framework

A unified open-source codebase for advanced world models, integrating perception-centered world-model methods across visual generation, reasoning, VLA, and simulation-related tasks.

Background
Education
Sun Yat-sen University 2026 - present
School of Computer Science and Engineering
M.E. Student
Xidian University 2022 - 2026
B.Eng. in Artificial Intelligence
Experience
Intelligent Automotive Solution, Huawei 2025.12 - 2026.9
University-Industry Collaboration Intern
Topic: Video Generation Models, Autonomous Driving Data Synthesis.
Computational Learning and Reasoning Lab, Westlake University 2025.6 - 2025.12
Visiting Student with Prof. Yandong Wen
Topic: LLM Math Reasoning, Reinforcement Learning.
Honors & Awards
First-Class Academic Scholarship, Sun Yat-sen University 2026
Meritorious Winner, Interdisciplinary Contest in Modeling 2024
Second Prize (Provincial Level), Mathematics Competition of Chinese College Students 2023