Peilin Cai’s Personal Website
About
I am a master's student and researcher focused on computer vision (CV), large language models (LLMs), and multimodal generation. At USC's Graphics & Vision Lab (advisor: Prof. Yue Wang), my work centers on 3D reconstruction under sparse observations, controllable generative rendering, and embodied navigation.
On June 8, 2026, I will join TikTok as a Machine Learning Engineer.
Research Vision: Perception, generation, and grounded world models
More broadly, my research is driven by a simple goal: to enable models that both perceive and generate the real world in a scalable, physically grounded way. From structured 3D reconstruction during my undergraduate years at Wuhan University to my current work on large language models, world models, and embodied agents at USC, a recurring theme has been bridging raw sensory data with structured representations of scenes, actions, and goals.
I am especially interested in unified models of perception and generation for lifelong scene understanding and robust, context-aware agents.
Selected Projects: Security, personalization, and long-horizon simulation
At Prof. Yue Zhao's FORTIS Lab, I carried out two research projects that remain especially meaningful to me: SecDOOD (ICCV 2025 Poster) and PERSONABENCH (NeurIPS 2025 MTI-LLM Workshop Spotlight). The former proposed a secure on-device out-of-distribution (OOD) detection framework that requires no gradient backpropagation; the latter introduced the first benchmark for evaluating the personalization capabilities of LLMs in multi-turn conversational settings.
At GVL, I developed The Earth Simulator, a street-view world model that turns a handful of raw, pose-free images into long-horizon, camera-controllable exploration videos grounded in 3D geometry. By combining a persistent 3D Gaussian spatial memory with a generative video model, we aim to preserve real-world structure while achieving photorealistic, temporally stable rollouts from sparse, in-the-wild driving footage.
Experience: Engineering depth, research breadth, and collaboration
I have strong coding skills and a solid background in computer vision and natural language processing, along with extensive experience training, deploying, and running inference with LLMs and VLMs. I am also continuing to strengthen my research skills in robotics at GVL.
If you are interested in collaborating, please feel free to reach out. My preferred email is peilinca@usc.edu.
Publications
- The Earth Simulator: Street View World Modeling with 3D Gaussian Memory and Camera Control. In submission, 2025.
- Secure On-Device Video OOD Detection Without Backpropagation. International Conference on Computer Vision (ICCV), 2025. Poster.
- A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations. NeurIPS 2025 MTI-LLM Workshop, Spotlight (Top 5%). arXiv preprint, 2025.
