Peilin Cai’s Personal Website


About

I am a master's student and researcher focused on computer vision (CV), large language models (LLMs), and multimodal generation. At USC's Graphics & Vision Lab (advisor: Prof. Yue Wang), my work centers on 3D reconstruction under sparse observations, controllable generative rendering, and embodied navigation.

On June 8, 2026, I will join TikTok as a Machine Learning Engineer.

Research Vision: Perception, generation, and grounded world models

More broadly, my research is driven by a simple goal: to enable models that both perceive and generate the real world in a scalable, physically grounded way. From structured 3D reconstruction during my undergraduate years at Wuhan University to my current work on large language models, world models, and embodied agents at USC, a recurring theme has been bridging raw sensory data with structured representations of scenes, actions, and goals.

I am especially interested in unified models of perception and generation for lifelong scene understanding and robust, context-aware agents.

Selected Projects: Security, personalization, and long-horizon simulation

At Prof. Yue Zhao's FORTIS Lab, I carried out two research projects that are particularly meaningful to me: SecDOOD (ICCV 2025 Poster) and PERSONABENCH (NeurIPS 2025 MTI-LLM Workshop Spotlight). The former proposed a secure on-device OOD detection framework that requires no gradient backpropagation, while the latter introduced the first benchmark for evaluating the personalization capabilities of LLMs in multi-turn conversational settings.

At GVL, I developed The Earth Simulator, a street-view world model that turns a handful of raw, pose-free images into long-horizon, camera-controllable exploration videos grounded in 3D geometry. By combining a persistent 3D Gaussian spatial memory with a generative video model, we aim to preserve real-world structure while achieving photorealistic, temporally stable rollouts from sparse, in-the-wild driving footage.

Experience: Engineering depth, research breadth, and collaboration

I have strong coding skills and a solid background in computer vision and natural language processing, along with extensive experience training, deploying, and running inference with LLMs and VLMs. I am also continuing to strengthen my research skills in robotics at GVL.

If you are interested in collaborating, please feel free to reach out. My preferred email is peilinca@usc.edu.


Publications

In Submission
The Earth Simulator: Street View World Modeling with 3D Gaussian Memory and Camera Control
Peilin Cai, Weiduo Yuan, Sicheng He, Cho-Ying Wu, David Paz, Hengyuan Zhang, Yuliang Guo, Xinyu Huang, Liu Ren, Jiageng Mao, Yue Wang
In submission, 2025
ICCV 2025 Poster
Secure On-Device Video OOD Detection Without Backpropagation
Shawn Li, Peilin Cai, Yuxiao Zhou, Zhiyu Ni, Renjie Liang, You Qin, Yi Nian, Zhengzhong Tu, Xiyang Hu, Yue Zhao
in International Conference on Computer Vision, 2025
NeurIPS 2025 MTI-LLM Workshop Spotlight (Top 5%)
A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations
Li Li, Peilin Cai, Ryan A. Rossi, Franck Dernoncourt, Branislav Kveton, Junda Wu, Tong Yu, Linxin Song, Tiankai Yang, Yuehan Qin, Nesreen K. Ahmed, Samyadeep Basu, Subhojyoti Mukherjee, Ruiyi Zhang, Zhengmian Hu, Bo Ni, Yuxiao Zhou, Zichao Wang, Yue Huang, Yu Wang, Xiangliang Zhang, Philip S. Yu, Xiyang Hu, Yue Zhao
arXiv preprint, 2025

CV