Peilin Cai’s Personal Website
About
I am a master's student and researcher focused on computer vision (CV), large language models (LLMs), and multimodal generation. At USC's Graphics & Vision Lab (advisor: Prof. Yue Wang), my work centers on 3D reconstruction under sparse observations, controllable generative rendering, and embodied navigation.
On June 8, 2026, I will join TikTok as a Machine Learning Engineer.
Research Vision: Perception, generation, and grounded world models
More broadly, my research is driven by a simple goal: to enable models that both perceive and generate the real world in a scalable, physically grounded way. From structured 3D reconstruction during my undergraduate years at Wuhan University to my current work on large language models, world models, and embodied agents at USC, a recurring theme has been bridging raw sensory data with structured representations of scenes, actions, and goals.
I am especially interested in unified models of perception and generation for lifelong scene understanding and robust, context-aware agents.
Selected Projects: Security, personalization, and long-horizon simulation
At Prof. Yue Zhao's FORTIS Lab, I carried out two research projects that remain especially meaningful to me: SecDOOD (ICCV 2025 Poster) and PERSONABENCH (NeurIPS 2025 MTI-LLM Workshop Spotlight). The former proposed a secure on-device out-of-distribution (OOD) detection framework that requires no gradient backpropagation; the latter introduced the first benchmark for evaluating the personalization capabilities of LLMs in multi-turn conversational settings.
At GVL, I developed The Earth Simulator, a street-view world model that turns a handful of raw, pose-free images into long-horizon, camera-controllable exploration videos grounded in 3D geometry. By combining a persistent 3D Gaussian spatial memory with a generative video model, we aim to preserve real-world structure while achieving photorealistic, temporally stable rollouts from sparse, in-the-wild driving footage.
Experience: Engineering depth, research breadth, and collaboration
I have strong coding skills and a solid background in computer vision and natural language processing, along with extensive experience training, deploying, and running inference with LLMs and VLMs. I am also continuing to strengthen my research skills in robotics at GVL.
If you are interested in collaborating, please feel free to reach out. My preferred email is peilinca@usc.edu.
Publications
- The Earth Simulator: Street View World Modeling with 3D Gaussian Memory and Camera Control. In submission, 2025.
- Secure On-Device Video OOD Detection Without Backpropagation. International Conference on Computer Vision (ICCV), 2025. Poster.
- A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations. NeurIPS 2025 MTI-LLM Workshop, Spotlight (Top 5%). arXiv preprint, 2025.
