HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving
Abstract
Realistic and controllable simulation is critical for advancing end-to-end autonomous driving, yet existing approaches often struggle to support novel view synthesis under large viewpoint changes or to ensure geometric consistency. We introduce HybridWorldSim, a hybrid simulation framework that integrates multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents. This unified design addresses key limitations of previous methods, enabling the creation of diverse and high-fidelity driving scenarios with reliable visual and spatial consistency. To facilitate robust benchmarking, we further release a new multi-traversal dataset MIRROR that captures a wide range of routes and environmental conditions across different cities. Extensive experiments demonstrate that HybridWorldSim surpasses previous state-of-the-art methods, providing a practical and scalable solution for high-fidelity simulation and a valuable resource for research and development in autonomous driving.
Overview
We introduce HybridWorldSim, a scalable simulator that couples multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents, enabling diverse, high-fidelity driving scenarios with reliable visual and spatial consistency under large viewpoint changes.
Dataset
We present MIRROR, a multi-traversal driving dataset collected using various mass-production vehicles, each equipped with a standardized seven-camera rig providing 360-degree coverage. The dataset captures naturalistic driving behavior, offers rich multi-traversal diversity through repeated passes over the same regions, and spans diverse environmental conditions, including varying weather and illumination.
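For illustration only, the sketch below shows one way a multi-traversal sample with a seven-camera rig could be organized in code; the class, field, and camera names (`MirrorFrame`, `traversal_id`, etc.) are hypothetical and are not part of the released dataset API.

```python
# Hypothetical sketch of how a MIRROR-style multi-traversal sample could be
# organized; names and fields are illustrative, not the official dataset format.
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

CAMERAS = [  # seven-camera rig with 360-degree coverage (illustrative naming)
    "front", "front_narrow", "front_left", "front_right",
    "rear", "rear_left", "rear_right",
]

@dataclass
class MirrorFrame:
    traversal_id: str                 # which pass through the region this frame belongs to
    timestamp: float                  # seconds since the start of the traversal
    ego_pose: np.ndarray              # 4x4 ego-to-world transform
    images: Dict[str, np.ndarray]     # camera name -> HxWx3 image
    conditions: Dict[str, str] = field(default_factory=dict)  # e.g. weather, illumination

def group_by_traversal(frames: List[MirrorFrame]) -> Dict[str, List[MirrorFrame]]:
    """Collect frames from repeated passes through the same region, ordered in time."""
    groups: Dict[str, List[MirrorFrame]] = {}
    for f in frames:
        groups.setdefault(f.traversal_id, []).append(f)
    for fs in groups.values():
        fs.sort(key=lambda f: f.timestamp)
    return groups
```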
Pipeline
Our framework consists of two main stages: static scene reconstruction and dynamic scene generation. The static stage uses a hybrid 3D Gaussian representation to reconstruct scenes from multiple trajectories, with a multi-node design and trajectory embeddings that decouple scene components from environmental conditions. In the dynamic stage, given a source-view image and a target viewpoint, the framework combines the reconstructed static scene with diffusion-based vehicle generation to synthesize view-consistent dynamic agents. A structural sketch of how the two stages compose is shown below.
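The following is a minimal structural sketch of the two-stage composition described above; the function names, signatures, and placeholder bodies are hypothetical stand-ins for the actual reconstruction and generation models, not the released code.

```python
# Structural sketch only: functions below are hypothetical placeholders that mirror
# the two-stage pipeline (static reconstruction, then dynamic generation).
import numpy as np

def reconstruct_static_scene(traversals, trajectory_ids):
    """Placeholder for fitting a hybrid 3D Gaussian scene from multiple trajectories,
    with per-trajectory embeddings decoupling environmental conditions from geometry."""
    return {"gaussians": None, "embeddings": {t: np.zeros(16) for t in trajectory_ids}}

def render_static(scene, target_pose, embedding_id):
    """Placeholder for rendering the static background at the target viewpoint,
    conditioned on the chosen trajectory embedding."""
    return np.zeros((540, 960, 3), dtype=np.uint8)

def generate_dynamic_agents(background, source_image, target_pose, agent_edits):
    """Placeholder for diffusion-based synthesis of view-consistent dynamic vehicles,
    composited onto the rendered static background."""
    return background

def simulate_novel_view(traversals, trajectory_ids, source_image, target_pose, agent_edits):
    scene = reconstruct_static_scene(traversals, trajectory_ids)         # stage 1: static reconstruction
    background = render_static(scene, target_pose, trajectory_ids[0])   # render static background
    return generate_dynamic_agents(background, source_image, target_pose, agent_edits)  # stage 2: dynamic generation
```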
Simulation Results
We show the original reference data (left) alongside our synthesized results from novel viewpoints (right).
Static Scene Reconstruction Comparison
We compare our static scene reconstruction module with OmniRe on single-traversal data.
We compare our static scene reconstruction module with MTGS on multi-traversal data.
Dynamic Scene Editing Comparison
We compare with DriveEditor on vehicle translation tasks, where the target vehicle is displaced horizontally by the same offset in both methods.
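As a rough illustration of this protocol, the snippet below shows one way an identical horizontal offset could be applied to a vehicle pose before re-rendering; the pose convention, helper name, and offset value are assumptions for illustration, not the evaluation code.

```python
# Hypothetical sketch of applying an identical horizontal offset to a vehicle pose.
import numpy as np

def translate_vehicle(pose_obj_to_world: np.ndarray, lateral_offset_m: float) -> np.ndarray:
    """Shift a vehicle sideways by a fixed offset along its own lateral (y) axis,
    keeping its orientation unchanged."""
    lateral_dir = pose_obj_to_world[:3, 1]            # object's y-axis expressed in world coordinates
    shifted = pose_obj_to_world.copy()
    shifted[:3, 3] += lateral_offset_m * lateral_dir  # move only the translation component
    return shifted

# Example: displace the target vehicle 2.0 m to its left in both methods under test.
edited_pose = translate_vehicle(np.eye(4), lateral_offset_m=2.0)
```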
BibTeX
@article{li2025hybridworldsim,
title={HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving},
author={Qiang Li and Yingwenqi Jiang and Tuoxi Li and Duyu Chen and Xiang Feng and Yucheng Ao and Shangyue Liu and Xingchen Yu and Youcheng Cai and Yumeng Liu and Yuexin Ma and Xin Hu and Li Liu and Yu Zhang and Linkun Xu and Bingtao Gao and Xueyuan Wang and Shuchang Zhou and Xianming Liu and Ligang Liu},
journal={arXiv preprint arXiv:2511.22187},
year={2025}
}