HorizonDrive

Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation

Authors & Affiliations

Conglang Zhang (1,*), Yifan Zhan (2,*), Qingjie Wang (3), Zhanpeng Ouyang (3), Yu Li (4),
Zihao Yang (5), Xiaoyang Guo (6), Weiqiang Ren (3), Qian Zhang (3), Zhen Dong (1),
Yinqiang Zheng (2), Wei Yin (3,‡), Zhengqing Chen (3,†)

1 Wuhan University   2 The University of Tokyo   3 Horizon Robotics   4 Tsinghua University
5 University of Science and Technology of China   6 The Chinese University of Hong Kong

* Equal contribution   ‡ Project lead   † Corresponding author

Abstract

Closed-loop driving simulation requires real-time interaction beyond short offline clips, pushing current driving world models toward autoregressive (AR) rollout. Existing AR distillation approaches typically rely on frame sinks or student-side degradation training. The former transfers poorly to driving due to fast ego-motion and rapid scene changes, while the latter remains bounded by the teacher's single-pass output length and thus provides only a limited supervision horizon. A natural question is: can the teacher itself be extended via AR rollout to provide unbounded-horizon supervision at bounded memory cost? The key difficulty is that a standard teacher drifts under its own predictions, contaminating the supervision it provides. Our key insight is to make the teacher rollout-capable, ensuring reliable supervision from its own AR rollouts. This is instantiated as HorizonDrive, an anti-drifting training-and-distillation framework for AR driving simulation. First, scheduled rollout recovery (SRR) trains the base model to reconstruct ground-truth future clips from prediction-corrupted histories, yielding a teacher that remains stable across long AR rollouts. Second, the rollout-capable teacher is extended via AR rollout, providing long-horizon distribution-matching supervision under bounded memory, while a short-window student aligns to it with teacher rollout DMD (TRD) for efficient real-time deployment. HorizonDrive natively supports minute-scale AR rollout under bounded memory; on nuScenes, HorizonDrive reduces FID by 52% and FVD by 37%, and lowers ARE and DTW by 21% and 9% relative to the strongest long-horizon streaming baselines, while remaining competitive with single-pass driving video generators.
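The bounded-memory AR rollout described above can be illustrated with a minimal sketch. It is not the HorizonDrive implementation: `ar_rollout`, `toy_predictor`, the `Chunk` type, and the `window` parameter are hypothetical names introduced here, and the "model" is a toy function. The only point shown is that each new chunk is conditioned on a fixed-size window of the model's own earlier outputs, so the conditioning memory stays constant however long the rollout runs.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Chunk = List[float]  # hypothetical stand-in for a short clip of frames or latents


def ar_rollout(
    predict_next: Callable[[Sequence[Chunk]], Chunk],
    first_chunk: Chunk,
    num_chunks: int,
    window: int,
) -> List[Chunk]:
    """Generate `num_chunks` chunks, conditioning on at most `window` past chunks."""
    # deque(maxlen=...) evicts the oldest chunk automatically: bounded memory.
    history: Deque[Chunk] = deque([first_chunk], maxlen=window)
    outputs: List[Chunk] = [first_chunk]
    while len(outputs) < num_chunks:
        nxt = predict_next(list(history))  # the model sees its own predictions
        history.append(nxt)
        outputs.append(nxt)  # a streaming system would emit this instead of storing it
    return outputs


def toy_predictor(history: Sequence[Chunk]) -> Chunk:
    # Toy assumption: the next chunk is the latest chunk shifted by one.
    return [x + 1.0 for x in history[-1]]


video = ar_rollout(toy_predictor, first_chunk=[0.0, 0.0], num_chunks=5, window=2)
```

Because the predictor only ever receives its own outputs after the first chunk, any error it makes is fed back in; this is the drift that the abstract says a standard teacher suffers from.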

Key Features

Controllable: Driving scene generation
Long-Horizon: Stable generation quality
Interactive: AR rollout
Scalable: Diverse driving scenes
Closed-Loop: Simulation ready
No Explicit: 3D representations

Method Overview

[Figure: HorizonDrive method pipeline, a three-stage diagram. Stage 1: Conditional Driving WM (Sec. 4.1), a short-horizon base model conditioned on HD map, 3D bounding boxes, and action. Stage 2: Scheduled Rollout Recovery (Sec. 4.2), in which base-model rollout predictions are prepared as degraded history and the model is trained against clean targets with noise, yielding a rollout-capable base model. Stage 3: Teacher Rollout DMD (Sec. 4.3), in which a long-horizon rollout teacher supervises the student chunk by chunk.]

Overview of the HorizonDrive framework. We first train a conditional driving world model, then improve its autoregressive stability through scheduled rollout recovery, and finally distill long-horizon teacher rollouts into a few-step, short-chunk student via teacher rollout DMD.
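To make the second stage more concrete, the sketch below builds one training pair in the spirit of scheduled rollout recovery: the history may be replaced by the model's own prediction, while the target is always the ground-truth future clip. Everything specific here is an assumption for illustration and does not come from the paper: the linear schedule in `rollout_probability`, the three-clip layout, the one-step "model", and all function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Clip = List[float]  # hypothetical stand-in for a clip of frames or latents


def rollout_probability(step: int, total_steps: int, max_prob: float = 1.0) -> float:
    """Assumed linear schedule: chance of using a predicted (degraded) history."""
    return max_prob * min(1.0, step / max(1, total_steps))


def make_recovery_pair(
    gt_clips: Sequence[Clip],
    model: Callable[[Clip], Clip],
    step: int,
    total_steps: int,
    rng: random.Random,
) -> Tuple[Clip, Clip]:
    """Return (history, target); the target is always the ground-truth future clip."""
    c0, c1, c2 = gt_clips
    p = rollout_probability(step, total_steps)
    # Degraded history: what the model predicts for c1 given c0, errors included.
    history = model(c0) if rng.random() < p else c1
    return history, c2


clips = [[0.0], [1.0], [2.0]]                 # toy ground-truth sequence
imperfect = lambda c: [x + 0.9 for x in c]    # toy model that undershoots by 0.1
early = make_recovery_pair(clips, imperfect, 0, 100, random.Random(0))
late = make_recovery_pair(clips, imperfect, 100, 100, random.Random(0))
```

In this toy setup, `early` pairs a clean history with the ground-truth target, while `late` pairs the model's imperfect prediction with the same target, so the training signal rewards recovering from the model's own errors.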

20-Second AR Results on nuScenes

30-Second AR Results on Self-Collected Dataset

Minute-Level AR Video Generation

Closed-Loop Driving Simulation

Quantitative Results

nuScenes val

Method                FID ↓    FVD ↓    Qual. ↑  Mot. ↑   Img. ↑   ARE ↓    DTW ↓
Long-horizon interactive world model frameworks
Matrix-Game3          35.69    338.22   78.99    93.78    60.44    N/A      N/A
Helios                30.53    218.23   79.02    95.03    58.82    N/A      N/A
Causal-Forcing        49.07    373.29   74.35    92.42    59.00    N/A      N/A
HY-WorldPlay          33.51    580.72   76.58    99.48    58.60    N/A      N/A
LingBot-World         37.67    325.55   77.08    92.87    55.55    N/A      N/A
Long-horizon streaming methods (re-trained on our base model and data)
Self-Forcing          41.53    161.00   79.27    94.17    59.65    3.47     6.22
Self-Forcing++        28.84    147.57   79.47    93.92    60.25    3.78     3.61
LongLive              29.05    161.41   79.35    93.46    60.80    3.28     3.65
HorizonDrive (Ours)   13.82    92.99    79.53    93.85    62.50    2.60     3.27
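The relative improvements quoted in the abstract can be checked against the nuScenes streaming rows above. The short script below is only a worked arithmetic check; it assumes that "strongest baseline" means the best re-trained streaming method per metric, and the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# nuScenes val, long-horizon streaming methods (lower is better for all four metrics).
baselines = {
    "Self-Forcing":   {"FID": 41.53, "FVD": 161.00, "ARE": 3.47, "DTW": 6.22},
    "Self-Forcing++": {"FID": 28.84, "FVD": 147.57, "ARE": 3.78, "DTW": 3.61},
    "LongLive":       {"FID": 29.05, "FVD": 161.41, "ARE": 3.28, "DTW": 3.65},
}
ours = {"FID": 13.82, "FVD": 92.99, "ARE": 2.60, "DTW": 3.27}

reductions = {}
for metric, value in ours.items():
    # Assumption: compare against the best (lowest) baseline for each metric.
    best = min(row[metric] for row in baselines.values())
    reductions[metric] = round(relative_reduction(best, value))

print(reductions)  # {'FID': 52, 'FVD': 37, 'ARE': 21, 'DTW': 9}
```

Under that assumption the rounded values match the 52%, 37%, 21%, and 9% stated in the abstract.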

Self-collected dataset

Method                FID ↓    FVD ↓    Qual. ↑  Mot. ↑   Img. ↑   ARE ↓    DTW ↓
Long-horizon streaming methods (re-trained on our base model and e2e data)
Self-Forcing          58.23    561.11   76.68    94.48    63.18    5.43     14.13
Self-Forcing++        66.93    534.36   74.54    92.70    59.12    7.32     18.40
LongLive              28.39    374.94   78.18    94.57    62.53    4.05     8.11
HorizonDrive (Ours)   12.01    117.27   80.12    95.22    67.65    3.67     5.29
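The page does not define the DTW column. Assuming it denotes dynamic time warping between a generated trajectory and a reference trajectory (an assumption, not something stated here), the textbook algorithm can be sketched as follows. The 2D point representation, the Euclidean point cost, and the function name `dtw_distance` are illustrative choices and may differ from the evaluation protocol actually used.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed 2D ego position (x, y)


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with Euclidean point cost."""
    n, m = len(a), len(b)
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
same_path_slower = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
offset_path = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

print(dtw_distance(reference, same_path_slower))  # 0.0
print(dtw_distance(reference, offset_path))       # 3.0
```

The first comparison yields zero because DTW tolerates differences in timing along the same path, while the second accumulates the one-unit lateral offset at each of the three matched points.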

Citation

If you find our work useful, please cite it as:

@misc{zhang2026horizondriveselfcorrectiveautoregressiveworld,
  title={HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation},
  author={Zhang, Conglang and Zhan, Yifan and Wang, Qingjie and Ouyang, Zhanpeng and Li, Yu and Yang, Zihao and Guo, Xiaoyang and Ren, Weiqiang and Zhang, Qian and Dong, Zhen and Zheng, Yinqiang and Yin, Wei and Chen, Zhengqing},
  year={2026},
  eprint={2605.11596},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.11596},
}