Agent's Actions
We support seven actions, as shown below. The blocking action is dynamic: the agent always tries to block the ego car.
In this work, we introduce a text-to-traffic scene framework that uses a large language model to generate diverse traffic scenes in the CARLA simulator from natural language descriptions. Existing text-to-scene methods often generate critical scenarios along a few fixed paths, which sacrifices environmental diversity and limits customization. In contrast, our approach uses a common structured output format to enable flexible generation of a wide range of traffic scenarios. Users can specify parameters such as weather conditions, vehicle types, and road signals as generation conditions. Importantly, our model does not require a predetermined location or trajectory: it autonomously selects the starting point and the scenario details based on the user's input and generates scenes from scratch. Our framework can also generate everyday traffic scenes in addition to critical scenarios, which broadens its utility. We demonstrate that our framework provides diverse agent planning and road selection, and that it can facilitate the training of autonomous agents in critical traffic, achieving comparable or superior performance.
We support nine different types of agents as shown below.
Agent Type | | |
---|---|---|
car | bus | truck |
firetruck | ambulance | police car |
motorcycle | cyclist | pedestrian |
We support ten different objects and three different signals as shown below.
Category | Supported items |
---|---|
signals | traffic_light, stop, yield |
objects | speed_30, speed_40, speed_60, speed_90, parallel_open_crosswalk, ladder_crosswalk, continental_crosswalk, dashed_single_white, solid_single_white, stop_line, stop_sign_on_road |
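As a rough illustration of how the supported names above could be used to sanity-check a scene request, here is a minimal Python sketch. The vocabularies are copied from the tables; the request layout (`agents`, `signals`, `objects` keys) and the helper `unsupported_items` are hypothetical and are not taken from the framework's actual interface.

```python
# Hypothetical sketch: the vocabularies mirror the tables above, but the
# request layout and this helper are assumptions, not the framework's API.
AGENT_TYPES = {
    "car", "bus", "truck",
    "firetruck", "ambulance", "police car",
    "motorcycle", "cyclist", "pedestrian",
}
SIGNALS = {"traffic_light", "stop", "yield"}
OBJECTS = {
    "speed_30", "speed_40", "speed_60", "speed_90",
    "parallel_open_crosswalk", "ladder_crosswalk", "continental_crosswalk",
    "dashed_single_white", "solid_single_white",
    "stop_line", "stop_sign_on_road",
}


def unsupported_items(request: dict) -> dict:
    """Return, per category, the requested names that are not in the tables."""
    vocab = {"agents": AGENT_TYPES, "signals": SIGNALS, "objects": OBJECTS}
    problems = {}
    for key, allowed in vocab.items():
        unknown = sorted(set(request.get(key, [])) - allowed)
        if unknown:
            problems[key] = unknown
    return problems


# Example request with two names that the tables do not list.
request = {
    "agents": ["firetruck", "car", "tram"],
    "signals": ["traffic_light"],
    "objects": ["stop_line", "speed_50"],
}
print(unsupported_items(request))  # {'agents': ['tram'], 'objects': ['speed_50']}
```

A check of this kind would let unsupported names be caught before a description is turned into a scene; whether the framework performs such a check is not stated here.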
A firetruck from the left road is coming when the ego car is turning right.
Daily traffic with more than ten cars.
Lots of cars, buses, trucks, and motorcycles are seen.
A pedestrian on the sidewalk is crossing the street in front of a truck stopping on the shoulder. Both are located on the front right.
A cyclist is crossing the street from a sidewalk in a dangerous way on a rainy night.
Some cars from the opposite direction are coming straight when the ego car is turning left.
The ego car is going straight at the intersection with a traffic light. There are some puddles on the road.
The ego car is turning left at the intersection with no traffic light, stop sign, or stop sign on road. A car coming from the straight is turning right.
A pedestrian is crossing the road with the parallel open crosswalk and the ego car is turning right.
We provide a diversity test to evaluate the diversity of the generated traffic scenes. We construct eight different scenes and test each scene five times. We report "Agent Diversity" (AD), "Road Diversity" (RD), and "Text Matching" (TM). Prompts for the diversity test can be found here.
Metric | Normal: Daily Traffic | Normal: Intersection | Critical: Pedestrian Crushing | Critical: Blocking Agent | Critical: Dangerous Cut-off | Conditional: Only having Two-wheel Vehicles | Conditional: Having Emergency Vehicles | Conditional: Rainy Weather | Avg. |
---|---|---|---|---|---|---|---|---|---|
AD↥ | 0.789 | 0.833 | 0.500 | 0.750 | 0.600 | 0.714 | 0.500 | 0.800 | 0.686 |
RD↥ | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.800 | 1.000 | 1.000 | 0.975 |
TM↥ | 1.000 | 0.800 | 0.400 | 0.800 | 0.800 | 0.600 | 1.000 | 1.000 | 0.800 |
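The Avg. column can be cross-checked with a few lines of Python. This sketch assumes that the average is the unweighted mean over the eight scenarios, which is consistent with the reported numbers but is not stated explicitly; the helper name `matches_reported` is illustrative.

```python
from statistics import mean

# Per-scenario scores copied from the table above, in column order.
scores = {
    "AD": [0.789, 0.833, 0.500, 0.750, 0.600, 0.714, 0.500, 0.800],
    "RD": [1.000, 1.000, 1.000, 1.000, 1.000, 0.800, 1.000, 1.000],
    "TM": [1.000, 0.800, 0.400, 0.800, 0.800, 0.600, 1.000, 1.000],
}
# Values from the Avg. column of the same table.
reported_avg = {"AD": 0.686, "RD": 0.975, "TM": 0.800}


def matches_reported(metric: str, tol: float = 1e-3) -> bool:
    """Check that the unweighted mean agrees with the reported average.

    Assumption: Avg. is a plain mean over the eight scenarios; the tolerance
    absorbs rounding to three decimals.
    """
    return abs(mean(scores[metric]) - reported_avg[metric]) <= tol


for metric in scores:
    print(metric, matches_reported(metric))
```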
We show that our approach can also be used to train agents under critical scenarios by selecting three challenging scenarios from SafeBench. "CR" represents "Collision Rate" and "OS" represents "Overall Score". We compare our results with Learning-to-Collide (LC) [1], AdvSim (AS) [2], Adversarial Trajectory Optimization (AT) [4], and ChatScene (CS) [3]. We color the best and second-best results in each column.
Algo. | Metric | Straight Obstacle | Lane Changing | Unprotected Left-turn | Avg. |
---|---|---|---|---|---|
LC | CR↧ | 0.120 | 0.510 | 0.000 | 0.210 |
AS | CR↧ | 0.230 | 0.530 | 0.050 | 0.270 |
AT | CR↧ | 0.140 | 0.300 | 0.000 | 0.150 |
CS | CR↧ | 0.030 | 0.110 | 0.100 | 0.080 |
Ours | CR↧ | 0.000 | 0.020 | 0.000 | 0.067 |
LC | OS↥ | 0.827 | 0.684 | 0.954 | 0.822 |
AS | OS↥ | 0.784 | 0.666 | 0.937 | 0.796 |
AT | OS↥ | 0.849 | 0.803 | 0.948 | 0.867 |
CS | OS↥ | 0.905 | 0.906 | 0.903 | 0.905 |
Ours | OS↥ | 0.955 | 0.861 | 0.951 | 0.922 |
We evaluate the efficacy of the framework components using ten distinct prompts, each tailored to test the components under diverse scenarios. These prompts cover three normal scenarios, four critical scenarios, and three scenarios with specific road conditions, such as the presence of traffic lights or the absence of crossroads. "AA" represents "Agent Accuracy", "RA" represents "Road Accuracy", and "TM" represents "Text Matching". Prompts for the ablation study can be found here.
Quality | Normal | | Critical | | Conditional | | Avg. | |
---|---|---|---|---|---|---|---|---|
Plan Quality | AA↥ | RA↥ | AA↥ | RA↥ | AA↥ | RA↥ | AA↥ | RA↥ |
w/o analysis | 0.917 | 0.667 | 0.833 | 0.750 | 0.750 | 0.917 | 0.833 | 0.775 |
w/ analysis | 0.917 | 1.000 | 0.875 | 0.750 | 1.000 | 0.917 | 0.925 | 0.875 |
Scene Quality | TM↥ | | TM↥ | | TM↥ | | TM↥ | |
w/o ranking | 0.667 | | 0.450 | | 0.600 | | 0.560 | |
w/ ranking | 0.867 | | 0.750 | | 0.800 | | 0.800 | |
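The Avg. columns of the ablation table appear to be consistent with a mean weighted by the number of prompts per group (three normal, four critical, three conditional). The sketch below checks this; the weighting scheme is an inference from the prompt counts given above, not something the table states, and the names `PROMPTS` and `weighted_avg` are illustrative.

```python
# Number of prompts per scenario group, as described in the text above.
PROMPTS = {"Normal": 3, "Critical": 4, "Conditional": 3}


def weighted_avg(per_group: dict) -> float:
    """Prompt-count-weighted mean over the three scenario groups.

    Assumption: this is how the Avg. column was obtained; the table does not
    state it, but the numbers agree to three decimals.
    """
    total = sum(PROMPTS.values())
    return sum(PROMPTS[group] * value for group, value in per_group.items()) / total


# (row, metric) -> (per-group scores, reported average), copied from the table.
rows = {
    ("w/o analysis", "AA"): ({"Normal": 0.917, "Critical": 0.833, "Conditional": 0.750}, 0.833),
    ("w/o analysis", "RA"): ({"Normal": 0.667, "Critical": 0.750, "Conditional": 0.917}, 0.775),
    ("w/ analysis", "AA"): ({"Normal": 0.917, "Critical": 0.875, "Conditional": 1.000}, 0.925),
    ("w/ analysis", "RA"): ({"Normal": 1.000, "Critical": 0.750, "Conditional": 0.917}, 0.875),
    ("w/o ranking", "TM"): ({"Normal": 0.667, "Critical": 0.450, "Conditional": 0.600}, 0.560),
    ("w/ ranking", "TM"): ({"Normal": 0.867, "Critical": 0.750, "Conditional": 0.800}, 0.800),
}

for (row, metric), (per_group, reported) in rows.items():
    agrees = abs(weighted_avg(per_group) - reported) <= 1e-3
    print(f"{row} {metric}: agrees with reported average = {agrees}")
```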
[1] W. Ding, B. Chen, M. Xu, and D. Zhao, “Learning to collide: An adaptive safety-critical scenarios generating method,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020, pp. 2243–2250.
[2] J. Wang, A. Pun, J. Tu, S. Manivasagam, A. Sadat, S. Casas, M. Ren, and R. Urtasun, “Advsim: Generating safety-critical scenarios for self-driving vehicles,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9909–9918.
[3] J. Zhang, C. Xu, and B. Li, “Chatscene: Knowledge-enabled safety-critical scenario generation for autonomous vehicles,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15459–15469.
[4] Q. Zhang, S. Hu, J. Sun, Q. A. Chen, and Z. M. Mao, “On adversarial robustness of trajectory prediction for autonomous vehicles,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 15159–15168.
@article{ruan2024ttsg,
title={Traffic Scene Generation from Natural Language Description for Autonomous Vehicles with Large Language Model},
author={Ruan, Bo-Kai and Tsui, Hao-Tang and Li, Yung-Hui and Shuai, Hong-Han},
journal={arXiv preprint arXiv:2409.09575},
year={2024}
}