Observation

MetaDrive provides various kinds of sensory input, as illustrated in the next figure. For low-level sensors, RGB cameras, depth cameras and Lidar can be placed anywhere in the scene with adjustable parameters such as the field of view and the number of lasers. Meanwhile, high-level scene information, including road information and nearby vehicles' states such as velocity and heading, can also be provided as the observation.
Note that MetaDrive aims to provide an efficient platform for benchmarking RL research; we therefore improve simulation efficiency at the cost of photorealistic rendering effects.
On this page, we describe the observation forms available in the current MetaDrive version and discuss how to implement new forms of observation suited to your own tasks.
Existing Observation
State Vector
MetaDrive provides a state vector containing the information necessary for navigation tasks. We use this state vector in almost all existing RL experiments, such as the Generalization, MARL and Safe RL experiments.
The state vector consists of three parts:

Ego State: current states of the ego vehicle, such as steering, heading, velocity and relative distance to the road boundaries, implemented in the vehicle_state function of the StateObservation Class. Please find the detailed meaning of each state dimension in the code.

Navigation: the navigation information that guides the vehicle toward the destination. Concretely, MetaDrive first computes the route from the spawn point to the destination of the ego vehicle. Then a set of checkpoints is scattered along the whole route at certain intervals. The relative distance and direction to the next checkpoint and the one after it are given as the navigation information. This part is implemented in the _get_info_for_checkpoint function of the Navigation Class.

Surrounding: the surrounding information is encoded by a vector containing Lidar-like cloud points. The data is generated by the Lidar Class. We typically use 240 lasers (single-agent) and 70 lasers (multi-agent) to scan the neighboring area within a radius of 50 meters.
The above information is concatenated into a state vector by the LidarStateObservation Class and fed to the RL agents.
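As a quick sanity check, the following sketch prints the shape of the resulting state vector. It assumes the single-agent MetaDriveEnv, whose default observation is the LidarStateObservation described above; the exact vector length depends on your config (e.g., the number of Lidar lasers), so the printed shape is illustrative.
from metadrive import MetaDriveEnv

# Minimal sketch: inspect the default state-vector observation.
# The vector concatenates ego state, navigation information and Lidar cloud
# points, so its length varies with the config (e.g., the number of lasers).
env = MetaDriveEnv()
o = env.reset()
print(env.observation_space)  # a gym Box space
print(o.shape)                # 1-D state vector
env.close()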
Top-down Semantic Maps

MetaDrive also supports top-down semantic maps. We provide a handy example illustrating how to use the top-down observation in top_down_metadrive.py. You can enjoy this demo via:
python -m metadrive.examples.top_down_metadrive
The following is a minimal script to use Top-down observation.
from metadrive import TopDownMetaDrive
env = TopDownMetaDrive()
o = env.reset()
for i in range(1, 100000):
    o, r, d, info = env.step([0, 1])
    env.render(mode="top_down")
    if d:
        env.reset()
env.close()
TopDownMetaDrive is a wrapper class on MetaDriveEnv which overrides the observation with the pygame top-down renderer. The native observation of this setting is a numpy array with shape [84, 84, 5], and all entries fall into [0, 1]. The above figure shows the semantic meaning of each channel.
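If you want to verify these properties yourself, a short sketch like the one below (reusing the TopDownMetaDrive setup from the script above) prints the shape and value range of the native observation.
from metadrive import TopDownMetaDrive

# Sketch: check the native top-down observation array.
env = TopDownMetaDrive()
o = env.reset()
print(o.shape)           # expected (84, 84, 5)
print(o.min(), o.max())  # entries should lie within [0, 1]
env.close()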
Use First-view Images in Training


MetaDrive supports visuomotor tasks by turning on rendering during training. The above figure shows the images captured by the RGB camera (left) and the depth camera (right). In this section, we discuss how to utilize such observations on a headless machine, such as a computing node in a cluster or another remote server. Before using this feature in your project, please make sure offscreen rendering works on your machine. The setup tutorial is at Install MetaDrive with headless rendering.
Now we can set up the vision-based observation in MetaDrive (a configuration sketch follows the steps below):
Step 1. Set config["image_observation"] = True to tell MetaDrive to maintain an image buffer in memory even when no pop-up window exists.

Step 2. Set config["vehicle_config"]["image_source"] to "rgb_camera" or "depth_camera" according to your demand.

Step 3. The image size (width and height) is determined by the camera parameters. The default setting is (84, 84), following the image size in Atari. You can customize the size by configuring config["vehicle_config"]["rgb_camera"]. For example, config["vehicle_config"]["rgb_camera"] = (200, 88) means the image has 200 pixels in width and 88 pixels in height.
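Putting the three steps together, a minimal configuration sketch could look like the following. The config keys mirror the steps above, but key names may differ slightly across MetaDrive versions, so treat this as illustrative rather than canonical.
from metadrive import MetaDriveEnv

# Illustrative sketch following Steps 1-3 above.
env = MetaDriveEnv(dict(
    image_observation=True,          # Step 1: keep an image buffer in memory
    vehicle_config=dict(
        image_source="rgb_camera",   # Step 2: "rgb_camera" or "depth_camera"
        rgb_camera=(200, 88),        # Step 3: 200 px wide, 88 px high
    ),
))
o = env.reset()
print({k: v.shape for k, v in o.items()})  # a dict observation, e.g. "image" and "state"
env.close()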
Here is a demo script using the RGB camera as observation:
python -m metadrive.examples.drive_in_single_agent_env --observation rgb_camera
The script should print a message:
The observation is a dict with numpy arrays as values: {'image': (84, 84, 3), 'state': (21,)}
The image rendering consumes memory on the first GPU of your machine (if any). Please be careful when using this feature.
If you feel the visual data collection is slow, try our advanced offscreen renderer: Install MetaDrive with advanced offscreen rendering. After verifying your installation, set config["image_on_cuda"] = True to get 10x faster data collection!