Visual Navigation in Real-World Indoor Environments Using End-to-End Deep Reinforcement Learning

Jonáš Kulhánek Czech Technical University in Prague
Erik Derner Czech Technical University in Prague
Robert Babuška Delft University of Technology


Visual navigation is essential for many applications in robotics, from manipulation through mobile robotics to automated driving. Deep reinforcement learning (DRL) provides an elegant map-free approach integrating image processing, localization, and planning in one module, which can be trained and therefore optimized for a given environment. However, to date, DRL-based visual navigation has been validated exclusively in simulation, where the simulator provides information that is not available in the real world, e.g., the robot’s position or image segmentation masks. This precludes the use of the learned policy on a real robot. Therefore, we propose a novel approach that enables direct deployment of the trained policy on real robots.

We have designed visual auxiliary tasks, a tailored reward scheme, and a new powerful simulator to facilitate domain randomization. The policy is fine-tuned on images collected from real-world environments. We have evaluated the method on a mobile robot in a real office environment. Training took ~30 hours on a single GPU. In 30 navigation experiments, the robot reached a 0.3-meter neighborhood of the goal in more than 86.7 % of cases. This result makes the proposed method directly applicable to tasks like mobile manipulation.

Architecture overview
Figure 1: The model architecture is similar to our previous work [1] with the difference of the labels for VN auxiliary tasks being the raw camera images instead of the segmentation masks, which are not readily available in the real-world environment.

Real-world dataset experiment

In an office room, we used the TurtleBot 2 robot to collect a dataset of images taken at grid points with a 0.2 m resolution. While collecting the dataset, we estimated the robot pose through odometry. We compare our method with the PAAC and UNREAL algorithms (see the paper). The models were pre-trained in a simulated environment; we also compare with models that did not use any pre-training (labelled np). Finally, we include a random agent that selects random movements; when it reaches the target, ground-truth information is used to signal the goal.
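The random baseline described above can be sketched as a random walk on the dataset grid. This is only an illustration under stated assumptions: the four-move action set, the grid clamping, the step limit, and the name `run_random_episode` are hypothetical and do not come from the paper.

```python
import random

# Hypothetical action set: unit moves on the dataset grid (an assumption;
# the robot's actual action space is not specified in this text).
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def run_random_episode(start, goal, width, height, max_steps=1000, rng=None):
    """Random-walk baseline on a width x height grid.

    Returns (reached_goal, steps_taken). The goal test uses ground-truth
    grid coordinates, mirroring how the random agent is told it has arrived.
    """
    rng = rng or random.Random()
    x, y = start
    for step in range(max_steps + 1):
        if (x, y) == goal:  # ground-truth goal signal
            return True, step
        if step == max_steps:  # step budget exhausted
            break
        dx, dy = rng.choice(ACTIONS)
        # Clamp to the grid so the agent never leaves the collected area.
        x = min(max(x + dx, 0), width - 1)
        y = min(max(y + dy, 0), height - 1)
    return False, max_steps
```

Because the agent has no perception of the goal, the episode ends successfully only through the ground-truth check, which is why such a baseline needs far more steps than a trained policy.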
Table 1: Mean success rate, mean distance from the goal (goal distance), and mean number of steps.

algorithm       success rate   goal distance (m)   steps on grid
ours            0.936          0.145±0.130         13.489±6.286
PAAC            0.922          0.157±0.209         14.323±10.141
UNREAL          0.863          0.174±0.173         14.593±9.023
np ours         0.883          0.187±0.258         15.880±7.022
np PAAC         0.860          0.243±0.447         13.699±6.065
np UNREAL       0.832          0.224±0.358         15.676±6.578
random          0.205          1.467±1.109         147.956±88.501
shortest path   -              0.034±0.039         12.595±5.743
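As a rough illustration of how per-episode results could be aggregated into the columns of Table 1, the sketch below computes the three statistics from a list of episode records. The record layout and the name `summarize` are hypothetical, and treating the ± values as a population standard deviation is an assumption; the text does not state which spread measure was used.

```python
from statistics import mean, pstdev


def summarize(episodes):
    """Aggregate episode records into table-style statistics.

    Each record is a dict with keys 'success' (bool),
    'goal_distance' (meters), and 'steps' (int). Hypothetical layout.
    """
    distances = [e["goal_distance"] for e in episodes]
    steps = [e["steps"] for e in episodes]
    return {
        "success_rate": mean([1.0 if e["success"] else 0.0 for e in episodes]),
        # (mean, spread) pairs; the spread is assumed to be a standard deviation.
        "goal_distance": (mean(distances), pstdev(distances)),
        "steps": (mean(steps), pstdev(steps)),
    }


if __name__ == "__main__":
    demo = [
        {"success": True, "goal_distance": 0.1, "steps": 10},
        {"success": False, "goal_distance": 0.5, "steps": 20},
    ]
    print(summarize(demo))
```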

Real-world evaluation

Finally, to evaluate the trained network in the real-world environment, we randomly chose 30 pairs of initial and target states. For each pair, the robot running the trained policy was placed in the initial pose and given a target image. The robot stopped within a 0.3-meter radius of the goal in 86.7% of the cases. We show a video of one of the episodes.
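The success criterion above can be written down as a simple distance check. The sketch below assumes planar (x, y) positions in meters and a Euclidean distance; the function names are hypothetical. With 30 runs, a rate of 86.7% is consistent with 26 successful runs, which is our inference from the rounding rather than a count stated in the text.

```python
import math

GOAL_RADIUS_M = 0.3  # success radius reported for the real-world evaluation


def reached_goal(final_xy, goal_xy, radius=GOAL_RADIUS_M):
    """True if the final position lies within `radius` meters of the goal."""
    return math.dist(final_xy, goal_xy) <= radius


def success_rate(final_positions, goal_positions, radius=GOAL_RADIUS_M):
    """Fraction of runs that ended inside the goal radius."""
    hits = sum(
        reached_goal(f, g, radius)
        for f, g in zip(final_positions, goal_positions)
    )
    return hits / len(final_positions)


if __name__ == "__main__":
    # 26 of 30 runs (inferred count) rounds to the reported percentage.
    print(round(26 / 30 * 100, 1))
```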


[1] Jonáš Kulhánek, Erik Derner, Tim De Bruin, and Robert Babuška. Vision-based navigation using deep reinforcement learning. In 2019 European Conference on Mobile Robots (ECMR), pages 1-8, 2019.


Please use the following citation:
  title={Visual navigation in real-world indoor environments using end-to-end deep reinforcement learning},
  author={Kulh{\'a}nek, Jon{\'a}{\v{s}} and Derner, Erik and Babu{\v{s}}ka, Robert},
  journal={IEEE Robotics and Automation Letters},