Visual Navigation in Real-World Indoor Environments Using End-to-End Deep Reinforcement Learning

Jonáš Kulhánek Czech Technical University in Prague
Erik Derner Czech Technical University in Prague
Robert Babuška Delft University of Technology

Abstract

Visual navigation is essential for many applications in robotics, ranging from manipulation through mobile robotics to automated driving. Deep reinforcement learning (DRL) provides an elegant map-free approach that integrates image processing, localization, and planning in a single module, which can be trained and therefore optimized for a given environment. However, to date, DRL-based visual navigation has been validated exclusively in simulation, where the simulator provides information that is not available in the real world, e.g., the robot's position or image segmentation masks. This precludes the use of the learned policy on a real robot. We therefore propose a novel approach that enables the direct deployment of the trained policy on real robots.

We have designed visual auxiliary tasks, a tailored reward scheme, and a new powerful simulator to facilitate domain randomization. The policy is fine-tuned on images collected from real-world environments. We evaluated the method on a mobile robot in a real office environment; training took approximately 30 hours on a single GPU. In 30 navigation experiments, the robot reached a 0.3 m neighborhood of the goal in 86.7% of the cases. This result makes the proposed method directly applicable to tasks such as mobile manipulation.
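The tailored reward scheme itself is described in the paper. As a minimal illustration, the Python sketch below shows a generic goal-reaching reward with a terminal bonus and a small per-step penalty; the function name and the constants are assumptions for this example, not the exact scheme from the paper.

import numpy as np

def navigation_reward(position, goal, goal_radius=0.3, step_penalty=0.01):
    # Illustrative goal-reaching reward (not the paper's exact scheme):
    # terminal bonus inside the goal radius, small penalty per step.
    distance = np.linalg.norm(np.asarray(position) - np.asarray(goal))
    if distance <= goal_radius:
        return 1.0, True          # goal reached: terminal reward, episode ends
    return -step_penalty, False   # small penalty encourages short trajectories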

Architecture overview
Figure 1: The model architecture is similar to our previous work [1], with the difference that the labels for the auxiliary visual tasks are the raw camera images instead of segmentation masks, which are not readily available in real-world environments.
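For concreteness, the PyTorch sketch below shows an actor-critic network with an auxiliary head that reconstructs the raw camera image, i.e., the kind of label used for the auxiliary tasks here. The layer sizes, the 84×84 input resolution, and the channel-wise stacking of the observation and goal images are illustrative assumptions; see [1] and the paper for the actual architecture.

import torch
import torch.nn as nn

class NavigationNet(nn.Module):
    # Sketch of an actor-critic network with an auxiliary visual head.
    # Layer sizes are illustrative, not the exact architecture from [1].

    def __init__(self, num_actions):
        super().__init__()
        # Shared convolutional encoder over observation + goal images,
        # stacked channel-wise (3 + 3 = 6 input channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 64 * 9 * 9                 # feature size for 84x84 inputs
        self.policy = nn.Linear(feat, num_actions)   # actor head
        self.value = nn.Linear(feat, 1)               # critic head
        # Auxiliary head: predicts the raw camera image (the label used
        # here), replacing segmentation masks unavailable on a real robot.
        self.aux_decoder = nn.Sequential(
            nn.Linear(feat, 3 * 84 * 84), nn.Sigmoid(),
        )

    def forward(self, obs, goal):
        h = self.encoder(torch.cat([obs, goal], dim=1))
        return self.policy(h), self.value(h), self.aux_decoder(h)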

Real-world dataset experiment

In an office room, we used a TurtleBot 2 robot to collect a dataset of images taken at grid points with a 0.2 m resolution. During data collection, we estimated the robot pose through odometry. We compare our method with the PAAC algorithm and the UNREAL algorithm (see the paper). The models were pre-trained in a simulated environment; we also compare with models that did not use any pre-training (labelled np). Finally, we compare with a random agent that selects random movements; when it reaches the target, the ground-truth information is used to signal the goal.
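A minimal sketch of how such a grid dataset can stand in for the environment: the odometry pose is snapped to the nearest grid point and the image recorded there is returned as the observation. The dict-based dataset structure and the function name are assumptions for illustration, not the released dataset format.

GRID_RESOLUTION = 0.2  # metres between neighbouring grid points

def nearest_grid_image(dataset, pose):
    # Snap the odometry estimate (x, y) to the nearest grid point and
    # return the image recorded there. `dataset` is assumed to map grid
    # indices (i, j) to images; this structure is hypothetical.
    i = round(pose[0] / GRID_RESOLUTION)
    j = round(pose[1] / GRID_RESOLUTION)
    return dataset[(i, j)]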
Table 1: Mean success rate, mean distance from the goal (goal distance), and mean number of steps on the grid for each method.

algorithm     | success rate | goal distance (m) | steps on grid
ours          | 0.936        | 0.145 ± 0.130     | 13.489 ± 6.286
PAAC          | 0.922        | 0.157 ± 0.209     | 14.323 ± 10.141
UNREAL        | 0.863        | 0.174 ± 0.173     | 14.593 ± 9.023
np ours       | 0.883        | 0.187 ± 0.258     | 15.880 ± 7.022
np PAAC       | 0.860        | 0.243 ± 0.447     | 13.699 ± 6.065
np UNREAL     | 0.832        | 0.224 ± 0.358     | 15.676 ± 6.578
random        | 0.205        | 1.467 ± 1.109     | 147.956 ± 88.501
shortest path | —            | 0.034 ± 0.039     | 12.595 ± 5.743
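Metrics of this kind can be reproduced from per-episode logs as in the following sketch, assuming each record stores the final distance to the goal and the number of steps, and that an episode counts as successful when the final distance is within the goal radius. The record format is an assumption of this sketch.

import numpy as np

def summarize_episodes(episodes, success_radius=0.3):
    # `episodes` is a list of (final_goal_distance, num_steps) records;
    # this format is an assumption made for the sketch.
    distances = np.array([d for d, _ in episodes])
    steps = np.array([s for _, s in episodes], dtype=float)
    return {
        "success rate": float(np.mean(distances <= success_radius)),
        "goal distance (m)": (distances.mean(), distances.std()),
        "steps": (steps.mean(), steps.std()),
    }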

Real-world evaluation

Finally, to evaluate the trained network in the real-world environment, we randomly chose 30 pairs of initial and target states. The trained robot was placed in the initial pose and given a target image. The robot reached a 0.3 m radius of the goal in 86.7% of the cases. We show a video of one of the episodes.
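Deployment reduces to a simple closed loop, sketched below. The robot and policy objects and their methods are hypothetical placeholders, and the termination handling is simplified: the agent is assumed to signal goal arrival itself.

def run_episode(robot, policy, goal_image, max_steps=250):
    # `robot` provides camera images and executes discrete motion
    # commands; `policy` maps (observation, goal image) to an action.
    # Both are hypothetical placeholders for this sketch.
    for step in range(max_steps):
        observation = robot.capture_image()
        action, done = policy(observation, goal_image)
        if done:                     # the agent signals goal arrival
            return True, step
        robot.execute(action)        # e.g. move forward / turn left / turn right
    return False, max_steps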

References

[1] Jonáš Kulhánek, Erik Derner, Tim de Bruin, and Robert Babuška. Vision-based navigation using deep reinforcement learning. In 2019 European Conference on Mobile Robots (ECMR), pages 1–8, 2019.

Citation

Please use the following citation:
@article{kulhanek2021visual,
  title={Visual navigation in real-world indoor environments using end-to-end deep reinforcement learning},
  author={Kulh{\'a}nek, Jon{\'a}{\v{s}} and Derner, Erik and Babu{\v{s}}ka, Robert},
  journal={IEEE Robotics and Automation Letters},
  volume={6},
  number={3},
  pages={4345--4352},
  year={2021},
  publisher={IEEE}
}