Towards Safer Navigation: Reward Shaping with Prior Topographic Knowledge

Project Overview

This project addresses a critical challenge in deep reinforcement learning navigation: improving agent safety while maintaining navigation capabilities. Through innovative integration of prior map information into reward shaping, we successfully enhanced the safety distance between agents and obstacles.

Download Project Slides (PDF) Download Project Report (PDF)

Research Motivation

The navigation field is dominated by two main approaches:

Traditional Navigation Stack
- Widely deployed in real-world applications
- Modular pipeline (mapping, localization, planning, control)
- Requires extensive parameter tuning
- Limited generalization capability
Reinforcement Learning Approach
- Primarily validated in virtual environments
- No explicit map information utilization
- Often generates unsafe navigation paths
- High collision risk with obstacles

Technical Approach

Network Architecture

Input Layer
- RGB images (256×256×3)
- Depth maps (256×256×1)
- GPS and compass data (2D vector)
Feature Extraction
- RGB & Depth: Independent ResNet18 backbones
- GPS/Compass: Two-layer MLP (2→32→512)
Feature Fusion
- Dual-layer LSTM (hidden_size=512)
- Temporal information integration
Output Layer
- Actor Head: 4 discrete actions
  - Turn left/right 30°
  - Move forward 0.5m
  - Stop
- Critic Head: State value estimation

Reward Shaping

Exploration Reward
- Encourages exploration of unknown areas
- Based on goal distance changes
- Prevents overly cautious behavior
Safety Reward
- Based on obstacle proximity
- Negative reward for <0.5m distance
- Positive reward for 0.5-2m range

Experimental Setup

Environment: Habitat Simulator
Datasets: MP3D and Gibson
Training Configuration
- 4 parallel environments
- 5 million training steps
- ~10 hours training duration
- RTX 3060 (12GB VRAM)

Evaluation Metrics

Success Rate
- Goal reached within 500 steps
- Target threshold: 0.2m
SPL (Success weighted by Path Length)
- Combined measure of success rate and path efficiency
Path Safety
- Custom metric
- Average distance to nearest obstacles

Results

Safety Improvements
- RGBD Agent achieved 5.75cm increase in average safety distance
- 1.64% improvement in overall path safety
Sensor Impact Analysis
- Depth sensing crucial for safe navigation
- RGB-only input insufficient for effective safety distance mapping

Future Work

Extend training to 75 million steps
Optimize reward function design
Explore additional sensor combinations
Real robot validation

Project Member

Jiajie Zhang (zhangjj2023@shanghaitech.edu.cn)

Share on

X (formerly Twitter) Facebook LinkedIn

Jiajie Zhang