Robust Visual Teach-and-Repeat Navigation with Flexible Topo-metric Graph Map Representation

Wang, Cheng, Wang et al.
Visual teach-and-repeat navigation is a direct solution for deploying mobile robots in unknown environments. However, robust trajectory repetition remains challenging because of environmental changes and dynamic objects. In this paper, we propose a novel visual teach-and-repeat navigation system, which consists of a flexible map representation, robust map matching, and a map-less local navigation module. During the teaching process, the recorded keyframes are organized as a topo-metric graph, and each node can be further extended to save new observations. Such a representation also alleviates the requirement of globally consistent mapping. To enhance place recognition performance during the repeating process, instead of using frame-to-frame matching, we first cluster keyframes to aggregate similar connected keyframes into local maps and then perform place recognition with a frame-to-local-map matching strategy. To improve persistent tracking of the local goal, a long-term goal management algorithm is constructed, which prevents the robot from getting lost due to environmental changes or obstacle occlusion. To reach the goal without a map, a local trajectory-control candidate optimization algorithm is proposed. Extensive experiments are conducted on our mobile platform. The results demonstrate that our system is superior to the baselines in terms of robustness and effectiveness.

Basic Information

  • Paper ID: 2510.09089
  • Title: Robust Visual Teach-and-Repeat Navigation with Flexible Topo-metric Graph Map Representation
  • Authors: Jikai Wang, Yunqi Cheng, Kezhi Wang, and Zonghai Chen (University of Science and Technology of China)
  • Category: cs.RO (Robotics)
  • Publication Date: October 10, 2025
  • Paper Link: https://arxiv.org/abs/2510.09089

Abstract

This paper proposes a novel visual teach-and-repeat (VTR) navigation system that addresses challenges posed by environmental changes and dynamic objects through a flexible map representation, robust map matching, and a map-less local navigation module. The system employs a topo-metric graph structure to store keyframes, and each node can be expanded to preserve new observations. Place recognition performance is enhanced through keyframe clustering and a frame-to-local-map matching strategy, while a long-term goal management algorithm prevents the robot from becoming lost due to environmental changes or obstacle occlusion.

Research Background and Motivation

Problem Definition

Visual teach-and-repeat (VTR) navigation serves as a direct solution for deploying mobile robots in unknown environments. However, achieving robust trajectory repetition navigation in the presence of environmental changes and dynamic objects remains challenging.

Significance

  1. Practical Value: VTR navigation avoids complete mapping of task environments, enabling more efficient robot deployment
  2. Application Demand: Widespread demand in fixed-route navigation scenarios (e.g., navigation between factory stations)
  3. Technical Challenge: Maintaining navigation robustness under environmental changes, dynamic objects, and path deviations

Limitations of Existing Methods

  1. Map Representation Issues: Traditional methods rely on globally consistent mapping with high localization accuracy requirements
  2. Fragile Place Recognition: Frame-to-frame matching lacks robustness under viewpoint changes and occlusions
  3. Navigation Module Dependency: Existing systems over-rely on accurate place recognition, failing easily when matching fails
  4. Poor Environmental Adaptability: Difficulty handling environmental changes and dynamic obstacles

Core Contributions

  1. Proposed Flexible Map Representation: Designed a topo-metric graph structure that adapts to environmental changes and odometry drift errors
  2. Constructed Robust VTR Navigation System: Capable of adapting to environmental changes, dynamic objects, and viewpoint occlusions; navigation module can be embedded in other VTR systems
  3. Implemented User-Friendly System: Easy adaptation to new task environments with good practical utility
  4. Verified System Effectiveness: Extensive experiments on mobile platforms demonstrate superiority over baseline methods

Methodology Details

Task Definition

VTR navigation comprises two phases:

  • Teaching Phase: Manual operation of the robot along the task route, real-time recording of visual frames as the map
  • Repeat Phase: Robot attempts to match current visual frames with the map and updates the next target upon successful matching

System Architecture

1. Map Representation Error Analysis

Traditional SLAM map representation:

$$\hat{M} = \{[K_i, \hat{T}_{WI}],\ i = 1, \cdots, N\}$$

where the estimated global poses $\hat{T}_{WI}$ contain cumulative drift errors. The proposed representation:

$$\bar{M} = \{[K_i, \hat{T}_{ij}],\ i, j = 1, \cdots, N\}$$

Each keyframe stores only reliable relative pose transformations with neighboring keyframes.
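The difference between the two representations can be illustrated with a small numerical sketch. It is not the authors' implementation: poses are simplified to planar (x, y, heading) tuples rather than full 3D transforms, and the helper names `compose` and `chain` as well as the route data are hypothetical.

```python
import math

def compose(a, b):
    """Return the planar pose a * b, with poses given as (x, y, heading)."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)

def chain(edges, i, j):
    """Pose of keyframe j in the frame of keyframe i (i <= j).

    edges[k] holds the relative pose T_{k-1,k}; edges[0] is unused.
    """
    pose = (0.0, 0.0, 0.0)
    for k in range(i + 1, j + 1):
        pose = compose(pose, edges[k])
    return pose

# Five 1 m forward steps; the drifted copy has a heading error on the first edge.
clean = [None] + [(1.0, 0.0, 0.0)] * 5
drifted = list(clean)
drifted[1] = (1.0, 0.0, 0.1)

# A global pose accumulates the early error over the whole route ...
global_error = abs(chain(drifted, 0, 5)[1] - chain(clean, 0, 5)[1])
# ... while the relative pose between nearby keyframes 3 and 5 is unaffected.
local_error = abs(chain(drifted, 3, 5)[1] - chain(clean, 3, 5)[1])
print(global_error, local_error)
```

In this toy route, the lateral error of the global pose grows with distance travelled, whereas the relative pose between keyframes 3 and 5 is identical in both maps. This is the property the relative-pose representation relies on.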

2. Topo-metric Keyframe Map

Keyframe definition:

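$$K_i = \{T_{i-1,i},\ U_i,\ P_i,\ I_i\}$$

containing the relative transformation to the previous keyframe, the 2D feature points, the 3D feature positions, and the image. Upon loop closure detection, the keyframe is extended with the loop-closure keyframe $L(i)$ and the relative transformation $T_{L(i),i}$:

$$K_i = \{T_{i-1,i},\ U_i,\ P_i,\ I_i,\ T_{L(i),i},\ L(i)\}$$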
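One way to hold such a keyframe in code is a record whose loop-closure fields stay empty until a loop is detected. The sketch below is only an illustration: the class name, field names, and the planar `Pose` type are assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pose = Tuple[float, float, float]      # planar stand-in for a full 3D transform
Pixel = Tuple[float, float]
Point3 = Tuple[float, float, float]

@dataclass
class Keyframe:
    rel_pose: Pose                     # T_{i-1,i}: transform to the previous keyframe
    features_2d: List[Pixel]           # U_i
    points_3d: List[Point3]            # P_i
    image_path: str                    # I_i, stored here as a file reference
    loop_pose: Optional[Pose] = None   # T_{L(i),i}, set only after a loop closure
    loop_index: Optional[int] = None   # L(i)

    def add_loop_closure(self, loop_index: int, loop_pose: Pose) -> None:
        """Extend the keyframe with a loop-closure link."""
        self.loop_index = loop_index
        self.loop_pose = loop_pose

    @property
    def has_loop(self) -> bool:
        return self.loop_index is not None

# Hypothetical example values.
plain = Keyframe((1.0, 0.0, 0.0), [(320.0, 240.0)], [(0.5, 0.1, 2.0)], "kf_0007.png")
linked = Keyframe((1.0, 0.0, 0.0), [(320.0, 240.0)], [(0.5, 0.1, 2.0)], "kf_0008.png")
linked.add_loop_closure(loop_index=2, loop_pose=(0.1, 0.0, 0.02))
```

Keeping the loop fields optional means a node can be extended later without touching the rest of the graph.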

3. Map Redundancy Reduction

Merging similar frames through keyframe clustering:

  • Compute the DBoW similarity between connected keyframes; stop merging once it falls below a threshold
  • Transform the 3D feature points of similar keyframes into the coordinate system of the retained frame
  • Remove redundant keyframes while maintaining the linked-list structure
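The three steps above can be sketched as a greedy pass over the keyframe chain. This is a minimal illustration under several assumptions: a cosine similarity on toy bag-of-words vectors stands in for the DBoW score, similarity is measured against the retained (anchor) keyframe, and poses and points are planar instead of 3D. All function and field names are hypothetical.

```python
import math

def compose(a, b):
    """Return the planar pose a * b, with poses given as (x, y, heading)."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)

def transform_point(pose, p):
    """Map point p from a child frame into the parent frame of pose."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return (x + c * p[0] - s * p[1], y + s * p[0] + c * p[1])

def cosine(a, b):
    """Cosine similarity, used here as a stand-in for the DBoW score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster_keyframes(keyframes, threshold):
    """Merge runs of similar connected keyframes into local maps."""
    retained = []
    prev_to_last = (0.0, 0.0, 0.0)  # last absorbed frame, in the previous anchor's frame
    i = 0
    while i < len(keyframes):
        anchor = keyframes[i]
        local_map = {
            # Re-link the chain: pose of this anchor relative to the previous anchor.
            "rel_pose": compose(prev_to_last, anchor["rel_pose"]),
            "bow": anchor["bow"],
            "points": list(anchor["points"]),
            "members": [i],
        }
        anchor_to_cur = (0.0, 0.0, 0.0)
        j = i + 1
        while j < len(keyframes) and cosine(anchor["bow"], keyframes[j]["bow"]) >= threshold:
            anchor_to_cur = compose(anchor_to_cur, keyframes[j]["rel_pose"])
            # Express the absorbed keyframe's points in the anchor's coordinate system.
            local_map["points"] += [transform_point(anchor_to_cur, p)
                                    for p in keyframes[j]["points"]]
            local_map["members"].append(j)
            j += 1
        retained.append(local_map)
        prev_to_last = anchor_to_cur
        i = j
    return retained

# Hypothetical route: four keyframes 1 m apart, two visually distinct segments.
route = [
    {"rel_pose": (1.0, 0.0, 0.0), "bow": [1.0, 0.0], "points": [(2.0, 0.0)]},
    {"rel_pose": (1.0, 0.0, 0.0), "bow": [1.0, 0.0], "points": [(2.0, 0.0)]},
    {"rel_pose": (1.0, 0.0, 0.0), "bow": [0.0, 1.0], "points": [(2.0, 0.0)]},
    {"rel_pose": (1.0, 0.0, 0.0), "bow": [0.0, 1.0], "points": [(2.0, 0.0)]},
]
local_maps = cluster_keyframes(route, threshold=0.8)
```

Here the four keyframes collapse into two local maps, and the second map's relative pose spans the 2 m between the two retained anchors, so the chain stays connected after the redundant nodes are removed.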

Visual Repeat Phase

1. Frame-to-Keyframe Matching

Employing constrained search strategy:

$$R_n = \{[u, v]^T \mid \|[u, v]^T - [u_n, v_n]^T\|_2 < \gamma\}$$

Search for corresponding features within circular regions; solve relative pose via PnP.
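A minimal sketch of the constrained search, assuming binary descriptors stored as Python integers and compared by Hamming distance. The descriptor type, the `max_hamming` cutoff, and the function names are assumptions. The PnP step is not shown; the resulting 2D-3D correspondences would typically be passed to a solver such as OpenCV's `solvePnPRansac`.

```python
import math

def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def constrained_match(ref_feats, cur_feats, gamma, max_hamming):
    """Match each reference feature n to a current-frame feature inside R_n.

    Each feature is ((u, v), descriptor). Returns a list of (n, m) index pairs.
    """
    matches = []
    for n, ((un, vn), ref_desc) in enumerate(ref_feats):
        best_m, best_d = None, max_hamming + 1
        for m, ((u, v), cur_desc) in enumerate(cur_feats):
            if math.hypot(u - un, v - vn) >= gamma:
                continue  # outside the circular region R_n
            d = hamming(ref_desc, cur_desc)
            if d < best_d:
                best_m, best_d = m, d
        if best_m is not None:
            matches.append((n, best_m))
    return matches

# Hypothetical features: the second current feature has a perfect descriptor
# match for reference feature 0 but lies far outside its search region.
ref = [((100.0, 100.0), 0b1111), ((200.0, 200.0), 0b0000)]
cur = [((103.0, 104.0), 0b1110), ((300.0, 300.0), 0b1111), ((198.0, 201.0), 0b0001)]
matches = constrained_match(ref, cur, gamma=10.0, max_hamming=2)
```

Restricting candidates to the circle of radius γ rejects the distant feature even though its descriptor is identical, which is the point of the constraint.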

2. Map Extension

When the robot deviates from the teaching route, add new observations to the map:

$$K_i = \{T_{i-1,i},\ \bar{U}_i,\ \bar{P}_i,\ I_i,\ T_{L(i),i},\ L(i),\ T_{i,S(i)},\ S(i),\ \{K\}\}$$

3. Goal List Management

Construct goal lists rather than single targets:

$$T_{k,g_0} = T_{i,k}^{-1} \cdot T_{i,S(i)}$$
$$T_{k,g_1} = T_{k,g_0} \cdot T_{S(i),S(S(i))}$$

The goal list $L_g = \{t_{g_0}, t_{g_1}, \cdots, t_{g_M}\}$ is updated upon each successful match.
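The chaining of goals can be sketched as follows. This is an illustration only: poses are planar (x, y, heading) tuples, `T_ik` is taken to be the pose of the current frame k expressed in the matched keyframe i, and the dictionary keys `succ` and `T_succ` (for S(i) and the transform to it) are hypothetical names.

```python
import math

def compose(a, b):
    """Return the planar pose a * b, with poses given as (x, y, heading)."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)

def invert(a):
    """Inverse of a planar pose."""
    ax, ay, ath = a
    c, s = math.cos(ath), math.sin(ath)
    return (-(c * ax + s * ay), s * ax - c * ay, -ath)

def build_goal_list(T_ik, keyframes, i, num_goals):
    """Goal positions t_g0, t_g1, ... in the current robot frame k.

    keyframes[n] = {"succ": S(n) or None, "T_succ": transform from n to S(n)}.
    """
    goals = []
    T_k_node = invert(T_ik)  # keyframe i expressed in the current frame k
    node = i
    while len(goals) < num_goals and keyframes[node]["succ"] is not None:
        # Step one successor further along the taught route.
        T_k_node = compose(T_k_node, keyframes[node]["T_succ"])
        node = keyframes[node]["succ"]
        goals.append((T_k_node[0], T_k_node[1]))  # keep the translation part
    return goals

# Hypothetical straight route: keyframes 1 m apart, robot 0.5 m past keyframe 0.
keyframes = [
    {"succ": 1, "T_succ": (1.0, 0.0, 0.0)},
    {"succ": 2, "T_succ": (1.0, 0.0, 0.0)},
    {"succ": 3, "T_succ": (1.0, 0.0, 0.0)},
    {"succ": None, "T_succ": None},
]
goals = build_goal_list((0.5, 0.0, 0.0), keyframes, i=0, num_goals=3)
```

Because later goals are already expressed in the robot frame, the planner still has targets to track if the next few frames fail to match the map.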

4. Local Motion Planning

Implement multi-target tracking through trajectory candidate scoring:

$$s_i = \frac{1}{3} \sum_{m=0}^{2} \left(1 - \sqrt{0.005 \cdot \Theta(t_i^e - x,\ t_{g_m} - x)}\right)$$

Each trajectory candidate is scored against the first three goals in the list, and the highest-scoring trajectory is selected.
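The scoring rule can be written out as a short sketch. Several points are assumptions rather than statements from the paper: Θ is read as the unsigned angle between the two direction vectors, measured in degrees (so that the term under the square root stays below 1 for angles up to 180), the endpoint of candidate i and x are 2D positions, and the function names are hypothetical.

```python
import math

def angle_deg(a, b):
    """Unsigned angle between two 2D vectors, in degrees (assumed unit)."""
    dot = a[0] * b[0] + a[1] * b[1]
    cross = a[0] * b[1] - a[1] * b[0]
    return abs(math.degrees(math.atan2(cross, dot)))

def score(endpoint, goals, x):
    """Average alignment of one trajectory endpoint with the first three goals."""
    used = goals[:3]  # with three goals this equals the (1/3) * sum in the formula
    total = 0.0
    for g in used:
        to_end = (endpoint[0] - x[0], endpoint[1] - x[1])
        to_goal = (g[0] - x[0], g[1] - x[1])
        total += 1.0 - math.sqrt(0.005 * angle_deg(to_end, to_goal))
    return total / len(used)

def select_trajectory(endpoints, goals, x):
    """Index of the candidate endpoint with the highest score."""
    return max(range(len(endpoints)), key=lambda i: score(endpoints[i], goals, x))

# Hypothetical candidates: straight ahead, hard left, slight right.
robot = (0.0, 0.0)
goal_list = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
candidates = [(1.0, 0.0), (0.0, 1.0), (1.0, -0.2)]
best = select_trajectory(candidates, goal_list, robot)
```

Averaging over several goals means a candidate is still rewarded for heading toward later goals when the nearest one is occluded or temporarily unreachable.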

Experimental Setup

Mobile Platform Configuration

  • Hardware: Differential-drive platform equipped with an IMU-embedded camera (MYNTEYE-SC) and a LiDAR (Livox Mid-360)
  • Localization System: OpenVINS for visual odometry; iG-LIO records trajectories for evaluation

Evaluation Metrics

  • End-point Distance: Distance between actual arrival point and preset teaching route endpoint
  • Success Rate: Whether the robot can navigate from start to endpoint (strict route following not required)
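Both metrics are simple to compute from logged positions. The sketch below assumes planar positions and introduces a hypothetical `tolerance` parameter for deciding whether a run reached the endpoint; the summary does not state the threshold used in the paper, and the example numbers are made up.

```python
import math

def end_point_distance(arrival, target):
    """Euclidean distance between the arrival point and the taught endpoint."""
    return math.hypot(arrival[0] - target[0], arrival[1] - target[1])

def success_rate(distances, tolerance):
    """Fraction of runs whose end-point distance is within the tolerance."""
    return sum(d <= tolerance for d in distances) / len(distances)

# Hypothetical runs: two arrive near the endpoint, one stops far from it.
run_distances = [0.1, 0.3, 6.0]
rate = success_rate(run_distances, tolerance=0.5)
```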

Datasets

  • Environments: Office and corridor scenes
  • Route Types: Straight and curved paths
  • Test Conditions: Normal state, obstacle occlusion, environmental changes

Comparison Methods

  • BVTR: Classical bio-inspired VTR method
  • Ablation Studies: Variants without keyframe clustering, single-target tracking, etc.

Experimental Results

Main Results

1. Navigation Under Normal Conditions

  • Office Scene: End-point distance 0.08 m for the proposed method, 0.10 m for BVTR
  • Both methods successfully complete navigation with slight deviations at turns

2. Obstacle Occlusion Testing

  • Proposed Method: End-point distance 0.08 m; successfully avoids obstacles and returns to the teaching route
  • BVTR: End-point distance 5.58 m; stops in front of the obstacle and cannot continue
  • Single-Target Version: End-point distance 5.20 m, which validates the importance of the multi-target strategy

3. Curved Path Navigation (Corridor Scene)

  • Proposed Method: End-point distance 0.37 m; successfully follows the entire route
  • BVTR: End-point distance 11.44 m; stops after navigating to an unknown location
  • Without Keyframe Clustering: End-point distance 10.49 m, demonstrating the critical role of the clustering strategy

4. Keyframe Clustering Verification

Keyframe clustering significantly increases loop closure detection density, particularly at turns, providing timely feedback to the motion planning module.

5. Map Extension Verification

System successfully adds new environmental information during repeat phase; extended keyframes maintain association with original map without disrupting topological structure.

Experimental Findings

  1. Long-term Goal Management: Multi-target strategy significantly improves system robustness to loop closure detection failures
  2. Keyframe Clustering: Critical for robust matching in texture-poor environments
  3. Map Extension: Effectively handles environmental changes, supporting long-term navigation tasks

Main Research Directions

  1. Bio-inspired Methods: Direct image comparison and pattern recognition
  2. Visual Geometric Methods: Feature-based image matching and PnP solving
  3. Deep Learning Methods: End-to-end learning and neural network matching
  4. Topo-metric Fusion: Navigation combining topological and metric information

Advantages of This Work

  • Compared to bio-inspired methods: More robust feature matching
  • Compared to deep learning methods: High computational efficiency, strong interpretability
  • Compared to traditional geometric methods: No global consistency requirement, strong adaptability

Conclusions and Discussion

Main Conclusions

  1. Flexible Map Representation: Topo-metric graphs effectively mitigate global mapping requirements
  2. Robust Navigation System: Multi-target management and keyframe clustering significantly enhance system robustness
  3. Practical Verification: System effectiveness validated across multiple challenging scenarios

Limitations

  1. Relative Pose Dependency: System performance depends on accuracy of relative poses between keyframes
  2. Long-term Drift: Prolonged map matching failures may cause odometry drift divergence
  3. Environmental Assumptions: Assumes sufficiently accurate relative pose estimation, which may not hold in certain environments

Future Directions

Develop end-to-end visual navigation models based on deep learning to further reduce dependence on accurate global pose tracking and environmental mapping.

In-Depth Evaluation

Strengths

  1. Technical Innovation: Proposes novel topo-metric map representation, effectively addressing limitations of traditional methods
  2. System Completeness: Complete solution from map construction to navigation execution
  3. Sufficient Experiments: Comprehensive verification across multiple scenarios and conditions
  4. Practical Value: System design considers actual deployment requirements with user-friendly interface

Weaknesses

  1. Insufficient Theoretical Analysis: Lacks theoretical guarantees on system convergence and stability
  2. Computational Complexity: Insufficient analysis of computational overhead for keyframe clustering and multi-target management
  3. Environmental Limitations: Primarily tested in indoor structured environments; outdoor complex environment adaptability unknown
  4. Limited Comparison Baselines: Mainly compared with classical BVTR method; lacks comparison with latest deep learning methods

Impact

  1. Academic Contribution: Provides a new technical pathway for VTR navigation, with some theoretical value
  2. Practical Value: Method directly applicable to industrial and domestic robot navigation
  3. Reproducibility: Sufficiently detailed technical description facilitates reproduction and improvement

Applicable Scenarios

  1. Fixed-Route Navigation: Inter-station navigation in factories, path following for warehouse robots
  2. Environmental Change Scenarios: Long-term navigation tasks requiring adaptation to minor environmental changes
  3. Resource-Constrained Platforms: Lower hardware requirements compared to deep learning methods

References

The paper includes 31 references covering important works in visual SLAM, robot navigation, place recognition, and related fields, providing solid theoretical foundation for the research.


Overall Assessment: This paper presents a practical VTR navigation solution with some technical innovation and sufficient experimental validation. While there is room for improvement in theoretical analysis and environmental adaptability, it makes a valuable technical contribution to mobile robot navigation.