
Robust Visual Teach-and-Repeat Navigation with Flexible Topo-metric Graph Map Representation

Wang, Cheng, Wang et al.

Visual teach-and-repeat navigation is a direct solution for deploying mobile robots in unknown environments. However, robust trajectory repetition remains challenging due to environmental changes and dynamic objects. In this paper, we propose a novel visual teach-and-repeat navigation system, which consists of a flexible map representation, robust map matching, and a map-less local navigation module. During the teaching process, the recorded keyframes are formulated as a topo-metric graph, and each node can be further extended to save new observations. Such a representation also alleviates the requirement of globally consistent mapping. To enhance place recognition performance during the repeating process, instead of using frame-to-frame matching, we first implement keyframe clustering to aggregate similar connected keyframes into local maps and perform place recognition based on a visual frame-to-local-map matching strategy. To promote persistent tracking of local goals, a long-term goal management algorithm is constructed, which prevents the robot from getting lost due to environmental changes or obstacle occlusion. To reach the goal without a map, a local trajectory-control candidate optimization algorithm is proposed. Extensive experiments are conducted on our mobile platform. The results demonstrate that our system is superior to the baselines in terms of robustness and effectiveness.



Basic Information

Paper ID: 2510.09089
Title: Robust Visual Teach-and-Repeat Navigation with Flexible Topo-metric Graph Map Representation
Authors: Jikai Wang, Yunqi Cheng, Kezhi Wang, and Zonghai Chen (University of Science and Technology of China)
Category: cs.RO (Robotics)
Publication Date: October 10, 2025
Paper Link: https://arxiv.org/abs/2510.09089

Abstract

This paper proposes a novel visual teach-and-repeat (VTR) navigation system that addresses the challenges posed by environmental changes and dynamic objects through a flexible map representation, robust map matching, and a mapless local navigation module. The system employs a topo-metric graph structure to store keyframes and supports node expansion to preserve new observations. Place recognition performance is enhanced through keyframe clustering and a frame-to-local-map matching strategy, while a long-term goal management algorithm prevents the robot from becoming lost due to environmental changes or obstacle occlusion.

Research Background and Motivation

Problem Definition

Visual teach-and-repeat (VTR) navigation serves as a direct solution for deploying mobile robots in unknown environments. However, repeating a taught trajectory robustly in the presence of environmental changes and dynamic objects remains challenging.

Significance

Practical Value: VTR navigation avoids complete mapping of task environments, enabling more efficient robot deployment
Application Demand: Widespread demand in fixed-route navigation scenarios (e.g., navigation between factory stations)
Technical Challenge: Maintaining navigation robustness under environmental changes, dynamic objects, and path deviations

Limitations of Existing Methods

Map Representation Issues: Traditional methods rely on globally consistent maps, which in turn require highly accurate localization
Fragile Place Recognition: Frame-to-frame matching lacks robustness under viewpoint changes and occlusions
Navigation Module Dependency: Existing systems over-rely on accurate place recognition, failing easily when matching fails
Poor Environmental Adaptability: Difficulty handling environmental changes and dynamic obstacles

Core Contributions

Proposed Flexible Map Representation: Designed a topo-metric graph structure that adapts to environmental changes and odometry drift errors
Constructed Robust VTR Navigation System: Capable of adapting to environmental changes, dynamic objects, and viewpoint occlusions; navigation module can be embedded in other VTR systems
Implemented User-Friendly System: Easy adaptation to new task environments with good practical utility
Verified System Effectiveness: Extensive experiments on mobile platforms demonstrate superiority over baseline methods

Methodology Details

Task Definition

VTR navigation comprises two phases:

Teaching Phase: Manual operation of the robot along the task route, real-time recording of visual frames as the map
Repeat Phase: Robot attempts to match current visual frames with the map and updates the next target upon successful matching

System Architecture

1. Map Representation Error Analysis

Traditional SLAM map representation:

$\hat{M} = \{[K_i, \hat{T}_{Wi}] \mid i = 1, \dots, N\}$

where estimated global poses contain cumulative drift errors. The proposed representation:

$\bar{M} = \{[K_i, \hat{T}_{ij}] \mid i, j = 1, \dots, N\}$

Each keyframe stores only reliable relative pose transformations with neighboring keyframes.
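To illustrate why storing only neighbor-relative transforms helps, the following minimal Python sketch uses planar SE(2) poses as a simplification (the paper's transforms are presumably full 3D poses). The helpers `se2` and `compose_chain` and the toy route are hypothetical. A relative pose between two keyframes is composed only from the edges between them, so drift injected into an earlier edge leaves it unchanged, whereas the corresponding global pose shifts.

```python
import numpy as np


def se2(x, y, theta):
    """Homogeneous 3x3 matrix of a planar pose (x, y, theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])


def compose_chain(rel, i, j):
    """Return T_{i,j} by chaining the stored neighbor edges rel[k] = T_{k,k+1}."""
    if not 0 <= i <= j <= len(rel):
        raise ValueError("need 0 <= i <= j <= len(rel)")
    T = np.eye(3)
    for k in range(i, j):
        T = T @ rel[k]
    return T


# Hypothetical taught route: a 1 m edge, then two 1 m edges that each end in a 90-degree left turn.
rel = [se2(1.0, 0.0, 0.0), se2(1.0, 0.0, np.pi / 2), se2(1.0, 0.0, np.pi / 2)]

# Simulate odometry drift on the first edge only.
drifted = [se2(1.05, 0.02, 0.03)] + rel[1:]

# The global pose of keyframe 3 (a chain from the origin) is affected by the drift ...
global_ok, global_drift = compose_chain(rel, 0, 3), compose_chain(drifted, 0, 3)
# ... while the relative pose between keyframes 1 and 3 is not.
local_ok, local_drift = compose_chain(rel, 1, 3), compose_chain(drifted, 1, 3)
print(np.allclose(global_ok, global_drift), np.allclose(local_ok, local_drift))  # prints: False True
```

In the full system the chain would usually be short (neighbors, loop links, or successors), which is what keeps the accumulated error bounded.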

2. Topo-metric Keyframe Map

Keyframe definition:

$K_i = \{T_{i-1,i}, U_i, P_i, I_i\}$

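containing the relative transformation from the previous keyframe, the 2D feature points, their 3D positions, and the image. Upon loop closure detection, the keyframe is extended to:

$K_i = \{T_{i-1,i}, U_i, P_i, I_i, T_{L(i),i}, L(i)\}$

where $L(i)$ is the index of the keyframe matched by loop closure and $T_{L(i),i}$ is the relative pose between $K_{L(i)}$ and $K_i$.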
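A minimal Python sketch of this keyframe record, assuming NumPy arrays for transforms, features, and the image. The class name `Keyframe`, its field names, and the 4x4 homogeneous-matrix convention are illustrative assumptions, not the authors' data layout. The optional fields stay empty until a loop closure fills in $L(i)$ and $T_{L(i),i}$.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Keyframe:
    """K_i = {T_{i-1,i}, U_i, P_i, I_i}, optionally extended with a loop-closure link."""
    T_prev: np.ndarray                    # T_{i-1,i}: pose of K_i relative to K_{i-1}
    U: np.ndarray                         # (N, 2) 2D feature points
    P: np.ndarray                         # (N, 3) 3D positions of those features
    image: np.ndarray                     # I_i: the stored image
    loop_idx: Optional[int] = None        # L(i): index of the loop-closure keyframe
    T_loop: Optional[np.ndarray] = None   # T_{L(i),i}: relative pose to K_{L(i)}

    def add_loop_closure(self, j: int, T_j_i: np.ndarray) -> None:
        """Extend K_i with {T_{L(i),i}, L(i)} after a loop closure with K_j."""
        self.loop_idx, self.T_loop = j, T_j_i

    @property
    def has_loop(self) -> bool:
        return self.loop_idx is not None


kf = Keyframe(np.eye(4), np.zeros((0, 2)), np.zeros((0, 3)), np.zeros((4, 4), dtype=np.uint8))
kf.add_loop_closure(3, np.eye(4))
print(kf.has_loop, kf.loop_idx)
```

Keeping only relative transforms in the record means no field ever depends on a global frame.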

3. Map Redundancy Reduction

Merging similar frames through keyframe clustering:

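Compute the DBoW similarity between connected keyframes; stop growing a cluster once the similarity falls below a threshold
Transform the 3D feature points of the similar keyframes into the coordinate frame of the retained keyframe
Remove the redundant keyframes while maintaining the linked-list structure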
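The clustering steps above can be sketched in Python as follows. Assumptions: bag-of-words vectors are plain dicts, the similarity is the L1 score used by the DBoW2 library, $s(v_1, v_2) = 1 - \tfrac{1}{2}\lVert v_1/\lVert v_1\rVert_1 - v_2/\lVert v_2\rVert_1\rVert_1$, each candidate is compared with the cluster's anchor keyframe, and the threshold of 0.3 is a placeholder. The paper's exact comparison rule and threshold may differ, and `cluster_keyframes` is a hypothetical name.

```python
import numpy as np


def l1_score(v1, v2):
    """DBoW2-style L1 similarity in [0, 1] between two bag-of-words dicts."""
    n1 = sum(abs(w) for w in v1.values())
    n2 = sum(abs(w) for w in v2.values())
    if n1 == 0 or n2 == 0:
        return 0.0
    words = set(v1) | set(v2)
    diff = sum(abs(v1.get(w, 0.0) / n1 - v2.get(w, 0.0) / n2) for w in words)
    return 1.0 - 0.5 * diff


def translation(x, y, z):
    """4x4 homogeneous transform with identity rotation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def cluster_keyframes(bows, rel, points, threshold=0.3):
    """Greedily merge runs of similar consecutive keyframes.

    bows[i]: BoW dict of keyframe i; rel[i]: 4x4 T_{i,i+1};
    points[i]: (N_i, 3) 3D points in keyframe i's frame.
    Returns a list of (anchor index, merged (M, 3) points in the anchor frame).
    """
    clusters, i, n = [], 0, len(bows)
    while i < n:
        anchor, merged = i, [points[i]]
        T_anchor_j = np.eye(4)
        j = i + 1
        # Grow the cluster while the next keyframe still resembles the anchor.
        while j < n and l1_score(bows[anchor], bows[j]) >= threshold:
            T_anchor_j = T_anchor_j @ rel[j - 1]
            homog = np.hstack([points[j], np.ones((len(points[j]), 1))])
            merged.append((T_anchor_j @ homog.T).T[:, :3])
            j += 1
        clusters.append((anchor, np.vstack(merged)))
        i = j  # keyframes anchor+1 .. j-1 are now redundant and dropped
    return clusters


bows = [{"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0}, {"c": 1.0}]
rel = [translation(1.0, 0.0, 0.0), translation(1.0, 0.0, 0.0)]
points = [np.zeros((1, 3)) for _ in range(3)]
clusters = cluster_keyframes(bows, rel, points)
# Two clusters: anchor 0 (keyframes 0 and 1 merged) and anchor 2.
print([(a, p.tolist()) for a, p in clusters])
```

Because the retained anchor's successor becomes the next anchor, the surviving keyframes still form a simple linked list.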

Visual Repeat Phase

1. Frame-to-Keyframe Matching

Employing constrained search strategy:

$R_n = \{[u, v]^T \mid \lVert [u, v]^T - [u_n, v_n]^T \rVert_2 < \gamma\}$

Correspondences for the n-th feature are searched only within the circular region R_n of radius γ around [u_n, v_n]; the relative pose is then solved via PnP.
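A minimal NumPy sketch of the constrained search, assuming binary (ORB-style) descriptors compared by Hamming distance and assuming [u_n, v_n] is the pixel location around which feature n is expected in the current frame. The function name, `max_hamming`, and the default γ = 20 px are hypothetical. The resulting 2D-3D correspondences would then be passed to a PnP solver, for example OpenCV's `cv2.solvePnPRansac`, which is not shown here.

```python
import numpy as np


def constrained_match(map_uv, map_desc, cur_uv, cur_desc, gamma=20.0, max_hamming=50):
    """Match each map feature n to the best current-frame feature inside R_n.

    map_uv: (N, 2) search centres [u_n, v_n]; map_desc: (N, B) uint8 descriptors
    cur_uv: (M, 2) current-frame pixels;       cur_desc: (M, B) uint8 descriptors
    Returns a list of (n, m) index pairs.
    """
    matches = []
    for n in range(len(map_uv)):
        # R_n: current features strictly within radius gamma of [u_n, v_n].
        dist = np.linalg.norm(cur_uv - map_uv[n], axis=1)
        cand = np.flatnonzero(dist < gamma)
        if cand.size == 0:
            continue
        # Hamming distance = number of differing bits between binary descriptors.
        xor = np.bitwise_xor(cur_desc[cand], map_desc[n])
        ham = np.unpackbits(xor, axis=1).sum(axis=1)
        best = int(np.argmin(ham))
        if ham[best] <= max_hamming:
            matches.append((n, int(cand[best])))
    return matches


map_uv = np.array([[100.0, 100.0], [300.0, 300.0]])
map_desc = np.zeros((2, 32), dtype=np.uint8)
cur_uv = np.array([[105.0, 100.0], [500.0, 500.0], [102.0, 98.0]])
cur_desc = np.zeros((3, 32), dtype=np.uint8)
cur_desc[2] = 255  # close in pixels but a completely different descriptor
print(constrained_match(map_uv, map_desc, cur_uv, cur_desc))
```

Restricting candidates to R_n both cuts the cost of brute-force matching and rejects look-alike features elsewhere in the image.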

2. Map Extension

When the robot deviates from the teaching route, add new observations to the map:

$K_i = \{T_{i-1,i}, \bar{U}_i, \bar{P}_i, I_i, T_{L(i),i}, L(i), T_{i,S(i)}, S(i), \{K\}\}$
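The following Python sketch shows one way an extended node could be organized, under the assumption (not stated explicitly above) that S(i) is the successor of K_i along the route and {K} is the set of new keyframes attached to K_i, each stored with its pose relative to K_i. The classes `Node` and `TopoMetricMap` and their methods are hypothetical. The point it illustrates is that extending a node appends to that node only, so the successor chain is untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

import numpy as np


@dataclass
class Node:
    """Graph node for K_i; only the link fields relevant to extension are modelled."""
    idx: int
    successor: Optional[int] = None               # S(i)
    T_to_successor: Optional[np.ndarray] = None   # T_{i,S(i)}
    extensions: List[Tuple[str, np.ndarray]] = field(default_factory=list)  # {K}


class TopoMetricMap:
    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}

    def add_route_node(self, idx: int, successor=None, T_to_successor=None) -> None:
        self.nodes[idx] = Node(idx, successor, T_to_successor)

    def extend(self, i: int, obs_id: str, T_i_obs: np.ndarray) -> None:
        """Attach a new observation to K_i, stored relative to K_i (no global pose)."""
        self.nodes[i].extensions.append((obs_id, T_i_obs))

    def match_targets(self, i: int) -> List[Tuple[str, np.ndarray]]:
        """Everything the repeat phase may match for node i: K_i itself plus {K}."""
        return [(f"K{i}", np.eye(4))] + list(self.nodes[i].extensions)


m = TopoMetricMap()
m.add_route_node(0, successor=1, T_to_successor=np.eye(4))
m.add_route_node(1)
m.extend(0, "new_view_0", np.eye(4))
print([name for name, _ in m.match_targets(0)], m.nodes[0].successor)
```

A match against an extension keyframe can be mapped back to K_i through its stored relative pose, so goal generation still starts from the original route node.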

3. Goal List Management

Construct goal lists rather than single targets:

$T_{kg_0} = T_{ik}^{-1} T_{i,S(i)}$
$T_{kg_1} = T_{kg_0} T_{S(i),S(S(i))}$

The goal list $L_g = \{t_{g_0}, t_{g_1}, \dots, t_{g_M}\}$ is updated upon each successful match.
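The goal-list construction can be sketched in Python as below, assuming 4x4 homogeneous transforms and a hypothetical successor table `succ` that maps each keyframe j to (S(j), T_{j,S(j)}). The `propagate_goals` step, which re-expresses goals through an odometry increment T_{k,k'} while no match is available, is an illustrative assumption about how the long-term goal management keeps goals alive between matches.

```python
import numpy as np


def translation(x, y, z):
    """4x4 homogeneous transform with identity rotation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def build_goal_list(T_ik, succ, i, M):
    """Return [t_g0, ..., t_gM], the goal positions expressed in the current frame k.

    T_ik: pose of the current frame k relative to the matched keyframe i
    succ: dict mapping keyframe j -> (S(j), T_{j,S(j)})
    """
    node, T_i_s = succ[i]
    T_kg = np.linalg.inv(T_ik) @ T_i_s        # T_{k g0} = T_{ik}^{-1} T_{i,S(i)}
    goals = [T_kg[:3, 3].copy()]
    while len(goals) <= M and node in succ:
        node, T_step = succ[node]
        T_kg = T_kg @ T_step                  # T_{k g_{m+1}} = T_{k g_m} T_{g_m,S(g_m)}
        goals.append(T_kg[:3, 3].copy())
    return goals


def propagate_goals(goals, T_k_knew):
    """Hypothetical: re-express goals in the new robot frame k' via odometry T_{k,k'}."""
    T_inv = np.linalg.inv(T_k_knew)
    return [(T_inv @ np.append(g, 1.0))[:3] for g in goals]


# Straight taught route: keyframes 0..4 spaced 1 m apart along x.
succ = {j: (j + 1, translation(1.0, 0.0, 0.0)) for j in range(4)}
initial = build_goal_list(translation(0.5, 0.0, 0.0), succ, i=0, M=2)
# Matching then fails; the robot drives 0.5 m forward on odometry alone.
goals = propagate_goals(initial, translation(0.5, 0.0, 0.0))
print([round(float(g[0]), 3) for g in goals])
```

Keeping several goals means a single occluded or unmatched view does not leave the planner without a target.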

4. Local Motion Planning

Implement multi-target tracking through trajectory candidate scoring:

$s_i = \frac{1}{3} \sum_{m=0}^{2} \left(1 - \sqrt{0.005\, \Theta(t_i^e - x,\ t_{g_m} - x)}\right)$

Each candidate is scored against the first three goals in the list, and the trajectory with the highest score is selected.
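A runnable Python sketch of the candidate scoring, under several assumptions not fixed by the text above: candidates are constant-velocity unicycle (differential-drive) rollouts over a 1 s horizon, t_i^e is the end point of candidate i, x is the current robot position, and Θ is the unsigned angle in degrees between the two vectors (so 0.005·Θ ≤ 0.9 and the square root stays real). The forward speed and angular-rate grid are hypothetical.

```python
import numpy as np


def angle_deg(a, b):
    """Unsigned angle between 2D vectors a and b in degrees (assumed meaning of Theta)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 180.0  # degenerate direction: treat as the worst case
    cos = np.dot(a, b) / (na * nb)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def rollout_endpoint(x, heading, v, w, horizon=1.0, dt=0.05):
    """End point of a constant (v, w) unicycle trajectory starting at x."""
    px, py, th = float(x[0]), float(x[1]), heading
    for _ in range(int(round(horizon / dt))):
        px += v * np.cos(th) * dt
        py += v * np.sin(th) * dt
        th += w * dt
    return np.array([px, py])


def score(t_end, x, goals):
    """s_i = 1/3 * sum_{m=0..2} (1 - sqrt(0.005 * Theta(t_end - x, t_gm - x)))."""
    terms = [1.0 - np.sqrt(0.005 * angle_deg(t_end - x, g - x)) for g in goals[:3]]
    return sum(terms) / 3.0


def select_trajectory(x, heading, goals, v=0.4, omegas=np.linspace(-1.0, 1.0, 21)):
    """Return the angular rate whose rollout end point scores highest."""
    best = max(omegas, key=lambda w: score(rollout_endpoint(x, heading, v, w), x, goals))
    return float(best)


x0 = np.zeros(2)
ahead = [np.array([1.0, 0.0]), np.array([2.0, 0.0]), np.array([3.0, 0.0])]
left = [np.array([0.0, 1.0]), np.array([0.0, 2.0]), np.array([0.0, 3.0])]
print(select_trajectory(x0, 0.0, ahead), select_trajectory(x0, 0.0, left))
```

Averaging over three goals rather than one keeps the chosen direction stable when the nearest goal is briefly misplaced or blocked.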

Experimental Setup

Mobile Platform Configuration

Hardware: Differential-drive platform equipped with a camera with an embedded IMU (MYNTEYE-SC) and a LiDAR (Livox Mid-360)
Localization System: OpenVINS provides visual-inertial odometry; iG-LIO records reference trajectories for evaluation

Evaluation Metrics

End-point Distance: Distance between actual arrival point and preset teaching route endpoint
Success Rate: Whether the robot can navigate from start to endpoint (strict route following not required)

Datasets

Environments: Office and corridor scenes
Route Types: Straight and curved paths
Test Conditions: Normal state, obstacle occlusion, environmental changes

Comparison Methods

BVTR: Classical bio-inspired VTR method
Ablation Studies: Variants without keyframe clustering, single-target tracking, etc.

Experimental Results

Main Results

1. Navigation Under Normal Conditions

Office Scene: End-point distance of 0.08 m for the proposed method versus 0.10 m for BVTR
Both methods complete the navigation successfully, with slight deviations at turns

2. Obstacle Occlusion Testing

Proposed Method: End-point distance 0.08 m; successfully avoids the obstacle and returns to the taught route
BVTR: End-point distance 5.58 m; stops in front of the obstacle and cannot continue
Single-Target Version: End-point distance 5.20 m, confirming the importance of the multi-target strategy

3. Curved Path Navigation (Corridor Scene)

Proposed Method: End-point distance 0.37 m; successfully follows the entire route
BVTR: End-point distance 11.44 m; stops after navigating to an unknown location
Without Keyframe Clustering: End-point distance 10.49 m, demonstrating the critical role of the clustering strategy

4. Keyframe Clustering Verification

Keyframe clustering significantly increases loop closure detection density, particularly at turns, providing timely feedback to the motion planning module.

5. Map Extension Verification

System successfully adds new environmental information during repeat phase; extended keyframes maintain association with original map without disrupting topological structure.

Experimental Findings

Long-term Goal Management: Multi-target strategy significantly improves system robustness to loop closure detection failures
Keyframe Clustering: Critical for robust matching in texture-poor environments
Map Extension: Effectively handles environmental changes, supporting long-term navigation tasks

Main Research Directions

Bio-inspired Methods: Direct image comparison and pattern recognition
Visual Geometric Methods: Feature-based image matching and PnP solving
Deep Learning Methods: End-to-end learning and neural network matching
Topo-metric Fusion: Navigation combining topological and metric information

Advantages of This Work

Compared to bio-inspired methods: More robust feature matching
Compared to deep learning methods: High computational efficiency, strong interpretability
Compared to traditional geometric methods: No global consistency requirement, strong adaptability

Conclusions and Discussion

Main Conclusions

Flexible Map Representation: Topo-metric graphs effectively alleviate the requirement for globally consistent mapping
Robust Navigation System: Multi-target management and keyframe clustering significantly enhance system robustness
Practical Verification: System effectiveness validated across multiple challenging scenarios

Limitations

Relative Pose Dependency: System performance depends on accuracy of relative poses between keyframes
Long-term Drift: Prolonged map matching failures may cause odometry drift divergence
Environmental Assumptions: Assumes sufficiently accurate relative pose estimation, which may not hold in certain environments

Future Directions

Develop end-to-end visual navigation models based on deep learning to further reduce dependence on accurate global pose tracking and environmental mapping.

In-Depth Evaluation

Strengths

Technical Innovation: Proposes novel topo-metric map representation, effectively addressing limitations of traditional methods
System Completeness: Complete solution from map construction to navigation execution
Sufficient Experiments: Comprehensive verification across multiple scenarios and conditions
Practical Value: System design considers actual deployment requirements with user-friendly interface

Weaknesses

Insufficient Theoretical Analysis: Lacks theoretical guarantees on system convergence and stability
Computational Complexity: Insufficient analysis of computational overhead for keyframe clustering and multi-target management
Environmental Limitations: Primarily tested in indoor structured environments; outdoor complex environment adaptability unknown
Limited Comparison Baselines: Mainly compared with classical BVTR method; lacks comparison with latest deep learning methods

Impact

Academic Contribution: Provides a new technical pathway for VTR navigation with some theoretical value
Practical Value: Method directly applicable to industrial and domestic robot navigation
Reproducibility: Sufficiently detailed technical description facilitates reproduction and improvement

Applicable Scenarios

Fixed-Route Navigation: Inter-station navigation in factories, path following for warehouse robots
Environmental Change Scenarios: Long-term navigation tasks requiring adaptation to minor environmental changes
Resource-Constrained Platforms: Lower hardware requirements than deep learning methods

References

The paper includes 31 references covering important works in visual SLAM, robot navigation, place recognition, and related fields, providing solid theoretical foundation for the research.

Overall Assessment: This paper presents a practical VTR navigation solution with some technical innovation and sufficient experimental validation. While there remains room for improvement in theoretical analysis and environmental adaptability, it provides valuable technical contributions to the mobile robot navigation field.