UniVector: Unified Vector Extraction via Instance-Geometry Interaction

Yan, Yue, Xia et al.

Vector extraction (VE) retrieves structured vector geometry from raster images, offering high-fidelity representation and broad applicability. Existing methods, however, are usually tailored to a single vector type (e.g., polygons, polylines, line segments), requiring separate models for different structures. This stems from treating instance attributes (category, structure) and geometric attributes (point coordinates, connections) independently, limiting the ability to capture complex structures. Inspired by the human brain's simultaneous use of semantic and spatial interactions in visual perception, we propose UniVector, a unified VE framework that leverages instance-geometry interaction to extract multiple vector types within a single model. UniVector encodes vectors as structured queries containing both instance- and geometry-level information, and iteratively updates them through an interaction module for cross-level context exchange. A dynamic shape constraint further refines global structures and key points. To benchmark multi-structure scenarios, we introduce the Multi-Vector dataset with diverse polygons, polylines, and line segments. Experiments show UniVector sets a new state of the art on both single- and multi-structure VE tasks. Code and dataset will be released at https://github.com/yyyyll0ss/UniVector.

Basic Information

Paper ID: 2510.13234
Title: UniVector: Unified Vector Extraction via Instance-Geometry Interaction
Authors: Yinglong Yan, Jun Yue, Shaobo Xia, Hanmeng Sun, Tianxu Ying, Chengcheng Wu, Sifan Lan, Min He, Pedram Ghamisi, Leyuan Fang
Category: cs.CV (Computer Vision)
Published: October 15, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2510.13234v1

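Summary

Vector extraction (VE) retrieves structured vector geometry from raster images, offering high-fidelity representation and broad applicability. Existing methods, however, are usually tailored to a single vector type (such as polygons, polylines, or line segments) and need a separate model for each structure. This stems from handling instance attributes (category, structure) and geometric attributes (point coordinates, connections) independently, which limits the ability to capture complex structures. Inspired by the human brain's simultaneous use of semantic and spatial interactions in visual perception, the authors propose UniVector, a unified VE framework that uses instance-geometry interaction to extract multiple vector types within a single model. UniVector encodes vectors as structured queries that contain both instance-level and geometry-level information, and updates them iteratively through an interaction module that exchanges context across levels. A dynamic shape constraint further refines global structures and key points.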

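Research Background and Motivation

Problem Definition

Vector extraction is a core task in computer vision that aims to extract structured vector information from raster images. Compared with raster data, vector data is lightweight to store, high in fidelity, and easy to edit, and it is widely used in graphic design, geographic mapping, and autonomous driving.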

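Limitations of Existing Methods

Single-structure restriction: existing methods are usually designed for one specific vector type (polygons, polylines, or line segments), so multiple independent models are needed
Cascaded architecture: traditional methods use a cascaded pipeline that processes instance attributes and geometric attributes separately, which leaves an information gap between the two
Topological errors: without instance-level constraints, topological errors arise easily in multi-structure scenes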

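Research Motivation

Inspired by the human brain's simultaneous use of semantic and spatial understanding in visual perception, the authors propose modeling explicit cross-level information fusion through instance-geometry interaction, so that global structural priors and fine-grained semantic-structural cues complement each other.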

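Core Contributions

Unified representation and framework: proposes a structured query representation that unifies different vector structures, and introduces UniVector, an instance-geometry interaction learning framework
Instance-geometry interaction modeling: designs a unified vector encoder and an instance-geometry interaction decoder that adaptively initialize and refine the structured queries
Dynamic shape constraint (DSC): introduces DSC to dynamically optimize global structural consistency and local shape accuracy
Multi-Vector dataset: builds the first multi-structure VE dataset, containing polygons, polylines, and line segments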

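Method Details

Task Definition

Given a raster image, extract the multiple vector structures it contains (polygons, polylines, line segments) at the same time. The output consists of the instance category, the bounding box, the point coordinates, and the point categories.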
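
To make the output fields concrete, here is a minimal Python sketch of a container for one extracted vector. The class name `VectorInstance`, the field names, the `(x_min, y_min, x_max, y_max)` box layout, and the integer point labels are all hypothetical choices for illustration; the summary only says which four kinds of output exist, not how they are encoded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VectorInstance:
    """One extracted vector (hypothetical layout, not the paper's format)."""
    category: str                              # instance category, e.g. "polygon"
    bbox: Tuple[float, float, float, float]    # assumed (x_min, y_min, x_max, y_max)
    points: List[Tuple[float, float]]          # ordered point coordinates
    point_labels: List[int]                    # one category id per point (assumed ints)


def is_consistent(v: VectorInstance) -> bool:
    """Check that every point has a label and lies inside the bounding box."""
    if len(v.points) != len(v.point_labels):
        return False
    x_min, y_min, x_max, y_max = v.bbox
    return all(x_min <= x <= x_max and y_min <= y <= y_max for x, y in v.points)


square = VectorInstance(
    category="polygon",
    bbox=(0.0, 0.0, 2.0, 2.0),
    points=[(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    point_labels=[1, 1, 1, 1],
)
```

A polyline or line segment would reuse the same container with a different `category` and point count, which is the sense in which a single output format can cover all three structures.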

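Model Architecture

1. Overall Framework

The UniVector framework has three main components:

Unified vector encoding: encodes the different vector structures as structured queries
Instance-geometry interaction decoding: refines the queries iteratively
Dynamic shape constraint: enforces global structural consistency and local geometric accuracy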

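2. Unified Vector Encoding

Structured query representation:

Query set $Q_s \in \mathbb{R}^{N \times (M+1) \times C}$, where $N$ is the maximum number of vector instances, $M$ is the maximum number of points per vector, and $C$ is the channel dimension
Each vector $Q_s^i$ consists of an instance query $Q_{ins}^i \in \mathbb{R}^C$ and a geometry query $Q_{geo}^i \in \mathbb{R}^{M \times C}$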
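
The shapes above can be illustrated with a short NumPy sketch that splits a query tensor of shape (N, M+1, C) into its instance and geometry parts and merges them back. Placing the instance query at index 0 of the second axis is an assumption made here for illustration, as are the sizes and the helper names `split_queries` and `merge_queries`.

```python
import numpy as np

# Illustrative sizes only; the paper's actual values are not given in this summary.
N, M, C = 4, 6, 8
rng = np.random.default_rng(0)
Q_s = rng.standard_normal((N, M + 1, C))  # structured queries, one row block per vector


def split_queries(Q_s):
    """Split (N, M+1, C) into instance queries (N, C) and geometry queries (N, M, C).

    Assumes slot 0 holds the instance query; the source does not state the ordering.
    """
    Q_ins = Q_s[:, 0, :]
    Q_geo = Q_s[:, 1:, :]
    return Q_ins, Q_geo


def merge_queries(Q_ins, Q_geo):
    """Inverse of split_queries: rebuild the (N, M+1, C) structured query tensor."""
    return np.concatenate([Q_ins[:, None, :], Q_geo], axis=1)


Q_ins, Q_geo = split_queries(Q_s)
```

Keeping both levels in one tensor means an instance and its points share the leading index, so per-instance operations reduce to batched operations over the first axis.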

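Query encoding process:

Instance-level encoding: follows a coarse-to-fine strategy, first selecting the highest-scoring image tokens to form coarse queries and then refining them with an instance detection module
Geometry-level encoding: captures detailed structure with a shape deformation module and refines the geometry queries with intra-frame attention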
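
The coarse step of the instance-level encoding, picking the highest-scoring image tokens, can be sketched as a top-k selection. This is only a sketch under assumptions: the function name `select_coarse_queries`, the per-token scalar score, and the token layout (T, C) are hypothetical, and the refinement by the instance detection module is not shown.

```python
import numpy as np


def select_coarse_queries(tokens, scores, n):
    """Return the n highest-scoring tokens and their indices.

    tokens: (T, C) image token features (assumed layout)
    scores: (T,) one score per token (assumed to come from some scoring head)
    """
    top_idx = np.argsort(-scores)[:n]  # indices sorted by descending score
    return tokens[top_idx], top_idx


tokens = np.arange(10, dtype=float).reshape(5, 2)  # 5 tokens with 2 channels
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2])
coarse, idx = select_coarse_queries(tokens, scores, n=2)
```

Here `n` plays the role of $N$, the maximum number of vector instances, so the selected tokens can seed one coarse instance query each.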

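3. Instance-Geometry Interaction Decoding

Structured feature extraction: deformable attention is extended so that each vector is assigned an instance reference point and geometry reference points: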

$\begin{cases} R_{geo}^{l} = \text{Sigmoid}(\text{Sigmoid}^{-1}(R_{ins}^{l}) + \text{MLP}(Q_{geo}^{l})), & l = 0 \\ R_{geo}^{l} = \text{Sigmoid}(\text{Sigmoid}^{-1}(R_{geo}^{l-1}) + \text{MLP}(Q_{geo}^{l})), & l \geq 1 \end{cases}$
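
The update adds an offset in logit space and maps back through the sigmoid, which keeps the reference points inside (0, 1). A minimal NumPy sketch follows. It makes several assumptions: reference points are normalized 2D coordinates, the array `offsets` stands in for the output of MLP(Q_geo), and the $l \geq 1$ case is read as refining the geometry reference points produced by the previous layer. The function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def inverse_sigmoid(p, eps=1e-5):
    """Logit function, clipped so that 0 and 1 do not produce infinities."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


def init_geo_refs(r_ins, offsets):
    """Layer l = 0: start every geometry point from its instance reference point.

    r_ins:   (N, 2) normalized instance reference points (assumed 2D)
    offsets: (N, M, 2) stand-in for MLP(Q_geo)
    """
    return sigmoid(inverse_sigmoid(r_ins)[:, None, :] + offsets)


def refine_geo_refs(r_geo_prev, offsets):
    """Layers l >= 1: refine the geometry points from the previous layer (assumed)."""
    return sigmoid(inverse_sigmoid(r_geo_prev) + offsets)


r_ins = np.array([[0.5, 0.25]])
r0 = init_geo_refs(r_ins, np.zeros((1, 3, 2)))        # zero offsets keep the points in place
r1 = refine_geo_refs(r0, np.full((1, 3, 2), 0.5))     # positive offsets push them up
```

With zero offsets the geometry points coincide with the instance reference point, so the offsets alone decide how far each point moves away from the instance location.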

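Instance-geometry interaction:

Single-level interaction: uses self-attention
Cross-level refinement: uses cross-attention (CA), applied per vector instance $i$: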

$Q_{ins}'' = \text{Concat}\big(\text{CA}(Q_{ins}^{i'}, Q_{geo}^{i'}),\ i \in [1, \ldots, N]\big)$

$Q_{geo}'' = \text{Concat}\big(\text{CA}(Q_{geo}^{i'}, Q_{ins}^{i'}),\ i \in [1, \ldots, N]\big)$
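
A simplified NumPy sketch of this cross-level step is given below. It is not the authors' implementation: it uses single-head scaled dot-product attention with no learned projections, and the residual additions in `interact` are an assumption. The batch axis over instances plays the role of Concat over $i$, so each instance attends only to its own geometry queries and vice versa.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, kv):
    """Single-head scaled dot-product attention, batched over instances.

    q:  (N, Lq, C) queries; kv: (N, Lk, C) used as both keys and values.
    No learned projections: a simplification for illustration.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    attn = softmax(q @ kv.transpose(0, 2, 1) * scale, axis=-1)  # (N, Lq, Lk)
    return attn @ kv                                             # (N, Lq, C)


def interact(Q_ins, Q_geo):
    """Cross-level refinement for Q_ins (N, C) and Q_geo (N, M, C).

    The residual connections are an assumption, not stated in the source.
    """
    ins_new = Q_ins + cross_attention(Q_ins[:, None, :], Q_geo)[:, 0, :]
    # With a single key per instance the attention weight is exactly 1, so in this
    # projection-free sketch every point simply receives its instance query.
    geo_new = Q_geo + cross_attention(Q_geo, Q_ins[:, None, :])
    return ins_new, geo_new


rng = np.random.default_rng(0)
Q_ins = rng.standard_normal((3, 8))
Q_geo = rng.standard_normal((3, 5, 8))
ins_new, geo_new = interact(Q_ins, Q_geo)
```

In a full model the learned query, key, and value projections would make the geometry-side update less trivial than the plain broadcast seen here; the sketch is meant only to show the direction of information flow at each level.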