DELE: Deductive $\mathcal{EL}^{++}$ Embeddings for Knowledge Base Completion

Mashkova, Zhapa-Camacho, Hoehndorf

Ontology embeddings map classes, roles, and individuals in ontologies into $\mathbb{R}^n$, and within $\mathbb{R}^n$ similarity between entities can be computed or new axioms inferred. For ontologies in the Description Logic $\mathcal{EL}^{++}$, several optimization-based embedding methods have been developed that explicitly generate models of an ontology. However, these methods suffer from some limitations; they do not distinguish between statements that are unprovable and provably false, and therefore they may use entailed statements as negatives. Furthermore, they do not utilize the deductive closure of an ontology to identify statements that are inferred but not asserted. We evaluated a set of embedding methods for $\mathcal{EL}^{++}$ ontologies, incorporating several modifications that aim to make use of the ontology deductive closure. In particular, we designed novel negative losses that account both for the deductive closure and different types of negatives and formulated evaluation methods for knowledge base completion. We demonstrate that our embedding methods improve over the baseline ontology embedding in the task of knowledge base or ontology completion.

Basic Information

Paper ID: 2411.01574
Title: DELE: Deductive $\mathcal{EL}^{++}$ Embeddings for Knowledge Base Completion
Authors: Olga Mashkova, Fernando Zhapa-Camacho, Robert Hoehndorf
Affiliation: King Abdullah University of Science and Technology (KAUST)
Category: cs.AI
Venue: NeSy 2024 Special Issue
Paper link: https://arxiv.org/abs/2411.01574

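Summary

This paper addresses the limitations of ontology embedding methods for the description logic $\mathcal{EL}^{++}$ on the knowledge base completion task and proposes DELE (Deductive $\mathcal{EL}^{++}$ Embeddings). Existing geometric embedding methods explicitly generate models of an ontology, but they have two key problems: (1) they do not distinguish statements that are unprovable from statements that are provably false, so they may use entailed statements as negatives; (2) they do not exploit the deductive closure of the ontology to identify statements that are inferred but not asserted. The paper designs new negative loss functions and evaluation methods that make use of the deductive closure, and reports improved knowledge base completion performance over the baseline embeddings.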

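Background and Motivation

Problem Definition

Ontology embeddings map the classes, roles, and individuals of an ontology into $\mathbb{R}^n$, where similarity between entities can be computed or new axioms inferred. For the description logic $\mathcal{EL}^{++}$, several optimization-based geometric embedding methods already exist, such as ELEmbeddings, ELBE, and Box2EL.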

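Limitations of Existing Methods

Negative sample selection: when existing methods draw negatives at random, they may pick statements that the ontology actually entails and treat them as negative examples, which degrades training.
Underuse of the deductive closure: existing methods do not take into account the deductive closure of the ontology, that is, the set of all statements derivable from it, so statements that are inferred but not asserted go unidentified.
Evaluation limitations: existing evaluation protocols are largely borrowed from knowledge graph completion and do not account for the rich entailment relations in an ontology.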

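Research Motivation

Knowledge base completion is the task of predicting axioms that should be added to a knowledge base but are not yet represented in it. For a formalized knowledge base this covers two kinds of inference: deductive (predicting entailed axioms) and inductive (predicting novel, non-entailed axioms). The paper aims to improve geometric embedding methods by making better use of the deductive closure.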

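Core Contributions

Negative losses that account for the deductive closure: new negative loss functions are designed for all $\mathcal{EL}^{++}$ normal forms, and entailed statements are no longer used as negatives.
A fast approximate algorithm for the deductive closure: the paper proposes a sound algorithm (every axiom it derives is entailed) that approximates the deductive closure of an $\mathcal{EL}^{++}$ theory and is used to improve negative sample selection during training.
Evaluation methods that account for the deductive closure: new evaluation metrics for knowledge base completion separate prediction performance on entailed axioms from performance on non-entailed axioms.
Extension of several geometric embedding methods: the modifications are applied to three representative methods, ELEmbeddings, ELBE, and Box2EL, which indicates that they are not tied to a single model.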
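The idea of keeping entailed statements out of the negatives can be illustrated with a small rejection-sampling sketch. This is not the authors' implementation: the function name `sample_filtered_negative`, the pair encoding `(A, B)` for $A \sqsubseteq B$, and the choice to corrupt only the right-hand class are assumptions made for illustration.

```python
import random

def sample_filtered_negative(positive, classes, closure, rng, max_tries=100):
    """Corrupt the right-hand side of a GCI0 axiom A ⊑ B and reject
    any candidate that lies in the (approximate) deductive closure.

    positive: (A, B) pair standing for A ⊑ B (hypothetical encoding)
    classes:  list of class names to draw replacements from
    closure:  set of (A, B) pairs known to be entailed
    """
    a, _ = positive
    for _ in range(max_tries):
        candidate = (a, rng.choice(classes))
        if candidate not in closure:  # entailed statements are never used as negatives
            return candidate
    return None  # no admissible negative found within the budget


classes = ["A", "B", "C", "D"]
# A ⊑ A holds trivially; A ⊑ B and A ⊑ C are assumed to be entailed.
closure = {("A", "A"), ("A", "B"), ("A", "C")}
negative = sample_filtered_negative(("A", "B"), classes, closure, random.Random(0))
```

With this toy closure the only admissible negative for class A is `("A", "D")`; an unfiltered sampler would return an entailed statement three times out of four.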

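Method Details

Task Definition

The knowledge base completion task is defined as follows: given an $\mathcal{EL}^{++}$ ontology $T$, predict new axioms that should be added to $T$. The task splits into two subtasks:

Deductive completion: predict axioms that are in the deductive closure $T^{\vdash}$ but are not explicitly asserted in $T$.
Inductive completion: predict novel axioms that are not in the deductive closure.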
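The two subtasks amount to a set-membership test against $T$ and $T^{\vdash}$. The sketch below assumes axioms are encoded as plain tuples and that an (approximate) closure is already available as a set; the function name and the encoding are hypothetical and not taken from the paper.

```python
def split_completion_targets(asserted, closure, candidates):
    """Partition candidate axioms into deductive and inductive targets.

    asserted:   set of axioms explicitly stated in the ontology T
    closure:    set of axioms in the (approximate) deductive closure of T
    candidates: iterable of axioms to classify
    """
    deductive, inductive = [], []
    for axiom in candidates:
        if axiom in asserted:
            continue  # already stated, so not a completion target
        if axiom in closure:
            deductive.append(axiom)  # entailed but not asserted
        else:
            inductive.append(axiom)  # novel, not entailed
    return deductive, inductive


# Axioms as tuples: ("GCI0", "A", "B") stands for A ⊑ B.
asserted = {("GCI0", "A", "B"), ("GCI0", "B", "C")}
closure = asserted | {("GCI0", "A", "C")}
candidates = [("GCI0", "A", "B"), ("GCI0", "A", "C"), ("GCI0", "A", "D")]
deductive, inductive = split_completion_targets(asserted, closure, candidates)
```

Here `A ⊑ C` follows from the two asserted axioms and lands in the deductive list, while `A ⊑ D` is not entailed and lands in the inductive list.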

Deductive Closure Computation

Normal Forms

$\mathcal{EL}^{++}$ axioms can be normalized into seven forms (see Table 1):

GCI0: $A \sqsubseteq B$
GCI1: $A \sqcap B \sqsubseteq E$
GCI2: $A \sqsubseteq \exists r.B$
GCI3: $\exists r.A \sqsubseteq B$
GCI0-BOT: $A \sqsubseteq \perp$
GCI1-BOT: $A \sqcap B \sqsubseteq \perp$
GCI3-BOT: $\exists r.A \sqsubseteq \perp$

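Deductive Closure Algorithms

The paper proposes two algorithms for approximating the deductive closure:

Algorithm 1: starting from the axioms explicitly stated in the ontology, derive entailed axioms by applying inference rules. For example: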

A ⊓ B ⊑ E, A' ⊑ A, B' ⊑ B, E ⊑ E'
─────────────────────────────────────
         A' ⊓ B' ⊑ E'
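The rule above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: subsumptions are encoded as `(sub, super)` pairs, GCI1 axioms as `(A, B, E)` triples, and the helper names `subsumption_closure` and `derive_gci1` are hypothetical. The subclass relation is closed naively under reflexivity and transitivity before the rule is applied.

```python
from itertools import product

def subsumption_closure(gci0, names):
    """Reflexive-transitive closure of asserted A ⊑ B pairs (naive fixpoint)."""
    sub = {(n, n) for n in names} | set(gci0)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(sub), repeat=2):
            if b == c and (a, d) not in sub:
                sub.add((a, d))
                changed = True
    return sub

def derive_gci1(gci1, sub):
    """From A ⊓ B ⊑ E, A' ⊑ A, B' ⊑ B and E ⊑ E', derive A' ⊓ B' ⊑ E'."""
    def subclasses(x):
        return {a for (a, b) in sub if b == x}

    def superclasses(x):
        return {b for (a, b) in sub if a == x}

    derived = set()
    for a, b, e in gci1:
        derived.update(product(subclasses(a), subclasses(b), superclasses(e)))
    return derived


names = {"A", "B", "E", "A1", "E1"}
sub = subsumption_closure({("A1", "A"), ("E", "E1")}, names)
derived = derive_gci1({("A", "B", "E")}, sub)
```

From the single asserted axiom `A ⊓ B ⊑ E` this yields four axioms, including `A1 ⊓ B ⊑ E1`. The quadratic fixpoint is only suitable for toy inputs.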

Algorithm 2: for arbitrary concept and role names, add axioms that hold as a matter of logic, such as $A \sqcap \perp \sqsubseteq E$.
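A check of this kind can be sketched as a predicate over candidate GCI1 axioms. The summary gives only the $A \sqcap \perp \sqsubseteq E$ pattern; the second pattern below ($A \sqcap B \sqsubseteq A$, which holds in every interpretation) is added here for illustration and is not claimed to be part of the paper's Algorithm 2. The constant `BOTTOM` and both function names are hypothetical.

```python
BOTTOM = "⊥"  # hypothetical name for the bottom concept

def trivially_entailed_gci1(a, b, e):
    """Return True if A ⊓ B ⊑ E holds regardless of the ontology."""
    if BOTTOM in (a, b):
        return True  # A ⊓ ⊥ is empty, so it is subsumed by any E
    if e in (a, b):
        return True  # A ⊓ B ⊑ A is a tautology (illustrative addition)
    return False

def trivial_gci1_axioms(names):
    """Enumerate axioms of the form A ⊓ ⊥ ⊑ E over the given concept names."""
    return {(a, BOTTOM, e) for a in names for e in names}


axioms = trivial_gci1_axioms(["A", "B"])
```

Such a predicate could be used to discard candidate negatives before they reach the loss, in the same way as a lookup in the closure computed from asserted axioms.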

Negative Loss Function Design

ELEmbeddings Negative Losses

For ball embeddings, six new negative loss functions are designed:

GCI0 negative loss (based on GCI1-BOT): $\text{loss}_{A \not\sqsubseteq B}(a,b) = \max(0, r_\eta(a) + r_\eta(b) - \|f_\eta(a) - f_\eta(b)\| + \gamma)$
GCI1 negative loss: $\text{loss}_{A \sqcap B \not\sqsubseteq E}(a,b,e) = \max(0, -r_\eta(a) - r_\eta(b) + \|f_\eta(a) - f_\eta(b)\| - \gamma) + \text{other terms}$
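The GCI0 negative loss above can be evaluated directly from the formula. The sketch below assumes that $f_\eta$ gives the center of a class ball and $r_\eta$ its radius, and it uses plain Python floats rather than a tensor library; the function name and argument layout are hypothetical. The GCI1 negative loss is not sketched because its remaining terms are not given here.

```python
import math

def gci0_negative_loss(center_a, radius_a, center_b, radius_b, gamma=0.0):
    """max(0, r(a) + r(b) - ||f(a) - f(b)|| + gamma).

    The loss is zero once the two balls are separated by at least gamma,
    so minimizing it pushes the balls of A and B apart.
    """
    distance = math.dist(center_a, center_b)  # Euclidean distance between centers
    return max(0.0, radius_a + radius_b - distance + gamma)


# Disjoint balls: centers 5 apart, radii sum to 2, so no penalty.
separated = gci0_negative_loss((0.0, 0.0), 1.0, (3.0, 4.0), 1.0)
# Overlapping balls: centers 1 apart, radii sum to 2, margin 0.1.
overlapping = gci0_negative_loss((0.0, 0.0), 1.0, (1.0, 0.0), 1.0, gamma=0.1)
```

The first call returns 0.0 and the second returns 1.1 up to floating-point rounding, which matches the reading of this loss as the GCI1-BOT disjointness term applied to the pair $(A, B)$.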