SigSPARQL: Signals as a First-Class Citizen When Querying Knowledge Graphs

Schwarzinger, Steindl, Frühwirth et al.
Purpose: Cyber-Physical Systems (CPSs) integrate computation and physical processes, producing time series data from thousands of sensors. Knowledge graphs can contextualize these data, yet current approaches that are applicable to monitoring CPS rely on observation-based approaches. This limits the ability to express computations on sensor data, especially when no assumptions can be made about sampling synchronicity or sampling rates. Methodology: We propose an approach for integrating knowledge graphs with signals that model run-time sensor data as functions from time to data. To demonstrate this approach, we introduce SigSPARQL, a query language that can combine RDF data and signals. We assess its technical feasibility with a prototype and demonstrate its use in a typical CPS monitoring use case. Findings: Our approach enables queries to combine graph-based knowledge with signals, overcoming some key limits of observation-based methods. The developed prototype successfully demonstrated feasibility and applicability. Value: This work presents a query-based approach for CPS monitoring that integrates knowledge graphs and signals, alleviating problems of observation-based approaches. By leveraging system knowledge, it enables operators to run a single query across different system instances within the same domain. Future work will extend SigSPARQL with additional signal functions and evaluate it in large-scale CPS deployments.

Basic Information

  • Paper ID: 2506.03826
  • Title: SigSPARQL: Signals as a First-Class Citizen When Querying Knowledge Graphs
  • Authors: Tobias Schwarzinger, Gernot Steindl, Thomas Frühwirth, Thomas Preindl, Konrad Diwold, Katrin Ehrenmüller, Fajar J. Ekaputra
  • Category: cs.DB (Database)
  • Publication Date: July 2025
  • Paper Link: https://arxiv.org/abs/2506.03826

Abstract

This paper proposes a novel approach combining knowledge graphs with signals to address data querying challenges in Cyber-Physical Systems (CPS) monitoring. Traditional observation-based methods are limited in expressing computations over sensor data, particularly when no assumptions can be made about sampling synchronicity or sampling rates. The authors introduce SigSPARQL, a query language that models run-time sensor data as functions from time to data (signals), enabling unified querying of RDF data and signals. Technical feasibility is validated through a prototype, and practical applicability is demonstrated in a typical CPS monitoring use case.

Research Background and Motivation

Problem Definition

  1. Core Problem: Cyber-Physical Systems generate large volumes of time-series sensor data that must be analyzed together with system context information. However, existing observation-based methods make computations over sensor data complex to express and limit which computations can be expressed.
  2. Significance: With ongoing digitalization, CPS are widely deployed in buildings, energy networks, manufacturing, and other domains. Effective utilization of sensor data is critical for system analysis, monitoring, and control.
  3. Limitations of Existing Methods:
    • Ontology-Based Data Access (OBDA) methods scatter temporal values from a single sensor across thousands of elements representing independent observations
    • Query complexity increases, because queries must reconstruct a sensor's temporal value from its independent observations
    • Challenges in handling asynchronous time series, where observations with different timestamps are difficult to combine
    • No unified concept that ties the individual observation values of a sensor together
  4. Research Motivation: Introduce the Signal concept as a "first-class citizen" to abstract independent observations and overcome limitations of current methods in expressing sensor data computations.

Core Contributions

  1. Proposed a language-agnostic framework: Defined three core operators (Signal, ApplySF, LiftVal) for integrating knowledge graph query languages with signal processing
  2. Designed the SigSPARQL query language: Extended SPARQL syntax and semantics to support signals as first-class citizens in query results
  3. Constructed a prototype system: Implemented based on Oxigraph, validating technical feasibility
  4. Provided comprehensive theoretical foundations: Based on Functional Reactive Programming (FRP) theory, establishing formal relationships between signals and time-series data
  5. Demonstrated practical value: Showed applicability through an electric vehicle charging station monitoring use case

Methodology Details

Task Definition

  • Input: Signal-annotated RDF dataset <D, S, φ>, where D is an RDF dataset, S is a set of RDF signals, and φ is a signal annotation function
  • Output: Time-stamped solution sequences (TSS) or continuously updated RDF graphs
  • Constraints: Support continuous queries and handle asynchronous sensor data streams

Core Concepts and Data Model

1. Signal Definition

Definition 7.1: An RDF signal ψ is a (possibly partial) function of the form T→RDF
where T is the time domain and RDF is the set of RDF terms
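To make Definition 7.1 concrete, here is a minimal Python sketch that models a signal as a callable from time to a value, with partiality expressed by returning None. It assumes RDF terms can be stood in for by plain Python values; the names Term, Signal, and restrict are illustrative and not taken from the paper or its prototype.

```python
from typing import Callable, Optional

# Assumption: an RDF term is modeled as a plain Python value; a real system
# would use IRIs, literals, and blank nodes.
Term = object
# A (possibly partial) signal maps a time point to a term, or to None where
# the signal is undefined.
Signal = Callable[[float], Optional[Term]]


def restrict(f: Callable[[float], Term], start: float, end: float) -> Signal:
    """Turn a total function into a partial signal defined only on [start, end)."""
    def signal(t: float) -> Optional[Term]:
        return f(t) if start <= t < end else None
    return signal


# Example: a temperature ramp that is defined only for the first 10 seconds.
ramp = restrict(lambda t: 20.0 + 0.5 * t, 0.0, 10.0)
print(ramp(4.0))   # 22.0
print(ramp(12.0))  # None
```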

2. Signal-Annotated RDF Dataset

Definition 7.2: <D, S, φ>
- D: Conventional RDF dataset
- S: Set of RDF signals
- φ: Partial function IRI×IRI→S (signal annotation function)
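The triple <D, S, φ> can be pictured as an ordinary triple set plus a dictionary keyed by (source IRI, property IRI) pairs. The sketch below is one possible encoding under stated assumptions: IRIs are plain strings, the ex: IRIs are hypothetical, and S is simplified to the set of signals reachable through φ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

Signal = Callable[[float], Optional[object]]
Triple = Tuple[str, str, str]


@dataclass
class SignalAnnotatedDataset:
    """Sketch of <D, S, phi> with IRIs represented as strings."""
    triples: Set[Triple] = field(default_factory=set)                  # D
    phi: Dict[Tuple[str, str], Signal] = field(default_factory=dict)   # phi

    @property
    def signals(self) -> Set[Signal]:
        # Simplification: S is taken to be the range of phi.
        return set(self.phi.values())

    def annotation(self, source: str, prop: str) -> Optional[Signal]:
        # phi is partial: unannotated (source, property) pairs yield None.
        return self.phi.get((source, prop))


ds = SignalAnnotatedDataset(
    triples={("ex:garage1", "sosa:hosts", "ex:charger1")},
    phi={("ex:charger1", "ev:ActivePower"): lambda t: 11.0},
)
ap = ds.annotation("ex:charger1", "ev:ActivePower")
print(ap(0.0))                                         # 11.0
print(ds.annotation("ex:garage1", "ev:ActivePower"))   # None
```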

Language-Agnostic Framework

The authors propose three core operators:

  1. Signal(s, p): Constructs a signal based on signal source s and signal property p
  2. ApplySF(f, a⃗): Applies an n-ary signal function f to a signal parameter list of length n
  3. LiftVal(v): Lifts value v to a constant signal
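The three operators can be sketched as small higher-order functions. This is a hypothetical Python rendering, not the prototype's API: signal_op, apply_sf, lift_val, and pointwise are illustrative names, φ is a plain dictionary, and a signal function is assumed to be any callable that takes n signals and returns a signal. The pointwise helper shows one way to obtain such signal functions from ordinary value-level operators.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Signal = Callable[[float], Optional[object]]
SignalFunction = Callable[..., Signal]  # n signals in, one signal out


def signal_op(phi: Dict[Tuple[str, str], Signal], s: str, p: str) -> Signal:
    """Signal(s, p): the signal annotated to source s and property p.
    If phi has no entry, return a signal that is undefined everywhere."""
    return phi.get((s, p), lambda t: None)


def lift_val(v: object) -> Signal:
    """LiftVal(v): the constant signal with value v at every time point."""
    return lambda t: v


def apply_sf(f: SignalFunction, args: Sequence[Signal]) -> Signal:
    """ApplySF(f, a): apply an n-ary signal function to n argument signals."""
    return f(*args)


def pointwise(op: Callable[..., object]) -> SignalFunction:
    """Build a signal function that applies op to the argument values at each
    time point; the result is undefined where any argument is undefined."""
    def sf(*signals: Signal) -> Signal:
        def result(t: float) -> Optional[object]:
            values = [s(t) for s in signals]
            if any(v is None for v in values):
                return None
            return op(*values)
        return result
    return sf


phi = {("ex:charger1", "ev:ActivePower"): lambda t: 2.0 * t}
ap = signal_op(phi, "ex:charger1", "ev:ActivePower")
# Roughly ?ap * -1, with the constant -1 lifted to a signal first.
negated = apply_sf(pointwise(lambda x, y: x * y), [ap, lift_val(-1)])
print(negated(3.0))  # -6.0
```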

SigSPARQL Language Design

Syntax Extensions

  1. SIGNALS Clause:
SIGNALS {
    ev:ActivePower FROM ?device AS ?ap
    ev:Envelope FROM ?garage AS ?env
}
  2. WHEN Clause:
WHEN {
    SUM(?ap * ?sign) > ?env
    BECOMES TRUE AT ?violation_time
}
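The BECOMES TRUE AT construct can be read as rising-edge detection on a boolean signal. The Python sketch below assumes discrete evaluation at a given set of time points and treats the condition as false before the first point and wherever it is undefined; these are illustrative assumptions, and the paper's exact semantics may differ. The function name becomes_true_at is hypothetical.

```python
from typing import Callable, Iterable, List, Optional

BoolSignal = Callable[[float], Optional[bool]]


def becomes_true_at(cond: BoolSignal, times: Iterable[float]) -> List[float]:
    """Return the time points at which cond switches to True.

    Assumption: cond counts as False before the first time point and
    wherever it is undefined (None)."""
    events: List[float] = []
    previous = False
    for t in sorted(times):
        current = cond(t) is True
        if current and not previous:  # rising edge: not true -> true
            events.append(t)
        previous = current
    return events


# Example: load sampled at a few instants, compared to a 10 kW envelope.
load = {0: 8.0, 2: 12.0, 4: 9.0, 6: 11.0, 7: 5.0}
print(becomes_true_at(lambda t: load[t] > 10.0, load.keys()))  # [2, 6]
```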

Semantic Definition

  1. Time-Stamped Solution Sequences (TSS): Solutions may bind variables to RDF terms or to RDF signals, and can be evaluated at a time point τ
  2. Continuous Queries: SELECT queries return TSS, CONSTRUCT queries return continuously growing RDF graphs
  3. Signal Computation: Lifts SPARQL functions and operators pointwise to the signal domain
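One plausible reading of evaluating solutions at a time point τ: terms stay unchanged, while every signal binding is replaced by its value at τ. The sketch assumes solutions are dictionaries from variable names to either a string (standing in for an RDF term) or a callable signal; evaluate_at and the ex: IRIs are hypothetical.

```python
from typing import Callable, Dict, List, Union

Signal = Callable[[float], object]
Binding = Union[str, Signal]   # an RDF term (here: a string) or a signal
Solution = Dict[str, Binding]


def evaluate_at(solutions: List[Solution], tau: float) -> List[Dict[str, object]]:
    """Evaluate each solution at time tau: terms are kept as they are,
    signals are replaced by their value at tau."""
    return [
        {var: (b(tau) if callable(b) else b) for var, b in sol.items()}
        for sol in solutions
    ]


solutions: List[Solution] = [
    {"?device": "ex:charger1", "?ap": lambda t: 11.0 if t >= 5 else 0.0},
    {"?device": "ex:pv1", "?ap": lambda t: 4.0},
]
print(evaluate_at(solutions, 6.0))
# [{'?device': 'ex:charger1', '?ap': 11.0}, {'?device': 'ex:pv1', '?ap': 4.0}]
```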

Technical Innovations

  1. Signal Abstraction: Replaces observation-based methods with FRP signal concepts, providing more natural temporal data modeling
  2. Unified Query Model: Combines graph structure knowledge and temporal signal processing in a single query
  3. Extended Type System: Extends SPARQL algebra to support signal types with automatic type lifting
  4. Continuous Query Semantics: Defines event triggering mechanisms supporting real-time monitoring applications

Experimental Setup

Prototype Implementation

  • Base Framework: Built on Oxigraph graph database
  • Temporal Model: Discrete time; between observations, a signal keeps the value of the most recent observation ("last observation" strategy)
  • Evaluation Approach: Two-step evaluation—constructing signal computation descriptions, then registering with continuous query engine
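The "last observation" strategy is what makes asynchronous sensors combinable: each signal is defined at every instant after its first observation, so two sensors that never share a timestamp can still be added or compared. The sketch below assumes this means a step (sample-and-hold) function and uses Python's bisect module; LastObservationSignal and the sample data are hypothetical.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


class LastObservationSignal:
    """Signal built from timestamped observations: the value at t is the most
    recent observation at or before t; before the first one it is undefined."""

    def __init__(self, observations: Sequence[Tuple[float, float]]):
        ordered = sorted(observations)
        self._times: List[float] = [t for t, _ in ordered]
        self._values: List[float] = [v for _, v in ordered]

    def __call__(self, t: float) -> Optional[float]:
        i = bisect.bisect_right(self._times, t)
        return self._values[i - 1] if i > 0 else None


# Two sensors sampled asynchronously (no shared timestamps).
charger = LastObservationSignal([(0.0, 0.0), (3.0, 11.0), (9.0, 7.0)])
pv = LastObservationSignal([(1.0, 4.0), (5.5, 6.0)])


def net_power(t: float) -> Optional[float]:
    """Charger consumption minus PV generation, defined once both have data."""
    a, b = charger(t), pv(t)
    return None if a is None or b is None else a - b


print(net_power(0.5))  # None (pv has no observation yet)
print(net_power(4.0))  # 7.0
print(net_power(6.0))  # 5.0
```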

Validation Use Case

Electric Vehicle Charging Station Monitoring Scenario:

  • System Components: Multiple EV chargers, photovoltaic systems, batteries
  • Monitoring Objective: Detect power consumption violations exceeding operational envelope limits
  • Data Sources: Active Power (AP) sensors, State of Charge (SoC) sensors, operational envelope limits

Query Example

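# Add a violation node to the garage each time the condition becomes true.
CONSTRUCT {
    ?garage ev:hasEnvelopeViolation [
        ev:description "Envelope Violated!" ;
        ev:startTime ?violation_time
    ]
}
# The signed total active power of the garage exceeds its envelope;
# the moment this turns true is bound to ?violation_time.
WHEN {
    SUM(?ap * ?sign) > ?env
    BECOMES TRUE AT ?violation_time
}
# Bind each device's active-power signal and the garage's envelope signal.
SIGNALS {
    ev:ActivePower FROM ?device AS ?ap
    ev:Envelope FROM ?garage AS ?env
}
# Find each garage, the devices it hosts, and their types;
# PV systems generate power, so their power counts negatively.
WHERE {
    ?garage a ev:Garage ; sosa:hosts ?device .
    ?device a ?ap_device_type .
    BIND(IF(?ap_device_type = ev:PVSystem, -1, 1) AS ?sign)
}
# SUM aggregates over all devices of the same garage.
GROUP BY ?garage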
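To trace what the query computes, the following Python sketch hand-evaluates it on toy data. The graph, the ev:EVCharger class, the ex: IRIs, and all power values are hypothetical, and signals are assumed to follow the "last observation" strategy. Because such signals only change at observation timestamps, checking those instants is enough to find every point where the condition becomes true.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Obs = List[Tuple[float, float]]  # (timestamp, value), sorted by timestamp


def last_obs(obs: Obs, t: float) -> Optional[float]:
    """Most recent observed value at or before t (None before the first)."""
    i = bisect.bisect_right([ts for ts, _ in obs], t)
    return obs[i - 1][1] if i > 0 else None


# Toy knowledge graph (hypothetical): garage1 hosts two chargers and a PV system.
hosts = [("ex:garage1", "ex:charger1"), ("ex:garage1", "ex:charger2"),
         ("ex:garage1", "ex:pv1")]
device_type = {"ex:charger1": "ev:EVCharger", "ex:charger2": "ev:EVCharger",
               "ex:pv1": "ev:PVSystem"}

# Toy, asynchronously sampled signals in kW.
active_power: Dict[str, Obs] = {
    "ex:charger1": [(0.0, 0.0), (2.0, 11.0)],
    "ex:charger2": [(0.0, 0.0), (3.0, 11.0), (8.0, 0.0), (12.0, 11.0)],
    "ex:pv1": [(0.0, 5.0), (5.0, 12.0), (10.0, 0.0)],
}
envelope: Dict[str, Obs] = {"ex:garage1": [(0.0, 15.0)]}


def violations(garage: str) -> List[float]:
    # WHERE + BIND: devices hosted by the garage; PV systems get sign -1.
    signed = [(d, -1 if device_type[d] == "ev:PVSystem" else 1)
              for g, d in hosts if g == garage]
    # Candidate instants: every observation timestamp of the involved signals.
    times = sorted({t for d, _ in signed for t, _ in active_power[d]}
                   | {t for t, _ in envelope[garage]})
    events: List[float] = []
    previous = False
    for t in times:
        aps = [last_obs(active_power[d], t) for d, _ in signed]
        env = last_obs(envelope[garage], t)
        if env is None or any(v is None for v in aps):
            current = False  # undefined condition is treated as not true
        else:
            # WHEN: SUM(?ap * ?sign) > ?env for this GROUP BY group
            current = sum(v * s for v, (_, s) in zip(aps, signed)) > env
        if current and not previous:  # BECOMES TRUE AT ?violation_time
            events.append(t)
        previous = current
    return events


# CONSTRUCT: one violation node per detected start time.
# For this toy data, violations("ex:garage1") returns [3.0, 12.0].
for start in violations("ex:garage1"):
    print(f"ex:garage1 ev:hasEnvelopeViolation [ ev:startTime {start} ] .")
```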

Experimental Results

Technical Feasibility Validation

  1. Successful Prototype Implementation: The prototype implements the SigSPARQL syntax and semantics, with signals restricted to the "last observation" strategy
  2. Query Execution: Supports continuous SELECT queries (returning TSS) and CONSTRUCT queries (returning continuously updated RDF graphs)
  3. Signal Processing: Successfully handles signal acquisition, computation, and event detection

Application Effectiveness

  1. Unified Query Capability: Single queries applicable to different system instances within the same domain
  2. Real-Time Monitoring: Effectively detects operational envelope violation events
  3. Context-Aware Processing: Leverages system knowledge provided by knowledge graphs to enhance query expressiveness

Functional Validation

  • Successfully implemented unified handling of asynchronous sensor data
  • Supports complex signal computations (summation, comparison, etc.)
  • Implements event triggering mechanisms and timestamp binding
  • Validates correctness of continuous queries

Related Work

RDF Stream Processing

  1. Window-Based Approaches (C-SPARQL, RSP-QL, etc.): Partition unbounded streams into bounded relations
  2. CEP-Inspired Systems (EP-SPARQL, etc.): Detect patterns in event streams

Temporal Data Querying and OBDA

  1. Chrontext: Rewrites SPARQL queries into time-series database queries
  2. Ontop-temporal: Extends ontology-based data access with temporal rules expressed in metric temporal logic
  3. STARQL: Comprehensive approach supporting continuous and historical queries

Graph and Time-Series Integration

  1. Bollen et al.'s Approach: Extends graph matching to support measurements and time-series patterns
  2. Hybrid Data Model Research: Fusion of graph and time-series data

How This Work Differs: It models temporal values as signals and builds on FRP theory to address the limitations of observation-based methods

Conclusions and Discussion

Main Conclusions

  1. Treating signals as first-class citizens effectively addresses limitations of traditional observation-based methods
  2. SigSPARQL provides a unified query interface for knowledge graphs and signal processing
  3. Technical feasibility is validated through the prototype system
  4. Practical value is demonstrated in CPS monitoring scenarios

Limitations

  1. Signal Type Constraints: Supporting all possible signal types is complex; the current prototype only supports the "last observation" strategy
  2. Expressiveness Limitations: Cannot express temporal window computations such as "the average over the past 10 minutes"
  3. Missing Performance Analysis: Lacks detailed performance evaluation
  4. Insufficient Scale Validation: Lacks validation in large-scale CPS deployments

Future Directions

  1. Performance Optimization: Design optimized prototypes for performance evaluation and large-scale monitoring use cases
  2. Functional Extensions: Add advanced signal processing functions (e.g., integration operations)
  3. User Evaluation: Assess usability advantages of the language
  4. Temporal Knowledge Graphs: Extend query language semantics to support temporal knowledge graphs
  5. Practical Deployment: Investigate requirements of real-world CPS deployments

In-Depth Evaluation

Strengths

  1. Solid Theoretical Foundation: Based on FRP theory with rigorous mathematical definitions and semantics
  2. Clear Problem Definition: Accurately identifies core issues of existing methods and proposes targeted solutions
  3. Reasonable Design: The language extends SPARQL rather than replacing it, which keeps learning costs low for SPARQL users
  4. Complete Implementation: Forms a complete chain from theory to prototype to application
  5. Strong Innovation: Appears to be the first work to bring FRP signal concepts into knowledge graph querying

Weaknesses

  1. Limited Evaluation: Lacks quantitative comparisons with existing methods and large-scale validation
  2. Restricted Functionality: Relatively simple signal function library with limited complex temporal analysis capabilities
  3. Unknown Performance: No performance benchmarks or optimization analysis
  4. Limited Application Scope: Primarily targets CPS monitoring; applicability to other domains remains uncertain

Impact

  1. Academic Contribution: Provides new perspectives for integrating knowledge graphs with temporal data
  2. Practical Value: Broad application prospects in IoT, Industry 4.0, and related domains
  3. Technology Advancement: May drive further development of SPARQL standards
  4. Cross-Domain Fusion: Promotes interdisciplinary collaboration between databases, semantic web, and functional programming

Applicable Scenarios

  1. Industrial Monitoring: Real-time monitoring of manufacturing systems and energy networks
  2. Smart Buildings: Building equipment status monitoring and control
  3. Intelligent Transportation: Traffic flow and vehicle status monitoring
  4. Environmental Monitoring: Large-scale sensor network data analysis

References

The paper cites 36 relevant references covering key works in RDF stream processing, temporal data querying, and functional reactive programming, providing solid theoretical foundations and technical background for this research.


Overall Assessment: This is a high-quality database systems research paper making important contributions to knowledge graph query language extensions. It features solid theoretical foundations, reasonable technical solutions, and relatively complete implementation. While there is room for improvement in evaluation and performance aspects, it provides valuable new directions for related field development.