Implementing SIAv2 Over Rubin Observatory's Data Butler

Jenness, Voutsinas, Dubois-Felsmann et al.
The IVOA Simple Image Access version 2 protocol defines an easy way to provide community access to a collection of data. At the Vera C. Rubin Observatory we currently enable ObsTAP access to our data holdings via an ObsCore export or view of our Data Butler repositories. This approach comes with some deployment constraints, such as requiring pgsphere and compatibility with our CADC TAP implementation, so recently we decided to see whether we could instead provide an SIAv2 service that talks directly to our Data Butler. Here we describe our motivation, implementation strategies, and current deployment status, as well as discussing some metadata mismatches between the Butler data models and SIAv2.

Basic Information

  • Paper ID: 2501.00544
  • Title: Implementing SIAv2 Over Rubin Observatory's Data Butler
  • Authors: Tim Jenness, Stelios Voutsinas, Gregory P. Dubois-Felsmann, Andrei Salnikov
  • Classification: astro-ph.IM (Astrophysics - Instrumentation and Methods)
  • Publication Date: December 31, 2024
  • Paper Link: https://arxiv.org/abs/2501.00544

Abstract

The IVOA Simple Image Access Protocol version 2 (SIAv2) defines a straightforward method for providing community access to data collections. At Vera C. Rubin Observatory, we currently implement ObsTAP data access through ObsCore exports or views from the Data Butler repository. However, this approach has deployment constraints, such as requiring pgsphere support and compatibility with CADC TAP implementations. Consequently, we explored whether we could provide an SIAv2 service that communicates directly with the Data Butler. This paper describes our motivation, implementation strategy, current deployment status, and several metadata mismatch issues between the Butler data model and SIAv2.

Research Background and Motivation

Problem Background

Rubin Observatory's Data Butler consists of a metadata registry and a file datastore, and the registry holds enough information to construct ObsCore records. Two approaches have previously been used to provide ObsCore tables:

  1. Export records as CSV or Parquet files and load them into a static database
  2. Use registry backend hooks to provide real-time synchronization to ObsCore tables

Limitations of Existing Approaches

  1. Static Export Method: Suitable for formal data releases and integration into high-performance Qserv databases, but unsuitable for dynamic datasets such as nightly rapid products
  2. Real-time ObsCore Method: Requires pgsphere in the deployment environment, and the entire table must be rebuilt whenever the configuration changes

Research Motivation

These limitations prompted the research team to seek a simpler yet standardized query layer based directly on the Butler system. The IVOA SIAv2 protocol became the obvious choice because:

  • A direct Butler interface provides greater flexibility
  • Configuration changes require only a service restart
  • The service can work immediately with any Butler repository

Core Contributions

  1. Designed and implemented a direct SIAv2-to-Butler interface: Bypassing the intermediate ObsCore table layer
  2. Developed a layered architecture: Separating the service layer from SIAv2 query processing, improving testability
  3. Created the dax_obscore library: Providing a command-line interface for user learning and experimentation
  4. Deployed a production-ready service: Already deployed on the Rubin Science Platform and available for debugging data
  5. Identified and analyzed data model mismatch issues: Providing a clear roadmap for future improvements

Methodology Details

Task Definition

Map IVOA SIAv2 protocol queries directly to the Rubin Data Butler query system, implementing a standardized astronomical data access interface while avoiding deployment constraints of traditional ObsCore table methods.

System Architecture

HTTP GET → Nginx → SIAv2 Service → dax_obscore → Butler Repo
sia/dp02/query?POS=..     ↓              ↓            ↓
                    Query Processing  Butler Query  Results
                         ↓              ↓            ↓
                    ObsCore VOTable ← Results ← DatasetRefs
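
As a rough illustration of the request on the left of the diagram, the sketch below assembles a SIAv2 GET URL with a cone-search POS constraint plus optional BAND and MAXREC limits. The host name and the `build_siav2_url` helper are hypothetical; only the `sia/dp02/query` path fragment comes from the diagram, and the parameter value formats follow the SIAv2 standard (POS as `CIRCLE lon lat radius` in degrees, BAND as a wavelength interval in metres).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical host; only the "sia/dp02/query" path fragment appears in the diagram.
BASE_URL = "https://rsp.example.org/api/sia/dp02/query"


def build_siav2_url(base_url, ra_deg, dec_deg, radius_deg, band_m=None, maxrec=None):
    """Return a SIAv2 GET URL for a cone search, optionally limited by BAND and MAXREC."""
    # SIAv2 encodes a cone as "CIRCLE <lon> <lat> <radius>", all in degrees.
    params = [("POS", f"CIRCLE {ra_deg} {dec_deg} {radius_deg}")]
    if band_m is not None:
        lo, hi = band_m  # wavelength interval in metres
        params.append(("BAND", f"{lo} {hi}"))
    if maxrec is not None:
        params.append(("MAXREC", str(maxrec)))
    # urlencode percent-encodes the values and turns spaces into "+".
    return f"{base_url}?{urlencode(params)}"


url = build_siav2_url(BASE_URL, 62.0, -37.0, 0.1, band_m=(5.5e-7, 7.0e-7), maxrec=10)
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["POS"][0])
```

Any HTTP client can then issue the GET request; the response would be an ObsCore-style VOTable, as the return path in the diagram indicates.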

Core Component Design

  1. SIAv2 Service Layer
    • Developed using Python and FastAPI
    • Based on Rubin's standard internal development platform Phalanx
    • Provides standard authentication layer and deployment capabilities
    • Processes raw SIAv2 parameters and encapsulates returned results
  2. dax_obscore Library
    • Parses SIAv2 parameters
    • Converts parameters to Butler queries
    • Executes queries and returns standardized results
    • Generates VOTable output using Astropy
    • Uses Felis data model to define table structure for consistency
  3. Butler Interface Compatibility
    • Transparently supports both native "direct" Butler and new client/server remote Butler
    • Leverages Butler's native spatial and temporal query support

Technical Innovations

  1. Layered Design Advantages
    • Separation of service layer from query processing improves testability
    • dax_obscore can be installed and used independently
    • Supports parallel development and maintenance
  2. Direct Butler Access
    • Bypasses the intermediate ObsCore table layer
    • Reduces deployment dependencies (no pgsphere required)
    • Faster response to configuration changes
  3. Standardized Output
    • Uses Felis data model to ensure result consistency
    • Complies with IVOA standard VOTable format
    • Supports standard SIAv2 parameter set

Experimental Setup

Supported Query Parameters

The dax_obscore package currently supports the following SIAv2 query parameters:

  • MAXREC: Maximum record limit
  • INSTRUMENT: Instrument filtering
  • POS: Position/spatial query
  • TIME: Time range query
  • BAND: Wavelength band filtering
  • EXPTIME: Exposure time
  • CALIB: Calibration type
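
Several of these parameters (TIME, BAND, EXPTIME) take numeric intervals, which SIAv2 writes as two space-separated values with `-Inf` and `+Inf` marking open ends. The following is a minimal sketch of how such values might be parsed and compared against a record; `parse_interval` and `overlaps` are hypothetical helpers rather than dax_obscore functions, and treating a single value as a zero-width interval is an assumption made here for convenience.

```python
import math


def parse_interval(value):
    """Parse "lo hi" (or a single value) into a (lo, hi) tuple of floats."""
    parts = value.split()
    if len(parts) == 1:
        # Assumption: a lone value is treated as a zero-width interval.
        lo = hi = float(parts[0])
    elif len(parts) == 2:
        lo, hi = (float(p) for p in parts)  # float() understands "-Inf" / "+Inf"
    else:
        raise ValueError(f"Expected one or two numbers, got {value!r}")
    if math.isnan(lo) or math.isnan(hi) or lo > hi:
        raise ValueError(f"Invalid interval: {value!r}")
    return lo, hi


def overlaps(interval, lo, hi):
    """Return True if the closed interval [lo, hi] intersects the query interval."""
    return interval[0] <= hi and lo <= interval[1]


# TIME is expressed in MJD; this query asks for anything observed from MJD 60000 onward.
time_query = parse_interval("60000 +Inf")
print(overlaps(time_query, 60012.1, 60012.2))
print(overlaps(time_query, 59000.0, 59000.1))
```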

Planned Parameters

  • ID: Identifier query
  • TARGET: Target object
  • FACILITY: Facility name (planned to use "Rubin:Simonyi" and "Rubin:1.2m")
  • COLLECTION: Dataset collection

Deployment Environment

  • Deployed on the Rubin Science Platform
  • Available for debugging data access
  • Supports command-line tools installable via PyPI

Experimental Results

Current Deployment Status

  1. Service Availability: Successfully deployed and operational on the Rubin Science Platform
  2. Functional Verification: Core SIAv2 parameter query functionality working properly
  3. Compatibility: Supports both direct Butler and remote Butler access modes
  4. User Tools: Provides command-line interface for local experimentation and learning

Deployment Advantages

  1. Simplified Deployment: No pgsphere dependency
  2. Configuration Flexibility: Changes require only a service restart
  3. Immediate Availability: Works with any Butler repository without an export step

Related Work

IVOA Standard Protocols

  • SIAv2 Protocol: IVOA recommended standard defined by Dowler et al. in 2015
  • ObsTAP Service: Table access protocol based on ObsCore, standardized by Louys et al. in 2017

Rubin Observatory Technology Stack

  • Data Butler System: Data management system developed by Jenness et al. in 2022
  • Qserv Database: High-performance distributed database developed by Mueller et al. in 2023
  • Remote Butler: Client/server architecture developed by Jenness et al. in 2024

Conclusions and Discussion

Main Conclusions

  1. Implementation Feasibility: Implementing SIAv2 over the Data Butler is a relatively straightforward process
  2. Architectural Advantages: Layered development strategy enables parallel development and provides additional command-line tools
  3. Successful Deployment: Service has been successfully deployed and is available for production use

Data Model Mismatch Issues

1. Missing Instrument Information for Co-adds

  • Problem: Co-added stacks in the Butler registry lack associated instrument information
  • Impact: Cannot distinguish data sources in repositories containing both LATISS and LSSTCam data
  • Solution: Determine the instruments of the input datasets once complete provenance tracking is available

2. Exposure Time for Co-adds

  • Problem: The median exposure time of a co-add is a derived quantity that is not yet known when the Butler coordinate space is defined
  • Solution: Planned support for derived metadata storage in future development roadmap

3. Observation Date for Co-adds

  • Problem: Co-adds lose date information from individual observations
  • Solution: Date ranges may be derivable after future Butler provenance system implementation
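
The three co-add issues above share one idea: once the inputs of a co-add are known, the missing fields can be computed from them. The sketch below shows that derivation under the assumption that provenance can supply a list of input visits. The `InputVisit` record and `derive_coadd_metadata` function are hypothetical and do not correspond to an existing Butler API; `t_min`, `t_max`, and `t_exptime` are the standard ObsCore column names.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class InputVisit:
    """Hypothetical provenance record for one visit that contributed to a co-add."""
    instrument: str
    t_min: float    # start of observation, MJD
    t_max: float    # end of observation, MJD
    exptime: float  # exposure time, seconds


def derive_coadd_metadata(visits):
    """Derive ObsCore-style time and instrument fields for a co-add from its inputs."""
    if not visits:
        raise ValueError("A co-add needs at least one input visit")
    return {
        # Distinct instruments, sorted so the result is deterministic.
        "instruments": sorted({v.instrument for v in visits}),
        # The date range spans the earliest start to the latest end.
        "t_min": min(v.t_min for v in visits),
        "t_max": max(v.t_max for v in visits),
        # Median exposure time, the derived quantity discussed above.
        "t_exptime": median(v.exptime for v in visits),
    }


visits = [
    InputVisit("LSSTCam", 60000.0, 60000.1, 30.0),
    InputVisit("LSSTCam", 60010.0, 60010.1, 15.0),
    InputVisit("LATISS", 59990.0, 59990.1, 30.0),
]
coadd = derive_coadd_metadata(visits)
print(coadd)
```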

4. Dataset Type Standardization

  • Problem: Butler dataset types (e.g., visit_image, difference_image) lack standardized query methods in SIAv2
  • Solution: Consider adding a DPSUBTYPE query parameter as an extension, possibly with an lsst prefix

Future Directions

  1. Derived Metadata Support: Implement query support for computed metadata
  2. Complete Provenance System: Resolve co-add metadata deficiencies through provenance information
  3. Extended Parameter Support: Complete implementation of ID, TARGET, FACILITY, and COLLECTION parameters
  4. Custom Extensions: Implement Rubin-specific query parameters such as DPSUBTYPE

In-Depth Evaluation

Strengths

  1. Excellent Architecture Design
    • Layered design improves system maintainability and testability
    • Direct Butler interface avoids intermediate layer complexity
    • Supports multiple Butler deployment modes
  2. High Practical Value
    • Addresses specific deployment problems (pgsphere dependency, configuration flexibility)
    • Provides standardized data access interface
    • Command-line tools increase system usability
  3. Standard Compliance
    • Strictly adheres to IVOA SIAv2 standard
    • Uses standard VOTable format for output
    • Compatible with existing astronomical data access ecosystem

Limitations

  1. Data Model Constraints
    • Multiple important metadata mismatch issues remain unresolved
    • Query capabilities for co-adds are limited
    • Requires further development of the Butler system
  2. Feature Completeness
    • Some SIAv2 parameters not yet implemented
    • Custom extensions still in planning phase
    • Support for complex queries may be limited
  3. Documentation Depth
    • Performance benchmark data missing
    • Insufficient discussion of error handling and edge cases
    • Little detailed comparison with other systems

Impact

  1. Contribution to Astronomical Data Management
    • Provides a practical case study of standardized data access for large astronomical survey projects
    • Demonstrates how to implement traditional protocols on modern data management systems
    • Provides reference for similar implementations at other observatories
  2. Technology Promotion Value
    • Open-source implementation (dax_obscore package) facilitates community adoption and improvement
    • Layered architecture design applicable to similar projects
    • Command-line tools reduce user learning curve

Applicable Scenarios

  1. Large Astronomical Survey Projects: Projects requiring standardized data access interfaces
  2. Data Centers and Observatories: Institutions wishing to provide IVOA-compliant services
  3. Research Community: Researchers needing programmatic access to astronomical data
  4. Educational Use: Learning and experimental environments for SIAv2 protocol

References

This paper cites the following key literature:

  1. Dowler, P., et al. (2015). IVOA Simple Image Access Version 2.0 - Defines the SIAv2 standard protocol
  2. Jenness, T., et al. (2022). Core architecture paper of the Rubin Data Butler system
  3. Louys, M., et al. (2017). ObsCore data model and TAP implementation standards
  4. Salnikov, A. (2022). Technical note on ObsCore as a Butler registry view

Summary: This paper presents a successful engineering case study that solves practical deployment problems while remaining compatible with international standards. Although data model mismatches remain, the implementation provides a useful reference and practical tools for the astronomical data management community.