Generating CodeMeta using declarative mapping rules: An open-ended approach using ShExML
García-González
Nowadays, software is one of the cornerstones when conducting research in several scientific fields which employ computer-based methodologies to answer new research questions. However, for these experiments to be completely reproducible, research software should comply with the FAIR principles, yet its metadata can be represented following different data models and spread across different locations. In order to bring some cohesion to the field, CodeMeta was proposed as a vocabulary to represent research software metadata in a unified and standardised manner. While existing tools can help users to generate CodeMeta files for some specific use cases, they fall short on flexibility and adaptability. Hence, in this work, I propose the use of declarative mapping rules to generate CodeMeta files, illustrated through the implementation of three crosswalks in ShExML which are then expanded and merged to cover the generation of CodeMeta files for two existing research software artefacts. Moreover, the outputs are validated using SHACL and ShEx and the whole generation workflow is automated requiring minimal user intervention upon a new version release. This work can, therefore, be used as an example upon which other developers can include a CodeMeta generation workflow in their repositories, facilitating the adoption of CodeMeta and, ultimately, increasing research software FAIRness.
Today, software is a cornerstone of research in many scientific domains that use computational methods to address new research questions. However, for these experiments to be fully reproducible, research software should comply with the FAIR principles, yet its metadata may follow different data models and be dispersed across various locations. To bring some coherence to the field, CodeMeta was proposed as a vocabulary for representing research software metadata in a unified and standardized manner. While existing tools can help users generate CodeMeta files for certain use cases, they fall short in flexibility and adaptability. Therefore, the paper proposes using declarative mapping rules to generate CodeMeta files, illustrated through the implementation of three crosswalks in ShExML, which are subsequently extended and merged to cover CodeMeta file generation for two existing research software artifacts. Furthermore, the outputs are validated using SHACL and ShEx, and the entire generation workflow is automated, requiring minimal user intervention upon a new version release.
FAIR Compliance Issues for Research Software: Although research software is crucial to scientific research, its metadata is scattered across different platforms (GitHub, Zenodo, Maven, etc.) and follows different data models, so it lacks uniformity.
Limitations of Existing Tools:
Most tools support only one-to-one conversion (single metadata source to CodeMeta)
Lack of flexibility and adaptability
Require manual user intervention for data reconciliation
Insufficient automation capabilities
Barriers to CodeMeta Adoption: Although CodeMeta provides a unified representation standard for research software metadata, limitations of existing tools hinder its widespread adoption.
Intelligent Source Selection: When multiple sources contain the same attribute, the mapping selects the most suitable source based on semantic relevance and ease of maintenance.
Hardcoded Value Supplementation: Data that cannot be obtained from external sources can be defined directly in the mapping file.
Data Transformation Functions: Functions handle data cleaning tasks such as date format conversion and URL normalization.
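The three features above can be illustrated with a minimal sketch. It is written in plain Python rather than ShExML and is not the paper's implementation; the source records, field names, rule table, and helper functions are all hypothetical and chosen only to show source priority, hardcoded values, and cleaning functions working together.

```python
from datetime import datetime


def normalize_date(value: str) -> str:
    """Reduce an ISO 8601 timestamp to a plain YYYY-MM-DD date."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")).date().isoformat()


def normalize_url(value: str) -> str:
    """Force https and drop any trailing slash."""
    if value.startswith("http://"):
        value = "https://" + value[len("http://"):]
    return value.rstrip("/")


# Hypothetical records, as they might look after fetching each source.
SOURCES = {
    "github": {
        "name": "example-tool",
        "created_at": "2020-01-15T08:30:00Z",
        "html_url": "http://github.com/example/example-tool/",
    },
    "zenodo": {"title": "Example Tool", "version": "1.0.0"},
    "maven": {"version": "1.2.0"},
}

# For each CodeMeta property: candidate sources in priority order,
# the field to read, and an optional cleaning function (assumed layout).
RULES = {
    "name": [("zenodo", "title", None), ("github", "name", None)],
    "version": [("maven", "version", None), ("zenodo", "version", None)],
    "dateCreated": [("github", "created_at", normalize_date)],
    "codeRepository": [("github", "html_url", normalize_url)],
}

# Values no external source provides, declared directly in the mapping.
HARDCODED = {"license": "https://spdx.org/licenses/MIT"}


def build_codemeta(sources, rules, hardcoded):
    """Merge several metadata sources into one CodeMeta-style document."""
    doc = {
        "@context": "https://w3id.org/codemeta/3.0",
        "@type": "SoftwareSourceCode",
    }
    for prop, candidates in rules.items():
        for source_name, field, transform in candidates:
            value = sources.get(source_name, {}).get(field)
            if value:  # first source that has the attribute wins
                doc[prop] = transform(value) if transform else value
                break
    doc.update(hardcoded)
    return doc


codemeta = build_codemeta(SOURCES, RULES, HARDCODED)
```

Here the priority order is simply the order of the candidate list for each property, so changing which source is preferred means editing one line of the rule table rather than the merging logic.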
The paper includes 37 references covering important works on the FAIR principles, semantic web technologies, the CodeMeta specification, declarative mapping languages, and related fields, which provide a solid theoretical foundation and technical support for the research.
Overall Assessment: This is a practical, technically oriented paper in the research software metadata management domain. It offers an innovative declarative mapping method and a complete, reproducible implementation, and it helps promote adoption of the CodeMeta standard. While there is room for improvement in the scope and depth of the evaluation, it makes a valuable technical contribution to the field.