
Meronymic Ontology Extraction via Large Language Models

Zhang, Conia, Rago
Ontologies have become essential in today's digital age as a way of organising the vast amount of readily available unstructured text. In providing formal structure to this information, ontologies have immense value and application across various domains, e.g., e-commerce, where countless product listings necessitate proper product organisation. However, the manual construction of these ontologies is a time-consuming, expensive and laborious process. In this paper, we harness the recent advancements in large language models (LLMs) to develop a fully-automated method of extracting product ontologies, in the form of meronymies, from raw review texts. We demonstrate that the ontologies produced by our method surpass an existing, BERT-based baseline when evaluated using an LLM-as-a-judge. Our investigation provides the groundwork for LLMs to be used more generally in (product or otherwise) ontology extraction.

Basic Information

  • Paper ID: 2510.13839
  • Title: Meronymic Ontology Extraction via Large Language Models
  • Authors: Dekai Zhang (Imperial College London), Simone Conia (Sapienza University of Rome), Antonio Rago (Imperial College London & King's College London)
  • Classification: cs.CL cs.AI
  • Publication Date: October 11, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.13839

Abstract

This paper leverages recent advances in Large Language Models (LLMs) to develop a fully automated method for extracting product ontologies (in the form of part-whole relationships) from raw review texts. The study shows that the ontologies generated by this method surpass those of an existing BERT-based baseline when evaluated with an LLM acting as a judge. This research establishes a foundation for broader applications of LLMs in ontology extraction tasks.

Research Background and Motivation

Problem Definition

In the digital age, massive volumes of unstructured textual data require organization and structuring through ontologies. Particularly in e-commerce, countless product listings require appropriate product organizational structures. Part-whole relationships (meronymic relations) hold significant value in downstream tasks such as review aggregation, sentiment analysis, and product question-answering.

Limitations of Existing Approaches

  1. High Manual Construction Costs: Manual ontology construction is a time-consuming, expensive, and labor-intensive process
  2. Insufficient Automation Methods: Previous research has primarily focused on extracting taxonomic relations rather than part-whole relationships
  3. Evaluation Difficulties: Lack of standard benchmark datasets makes it difficult to effectively evaluate the quality of part-whole ontologies
  4. Dependence on Manual Annotation: Existing methods such as the BERT approach by Oksanen et al. (2021) still require a certain degree of manual annotation

Research Motivation

This paper aims to leverage the powerful capabilities of LLMs to develop a fully automated method for part-whole ontology extraction and propose a novel evaluation framework to validate the method's effectiveness.

Core Contributions

  1. Proposes Fully Automated LLM Method: Develops a completely automated method using LLMs for part-whole ontology extraction that generalizes across different product categories
  2. Innovative Evaluation Framework: Introduces a novel approach using LLM-as-a-judge for empirical evaluation of various tasks in part-whole ontology extraction
  3. Performance Improvement Verification: Demonstrates experimentally that the LLM method outperforms the BERT-based baseline on both term quality and relation quality, as scored by an LLM judge
  4. Open-Source Code: Provides complete implementation code to promote research reproducibility

Methodology Details

Task Definition

  • Input: Product review texts
  • Output: A part-whole ontology graph containing concept nodes and "part-whole" relationships between them
  • Constraints: Relationships must be meaningful part-whole relations, and concepts must be product-relevant
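The output can be pictured as a directed graph in which each edge links a part to its whole. Below is a minimal sketch of one possible representation. The class and method names are hypothetical and do not come from the paper's code. It also rejects edges that would create a cycle, on the assumption that a part-whole hierarchy should stay acyclic.

```python
from collections import defaultdict


class MeronymyOntology:
    """Hypothetical part-whole graph stored as whole -> set of direct parts."""

    def __init__(self) -> None:
        self.parts: defaultdict[str, set[str]] = defaultdict(set)

    def add_part(self, part: str, whole: str) -> None:
        # Reject edges that would make `whole` a (transitive) part of `part`,
        # since that would introduce a cycle into the hierarchy.
        if part == whole or whole in self.all_parts(part):
            raise ValueError(f"{part!r} -> {whole!r} would create a cycle")
        self.parts[whole].add(part)

    def all_parts(self, whole: str) -> set[str]:
        # Depth-first traversal collecting direct and indirect parts.
        seen: set[str] = set()
        stack = [whole]
        while stack:
            node = stack.pop()
            for child in self.parts.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen
```

As an example, suppose bowl and motor are added as parts of stand mixer and handle as a part of bowl. Then all_parts("stand mixer") returns bowl, handle and motor, and add_part("stand mixer", "handle") raises ValueError.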

Model Architecture

The proposed method comprises a four-stage pipeline:

1. Aspect Extraction

  • Method: Fine-tuning Mistral-7B-Instruct-v0.2
  • Training Data: SemEval-2014 Task 4 dataset (1,600 samples)
  • Post-processing: POS-tag filtering that retains only nouns actually appearing in the reviews
  • Output Control: Keeps only the 50 most frequent aspects
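The post-processing steps above can be sketched as a filter followed by a frequency ranking. This is a minimal sketch under several assumptions. The LLM's output is taken to be already parsed into a list of candidate aspects per review. `is_noun` is a hypothetical placeholder for a real POS tagger. The matching rule (case-insensitive, whole-word) is a guess at what "actually appearing in reviews" means.

```python
import re
from collections import Counter
from typing import Callable, Iterable


def filter_and_rank_aspects(
    reviews: Iterable[tuple[str, list[str]]],
    is_noun: Callable[[str], bool],
    top_k: int = 50,
) -> list[str]:
    """Keep LLM-proposed aspects that occur in their review and are nouns,
    then return the top_k most frequent ones."""
    counts: Counter[str] = Counter()
    for text, candidates in reviews:
        lowered = text.lower()
        for aspect in candidates:
            term = aspect.strip().lower()
            # Drop empty or hallucinated aspects absent from the review text
            # (whole-word, case-insensitive match).
            if not term or not re.search(rf"\b{re.escape(term)}\b", lowered):
                continue
            # Drop non-nouns; is_noun is a placeholder for a real POS tagger.
            if not is_noun(term):
                continue
            counts[term] += 1
    return [term for term, _ in counts.most_common(top_k)]
```

Take two reviews where the model proposes "warranty" (not in the text) and "quiet" (not a noun). The first is dropped by the text check and the second by the noun check, so only the surviving nouns are counted.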

2. Synset Extraction

  • Embedding Model: Fine-tuned FastText model (handles spelling errors and abbreviations)
  • Clustering Algorithm: Equidistant Node Clustering (ENC) based on cosine similarity
  • Advantage: Produces more precise clustering results compared to K-means
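This summary does not describe the internals of Equidistant Node Clustering, so the sketch below does not implement ENC. It swaps in a simpler greedy threshold clustering on cosine similarity, only to illustrate how embedding similarity can group spelling variants and synonyms into synsets. The toy vectors stand in for FastText embeddings, and the 0.8 threshold is an arbitrary assumption.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def threshold_clusters(
    vectors: dict[str, list[float]], threshold: float = 0.8
) -> list[list[str]]:
    """Greedy single-pass clustering (a stand-in, not ENC): each term joins the
    first cluster whose seed term is at least `threshold` cosine-similar,
    otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for term, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters
```

A single-pass greedy scheme like this depends on input order, which is one reason a more careful method such as ENC may produce tighter synsets.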

3. Concept Extraction

  • Representative Selection: Selects the most frequently occurring term in each synset as representative
  • Relevance Judgment: Uses LLM prompting to determine whether terms should be included in the ontology
  • Filtering Criteria: Relevance, specificity, and hierarchical properties
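The two concept-extraction steps can be sketched as follows. Each synset's most frequent member is chosen as its representative, and an LLM then gives a yes/no relevance judgment for each representative. The prompt wording is hypothetical. `ask_llm` is a placeholder for whatever model client is used, and parsing the reply by its leading "yes" is an assumption.

```python
from collections import Counter
from typing import Callable


def pick_representatives(synsets: list[list[str]], counts: Counter) -> list[str]:
    # The most frequent member of each synset represents it
    # (ties go to the member listed first).
    return [max(synset, key=lambda term: counts[term]) for synset in synsets]


def build_relevance_prompt(term: str, product: str) -> str:
    # Hypothetical wording touching on relevance, specificity and hierarchy.
    return (
        f"Product category: {product}\n"
        f"Candidate term: {term}\n"
        "Is this term a relevant and specific part or feature of the product "
        "that could sit in a part-whole hierarchy? Answer yes or no."
    )


def filter_concepts(
    terms: list[str], product: str, ask_llm: Callable[[str], str]
) -> list[str]:
    # ask_llm is a placeholder for whatever LLM client is actually used.
    return [
        term
        for term in terms
        if ask_llm(build_relevance_prompt(term, product)).strip().lower().startswith("yes")
    ]
```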

4. Relation Extraction

  • Input Processing: Extracts sentences containing two aspects from different synsets
  • Task Design: Multiple-choice questions (aspect A is part of aspect B / aspect B is part of aspect A / unrelated)
  • Model Training: Fine-tunes Mistral model through distillation on 1,000 synthetic samples
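Input processing and task design can be illustrated together. The sketch finds sentences that mention two aspects from different synsets and turns each pair into a three-way multiple-choice question. The naive sentence splitter and the aspect-to-synset mapping are assumptions, and the prompt text is hypothetical rather than the paper's actual prompt.

```python
import re
from itertools import combinations


def find_excerpts(
    review: str, synset_of: dict[str, int]
) -> list[tuple[str, str, str]]:
    """Return (sentence, aspect_a, aspect_b) for every sentence that mentions
    two aspects belonging to different synsets."""
    results = []
    # Naive split on ., ! or ? followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", review):
        lowered = sentence.lower()
        present = [a for a in synset_of if re.search(rf"\b{re.escape(a)}\b", lowered)]
        for a, b in combinations(present, 2):
            if synset_of[a] != synset_of[b]:
                results.append((sentence, a, b))
    return results


def build_mcq(sentence: str, a: str, b: str) -> str:
    # Three options: a is part of b, b is part of a, or unrelated.
    return (
        f'Review excerpt: "{sentence}"\n'
        f"A) {a} is part of {b}\n"
        f"B) {b} is part of {a}\n"
        f"C) {a} and {b} are unrelated\n"
        "Answer with A, B or C."
    )
```

Restricting the model's input to short excerpts rather than whole reviews keeps each question focused on a single pair of aspects.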

Technical Innovations

  1. End-to-End LLM Pipeline: Removes the manual annotation still required by the BERT baseline, achieving a higher degree of automation
  2. Structured Output Constraints: Uses JSON syntax constraints to ensure consistent output formatting
  3. Multi-Stage Optimization: Each stage is optimized for specific tasks to improve overall performance
  4. Hallucination Mitigation: Reduces LLM hallucination issues through POS tagging filtering and fine-tuning
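The exact JSON format used for structured output is not given here. As a minimal sketch, assume the relation answer is an object with a single `answer` field restricted to the three choice labels. The schema below is written in standard JSON Schema and could be handed to a constrained-decoding backend. The parser is a defensive check that rejects anything outside the schema. Both the schema and the function name are hypothetical.

```python
import json
from typing import Optional

# Hypothetical JSON Schema a constrained decoder could enforce during generation.
RELATION_SCHEMA = {
    "type": "object",
    "properties": {"answer": {"type": "string", "enum": ["A", "B", "C"]}},
    "required": ["answer"],
}
ALLOWED = set(RELATION_SCHEMA["properties"]["answer"]["enum"])


def parse_relation_answer(raw: str) -> Optional[str]:
    """Return the chosen label from output shaped like {"answer": "B"},
    or None if the output is not valid JSON or falls outside the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    answer = data.get("answer")
    return answer if isinstance(answer, str) and answer in ALLOWED else None
```

With decoding constrained to the schema, free-text replies such as "B is correct" cannot occur. The parser still maps them to None in case the constraint is not enforced.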

Experimental Setup

Datasets

  • Source: Amazon Reviews 2023 dataset
  • Product Categories: 5 categories (video games, televisions, necklaces, watches, stand mixers)
  • Data Scale: 100,000 reviews per product category (26,464 for stand mixers)
  • Processing Limitation: Only 1,000 reviews are used for the LLM tasks, to keep processing time manageable

Evaluation Metrics

Term Evaluation Criteria:

  1. Relevance: Whether the term accurately represents a product part or component
  2. Specificity: Whether the term has an appropriate level of specificity
  3. Clarity: Whether the term clearly conveys intent and avoids ambiguity
  4. Product Match: Whether the term logically fits the given product

Relation Evaluation Criteria:

  1. Logical Hierarchy: Whether child nodes logically represent parts or features of parent nodes
  2. Contextual Match: Whether relationships are reasonable within Amazon product categories
  3. Clarity and Specificity: Whether relationships avoid ambiguity and clearly define part-whole relations
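To make the judging step concrete, below is a minimal sketch for the term criteria. It builds a judge prompt and collapses the per-criterion ratings into one overall score. The prompt wording, the score scale and the unweighted averaging are all assumptions, since the summary does not state how the judge's ratings are combined.

```python
from statistics import mean

TERM_CRITERIA = ["Relevance", "Specificity", "Clarity", "Product Match"]


def build_term_judge_prompt(product: str, terms: list[str]) -> str:
    # Hypothetical wording: one numeric score per criterion is requested.
    return (
        f"Product: {product}\n"
        f"Ontology terms: {', '.join(terms)}\n"
        f"Score these terms on each criterion: {', '.join(TERM_CRITERIA)}. "
        "Reply with one number per criterion."
    )


def overall_score(scores: dict[str, float]) -> float:
    # Assumes an unweighted mean over the four term criteria.
    missing = set(TERM_CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return mean(scores[c] for c in TERM_CRITERIA)
```

The same pattern would apply to the three relation criteria with a different criteria list.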

Baseline Methods

  • Baseline Method: BERT-based method by Oksanen et al. (2021)
  • Evaluation Method: Gemini 1.5 Flash as LLM judge
  • Comparison Versions: A full version and a shortened version truncated to the same number of terms as the baseline

Implementation Details

  • Hardware: NVIDIA GeForce RTX 4090 GPU
  • Optimizer: Adam (learning rate 10^-4)
  • Fine-tuning Technique: LoRA (r=4, α=16)
  • Training: 3 epochs, batch size 16
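The LoRA hyperparameters can be read through the standard LoRA update, in which a frozen weight $W_0$ receives a low-rank correction scaled by $\alpha / r$. The summary does not say which weight matrices were adapted, so the $4096 \times 4096$ matrix below is only an illustrative size.

```latex
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k}

\frac{\alpha}{r} = \frac{16}{4} = 4

r(d + k) = 4 \times (4096 + 4096) = 32{,}768
\quad\text{vs.}\quad d k = 4096 \times 4096 = 16{,}777{,}216
```

For such a matrix, the adapter trains about 0.2% as many parameters as a full update would.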

Experimental Results

Main Results

Term Quality Evaluation

Product Category | Proposed Method (Full) | Proposed Method (Shortened) | BERT Baseline
Video Games      | 4.00                   | 4.18                        | 3.92
Television       | 4.06                   | 4.05                        | 3.95
Necklace         | 4.50                   | 4.57                        | 3.86
Watch            | 4.13                   | 4.37                        | 4.10
Stand Mixer      | 4.36                   | 4.40                        | 3.31

Relation Quality Evaluation

Product Category | Proposed Method (Full) | Proposed Method (Shortened) | BERT Baseline
Video Games      | 3.89                   | 3.82                        | 3.43
Television       | 3.99                   | 4.56                        | 3.21
Necklace         | 3.65                   | 3.79                        | 3.29
Watch            | 3.75                   | 4.06                        | 2.68
Stand Mixer      | 3.30                   | 3.40                        | 2.47
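The per-category scores can be condensed into cross-category averages. The short script below computes those means and checks whether both proposed variants beat the baseline in every category. The averages are derived here from the two tables above and are not figures reported in the paper.

```python
from statistics import mean

CATEGORIES = ["Video Games", "Television", "Necklace", "Watch", "Stand Mixer"]

# (full, shortened, BERT baseline) per category, copied from the tables above.
TERM = {
    "Video Games": (4.00, 4.18, 3.92),
    "Television": (4.06, 4.05, 3.95),
    "Necklace": (4.50, 4.57, 3.86),
    "Watch": (4.13, 4.37, 4.10),
    "Stand Mixer": (4.36, 4.40, 3.31),
}
RELATION = {
    "Video Games": (3.89, 3.82, 3.43),
    "Television": (3.99, 4.56, 3.21),
    "Necklace": (3.65, 3.79, 3.29),
    "Watch": (3.75, 4.06, 2.68),
    "Stand Mixer": (3.30, 3.40, 2.47),
}


def summarise(table: dict[str, tuple[float, float, float]]) -> tuple[float, float, float, bool]:
    full, short, bert = zip(*(table[c] for c in CATEGORIES))
    # True if both proposed variants beat the baseline in every category.
    beats_everywhere = all(f > b and s > b for f, s, b in zip(full, short, bert))
    return round(mean(full), 3), round(mean(short), 3), round(mean(bert), 3), beats_everywhere


print(summarise(TERM))      # (4.21, 4.314, 3.828, True)
print(summarise(RELATION))  # (3.716, 3.926, 3.016, True)
```

On these averages, the gap to the baseline is larger for relations (about 0.7 points for the full version) than for terms (about 0.38 points).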

Ablation Studies

Aspect Extraction Method Comparison

Method                         | Average Score
Method A1 (Prompt Only)        | 1.960 ± 0.006
Method A2 (Prompt + Sentiment) | 2.259 ± 0.002
Method A3 (Fine-tuning)        | 2.662 ± 0.006

Relation Extraction Method Comparison

Method                 | Video Games | Television | Necklace | Watch | Stand Mixer
Full Reviews           | 3.811       | 4.155      | 3.397    | 3.570 | 3.080
Excerpts               | 3.727       | 3.726      | 3.481    | 3.398 | 2.493
Excerpts + Fine-tuning | 3.893       | 3.987      | 3.646    | 3.747 | 3.303

Efficiency Analysis

Proposed Method Processing Time

Stage               | Average Time (minutes)
Aspect Extraction   | 32.05
Synset Extraction   | 0.78
Concept Extraction  | 1.52
Relation Extraction | 4.53
Total               | 38.89

BERT Baseline Processing Time

Stage               | Average Time (minutes)
Entity Extraction   | 1.66
Aspect Extraction   | 2.79
Synset Extraction   | 0.82
Ontology Extraction | 1.36
Total               | 6.62
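The two timing tables support a short calculation. It gives the overall slowdown of the LLM pipeline relative to the baseline, and the share of the LLM pipeline's time spent on aspect extraction.

```latex
\frac{38.89}{6.62} \approx 5.87
\qquad
\frac{32.05}{38.89} \approx 0.824
```

The per-stage averages sum to 38.88 and 6.63 minutes respectively, which matches the reported totals up to rounding. Aspect extraction therefore accounts for roughly 82% of the LLM pipeline's running time.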

Experimental Findings

  1. Quality Improvement: The LLM method outperforms the BERT baseline on both term and relation quality in all five product categories
  2. Fine-tuning Importance: Fine-tuning yields clear improvements over prompting alone (e.g., 2.662 vs. 1.960 average score for aspect extraction)
  3. Computational Cost: The LLM method achieves higher quality but takes roughly 6 times as long as the BERT baseline (38.89 vs. 6.62 minutes on average), with aspect extraction accounting for most of the difference
  4. Clustering Algorithm Selection: ENC produces more precise synsets compared to K-means

Related Work

Ontology Learning

Prior work on ontology learning has relied largely on deep learning methods, but most of it focuses on extracting taxonomic (is-a) relations rather than part-whole relationships.

LLM Applications in Ontology Construction

Recent research has begun exploring the application of LLMs in key ontology learning tasks such as term and relation extraction, but primarily focuses on taxonomic relations.

Evaluation Methods

Due to the lack of standard benchmarks, evaluating ontology quality has been a persistent challenge. The LLM-as-a-judge evaluation proposed in this paper offers a practical way to assess ontologies without gold-standard annotations.

Conclusions and Discussion

Main Conclusions

  1. The LLM method outperforms the existing BERT method on part-whole ontology extraction across all evaluated product categories
  2. Fine-tuning and structured output constraints are key factors for performance improvement
  3. LLM-as-a-judge provides a viable solution for ontology quality assessment

Limitations

  1. Evaluation Dependency: Primarily relies on LLM-as-a-judge, lacking user study validation
  2. Computational Cost: Significantly increased computational cost compared to BERT methods
  3. Hallucination Issues: LLMs still hallucinate and can generate irrelevant aspects
  4. Benchmark Absence: Lack of standard benchmark datasets in the product ontology domain

Future Directions

  1. Standard Benchmark Construction: Establish standard benchmark datasets for this task
  2. User Study Validation: Verify the practical utility of ontologies through user studies
  3. Method Generalization: Explore application of the method to other ontology types (e.g., taxonomic ontologies)
  4. Hallucination Mitigation: Research methods integrating multiple LLMs to reduce single-model hallucinations

In-Depth Evaluation

Strengths

  1. Strong Innovation: First systematic application of LLMs to part-whole ontology extraction
  2. Complete Methodology: Provides an end-to-end complete pipeline solution
  3. Evaluation Innovation: Proposes an LLM-as-a-judge evaluation framework tailored to part-whole ontologies
  4. Comprehensive Experiments: Includes detailed ablation studies and efficiency analysis
  5. Open-Source Contribution: Provides complete open-source implementation

Weaknesses

  1. Evaluation Limitations: Over-reliance on LLM evaluation, lacking human evaluation validation
  2. Cost Considerations: Significantly increased computational cost but insufficient discussion of cost-benefit tradeoffs
  3. Generalization: Validation on only 5 product categories; generalization requires further verification
  4. Baseline Comparison: Insufficient comparison with more existing methods

Impact

  1. Academic Value: Provides important reference for LLM applications in ontology construction
  2. Practical Value: Direct application potential in e-commerce and related domains
  3. Methodological Contribution: LLM-as-a-judge evaluation framework has broad applicability
  4. Reproducibility: Provides thorough implementation details and open-source code

Applicable Scenarios

  1. E-commerce Platforms: Product categorization and recommendation systems
  2. Knowledge Graph Construction: Automated ontology construction
  3. Information Extraction: Extracting structured relationships from unstructured text
  4. Review Analysis: Product feature and component identification

References

This paper cites important works in related fields, including:

  • Oksanen et al. (2021): BERT-based product ontology extraction method
  • Devlin et al. (2019): BERT model
  • Jiang et al. (2023): Mistral model
  • Pontiki et al. (2014): SemEval-2014 Task 4 dataset

Overall Assessment: This paper makes an important contribution to part-whole ontology extraction. The method is innovative, the experimental design is reasonable, and the results are convincing. Although the evaluation methodology and computational cost have some limitations, the paper provides valuable insights and tools for the development of this field.