Meronymic Ontology Extraction via Large Language Models

Zhang, Conia, Rago

Ontologies have become essential in today's digital age as a way of organising the vast amount of readily available unstructured text. In providing formal structure to this information, ontologies have immense value and application across various domains, e.g., e-commerce, where countless product listings necessitate proper product organisation. However, the manual construction of these ontologies is a time-consuming, expensive and laborious process. In this paper, we harness the recent advancements in large language models (LLMs) to develop a fully-automated method of extracting product ontologies, in the form of meronymies, from raw review texts. We demonstrate that the ontologies produced by our method surpass an existing, BERT-based baseline when evaluating using an LLM-as-a-judge. Our investigation provides the groundwork for LLMs to be used more generally in (product or otherwise) ontology extraction.

Basic Information

Paper ID: 2510.13839
Title: Meronymic Ontology Extraction via Large Language Models
Authors: Dekai Zhang (Imperial College London), Simone Conia (Sapienza University of Rome), Antonio Rago (Imperial College London & King's College London)
Classification: cs.CL cs.AI
Publication Date: October 11, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.13839

Abstract

This paper leverages recent advances in Large Language Models (LLMs) to develop a fully automated method for extracting product ontologies (in the form of part-whole relationships) from raw review texts. The study demonstrates that the ontologies generated by this method surpass existing BERT-based baseline approaches in evaluations using LLMs as judges. This research establishes a foundation for broader applications of LLMs in ontology extraction tasks.

Research Background and Motivation

Problem Definition

In the digital age, massive volumes of unstructured textual data require organization and structuring through ontologies. Particularly in e-commerce, countless product listings require appropriate product organizational structures. Part-whole relationships (meronymic relations) hold significant value in downstream tasks such as review aggregation, sentiment analysis, and product question-answering.

Limitations of Existing Approaches

High Manual Construction Costs: Manual ontology construction is a time-consuming, expensive, and labor-intensive process
Insufficient Automation Methods: Previous research has primarily focused on extracting taxonomic relations rather than part-whole relationships
Evaluation Difficulties: Lack of standard benchmark datasets makes it difficult to effectively evaluate the quality of part-whole ontologies
Dependence on Manual Annotation: Existing methods, such as the BERT-based approach of Oksanen et al. (2021), still require some manual annotation

Research Motivation

This paper aims to leverage the powerful capabilities of LLMs to develop a fully automated method for part-whole ontology extraction and propose a novel evaluation framework to validate the method's effectiveness.

Core Contributions

Proposes Fully Automated LLM Method: Develops a completely automated method using LLMs for part-whole ontology extraction that generalizes across different product categories
Innovative Evaluation Framework: Introduces a novel approach using LLM-as-a-judge for empirical evaluation of various tasks in part-whole ontology extraction
Performance Improvement Verification: Demonstrates through experiments that the LLM method significantly outperforms BERT-based baseline methods in relevance
Open-Source Code: Provides complete implementation code to promote research reproducibility

Methodology Details

Task Definition

Input: Product review texts
Output: Part-whole ontology graph containing concept nodes and "part-whole" relationships between them
Constraints: Relationships must be meaningful part-whole relations, and concepts must be product-relevant
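
The output described above can be pictured as a small directed graph. The following is a minimal sketch, not the paper's data structure: the class name `MeronymyGraph`, its fields and the toy television concepts are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MeronymyGraph:
    """Hypothetical container: edges are stored as (part, whole) pairs."""
    concepts: set[str] = field(default_factory=set)
    part_of: set[tuple[str, str]] = field(default_factory=set)

    def add_relation(self, part: str, whole: str) -> None:
        # A concept that is "part of itself" is not a meaningful relation.
        if part == whole:
            raise ValueError("a concept cannot be part of itself")
        self.concepts.update((part, whole))
        self.part_of.add((part, whole))

    def parts(self, whole: str) -> list[str]:
        # Direct parts only, sorted for a stable order.
        return sorted(p for p, w in self.part_of if w == whole)


# Toy example (invented concepts, not taken from the paper's results).
graph = MeronymyGraph()
graph.add_relation("screen", "television")
graph.add_relation("remote", "television")
graph.add_relation("button", "remote")
direct_parts = graph.parts("television")
```

Storing edges as (part, whole) pairs keeps the direction of the relation explicit, which matters because "A is part of B" and "B is part of A" are distinct answers in the relation extraction stage.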

Model Architecture

The proposed method comprises a four-stage pipeline:

1. Aspect Extraction

Method: Fine-tuning Mistral-7B-Instruct-v0.2
Training Data: SemEval-2014 Task 4 dataset (1,600 samples)
Post-processing: POS tagging filtering to retain only nouns actually appearing in reviews
Output Control: Selection of top 50 most frequent aspects
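
The post-processing and output-control steps above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: `is_noun` is a hypothetical stand-in for a real POS tagger, tokenisation is a naive whitespace split, and the function names and toy reviews are invented.

```python
from collections import Counter
from typing import Callable, Iterable


def filter_aspects(review: str, candidates: Iterable[str],
                   is_noun: Callable[[str], bool]) -> list[str]:
    """Keep only candidates that occur in the review and are nouns."""
    tokens = set(review.lower().split())  # naive tokenisation (assumption)
    kept = []
    for candidate in candidates:
        term = candidate.lower()
        # Terms absent from the review are treated as hallucinated and dropped.
        if term in tokens and is_noun(term):
            kept.append(term)
    return kept


def top_aspects(per_review_aspects: Iterable[list[str]], k: int = 50) -> list[str]:
    """Return the k most frequent aspects over all reviews."""
    counts = Counter(a for aspects in per_review_aspects for a in aspects)
    return [term for term, _ in counts.most_common(k)]


# Toy data: a fixed noun list replaces the POS tagger (assumption).
NOUNS = {"screen", "sound", "remote", "warranty"}
model_output = [
    ("great screen and sound", ["screen", "sound", "warranty"]),
    ("the screen is bright", ["screen", "bright"]),
    ("remote feels cheap", ["remote", "cheap"]),
]
filtered = [filter_aspects(text, aspects, NOUNS.__contains__)
            for text, aspects in model_output]
most_frequent = top_aspects(filtered, k=1)
```

In this toy run "warranty" is dropped because it never appears in the review text, and "bright" and "cheap" are dropped because the stand-in tagger does not treat them as nouns.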

2. Synset Extraction

Embedding Model: Fine-tuned FastText model (handles spelling errors and abbreviations)
Clustering Algorithm: Equidistant Node Clustering (ENC) based on cosine similarity
Advantage: Produces more precise clustering results compared to K-means
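
To make the role of cosine similarity concrete, here is a minimal sketch of similarity-based grouping. It is explicitly not Equidistant Node Clustering, whose details are not given in this summary: a simple greedy threshold grouping is swapped in, and the threshold value and the two-dimensional toy vectors are assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_groups(embeddings: dict[str, list[float]],
                  threshold: float = 0.9) -> list[list[str]]:
    """Stand-in grouping (NOT ENC): join the first group whose seed is similar enough."""
    groups: list[list[str]] = []  # the first term of each group acts as its seed
    for term, vec in embeddings.items():
        for group in groups:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(term)
                break
        else:
            groups.append([term])
    return groups


# Toy 2-d vectors standing in for FastText embeddings (assumption).
toy = {
    "screen": [1.0, 0.0],
    "display": [0.98, 0.2],
    "remote": [0.0, 1.0],
    "remot": [0.1, 0.99],  # misspelling that lands close to "remote"
}
synsets = greedy_groups(toy)
```

The misspelled "remot" ends up with "remote" only because its toy vector was placed nearby; in the described pipeline that closeness is what the fine-tuned FastText model is meant to provide.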

3. Concept Extraction

Representative Selection: Selects the most frequently occurring term in each synset as representative
Relevance Judgment: Uses LLM prompting to determine whether terms should be included in the ontology
Filtering Criteria: Relevance, specificity, and hierarchical properties
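
A minimal sketch of the two steps above, assuming aspect frequencies are already available. The function names, the prompt wording and the toy counts are hypothetical; only the "most frequent term" rule and the three filtering criteria come from the description.

```python
from collections import Counter


def representative(synset: list[str], counts: Counter) -> str:
    """Pick the most frequent term; ties go to the term listed first."""
    return max(synset, key=lambda term: counts[term])


def relevance_prompt(product: str, term: str) -> str:
    """Hypothetical prompt wording built around the three filtering criteria."""
    return (
        f"Product: {product}\n"
        f"Candidate concept: {term}\n"
        "Should this concept be included in a part-whole ontology of the product?\n"
        "Consider relevance, specificity, and hierarchical properties.\n"
        "Answer yes or no."
    )


# Toy frequencies (invented for illustration).
counts = Counter({"screen": 120, "display": 45, "remote": 30, "remot": 2})
synsets = [["screen", "display"], ["remote", "remot"]]
representatives = [representative(s, counts) for s in synsets]
prompt = relevance_prompt("television", representatives[0])
```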

4. Relation Extraction

Input Processing: Extracts sentences containing two aspects from different synsets
Task Design: Multiple-choice questions (aspect A is part of aspect B / aspect B is part of aspect A / unrelated)
Model Training: Fine-tunes Mistral model through distillation on 1,000 synthetic samples
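
The input processing and task design above can be sketched as below. This is an assumption-laden illustration: sentence splitting uses a simple regular expression, only single-word aspects are matched, and the prompt wording is invented around the three answer options named in the description.

```python
import re
from itertools import combinations


def cooccurring_sentences(review: str,
                          synsets: list[list[str]]) -> list[tuple[str, str, str]]:
    """Return (sentence, a, b) for aspect pairs drawn from different synsets."""
    term_to_synset = {t: i for i, synset in enumerate(synsets) for t in synset}
    found = []
    # Split after sentence-ending punctuation (naive; an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", review.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        present = sorted(t for t in term_to_synset if t in words)
        for a, b in combinations(present, 2):
            if term_to_synset[a] != term_to_synset[b]:
                found.append((sentence, a, b))
    return found


def multiple_choice_prompt(sentence: str, a: str, b: str) -> str:
    """Hypothetical wording of the three-way multiple-choice question."""
    return (
        f"Sentence: {sentence}\n"
        f"(1) {a} is part of {b}\n"
        f"(2) {b} is part of {a}\n"
        f"(3) {a} and {b} are unrelated\n"
        "Answer with 1, 2 or 3."
    )


review = "The remote has a tiny button. The screen or display is bright."
synsets = [["screen", "display"], ["remote"], ["button"]]
pairs = cooccurring_sentences(review, synsets)
prompts = [multiple_choice_prompt(*p) for p in pairs]
```

The second sentence yields no question because "screen" and "display" belong to the same synset, which is the case the input-processing step is designed to exclude.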

Technical Innovations

End-to-End LLM Pipeline: Achieves higher automation compared to BERT methods
Structured Output Constraints: Uses JSON syntax constraints to ensure consistent output formatting
Multi-Stage Optimization: Each stage is optimized for specific tasks to improve overall performance
Hallucination Mitigation: Reduces LLM hallucination issues through POS tagging filtering and fine-tuning
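
As a small illustration of why a fixed JSON format helps downstream stages, the sketch below validates a model reply after generation. It is a post-hoc check rather than the constrained decoding the summary mentions, and the `"aspects"` key is a hypothetical schema, not one taken from the paper.

```python
import json


def parse_aspects(reply: str) -> list[str]:
    """Parse a reply expected to look like {"aspects": ["screen", ...]}."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad syntax.
    data = json.loads(reply)
    aspects = data.get("aspects") if isinstance(data, dict) else None
    if not isinstance(aspects, list) or not all(isinstance(a, str) for a in aspects):
        raise ValueError("expected an object with a list of strings under 'aspects'")
    return aspects


good = parse_aspects('{"aspects": ["screen", "remote"]}')

try:
    parse_aspects("The aspects are screen and remote.")
    free_text_accepted = True
except ValueError:
    free_text_accepted = False
```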

Experimental Setup

Datasets

Source: Amazon Reviews 2023 dataset
Product Categories: 5 categories (video games, televisions, necklaces, watches, stand mixers)
Data Scale: 100,000 reviews per product category (26,464 for stand mixers)
Processing Limitation: 1,000 reviews used for LLM tasks (to limit processing time)

Evaluation Metrics

Term Evaluation Criteria:

Relevance: Whether the term accurately represents a product part or component
Specificity: Whether the term has an appropriate level of specificity
Clarity: Whether the term clearly conveys intent and avoids ambiguity
Product Match: Whether the term logically fits the given product

Relation Evaluation Criteria:

Logical Hierarchy: Whether child nodes logically represent parts or features of parent nodes
Contextual Match: Whether relationships are reasonable within Amazon product categories
Clarity and Specificity: Whether relationships avoid ambiguity and clearly define part-whole relations

Baseline Methods

Baseline Method: BERT-based method by Oksanen et al. (2021)
Evaluation Method: Gemini 1.5 Flash as LLM judge
Comparison Versions: Full version and shortened version (equal term count to baseline)
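
A rough sketch of how an LLM-as-a-judge evaluation of terms might be organised, assuming the judge returns one numeric score per term and that an ontology's score is the plain mean of those scores. The prompt wording, the function names and the toy scores are assumptions; the four criteria are the term evaluation criteria listed above.

```python
from statistics import mean

TERM_CRITERIA = ["Relevance", "Specificity", "Clarity", "Product Match"]


def judge_prompt(product: str, term: str,
                 criteria: list[str] = TERM_CRITERIA) -> str:
    """Hypothetical judge prompt listing the evaluation criteria."""
    lines = [f"Product: {product}", f"Term: {term}",
             "Rate the term, taking these criteria into account:"]
    lines += [f"- {c}" for c in criteria]
    return "\n".join(lines)


def ontology_score(per_term_scores: dict[str, float]) -> float:
    """Average the judge's per-term scores (aggregation rule is an assumption)."""
    return round(mean(per_term_scores.values()), 2)


prompt = judge_prompt("stand mixer", "bowl")
# Invented scores standing in for judge replies.
score = ontology_score({"bowl": 5.0, "whisk": 4.0, "box": 3.0})
```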

Implementation Details

Hardware: NVIDIA GeForce RTX 4090 GPU
Optimizer: Adam (learning rate 10^-4)
Fine-tuning Technique: LoRA (r=4, α=16)
Training Epochs: 3, batch size 16
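
The reported hyperparameters can be collected in a small configuration object. This is a sketch only: the class and field names are hypothetical, and the step counts assume no gradient accumulation and that the same settings apply to both fine-tuning runs, which the summary does not state. The scaling factor follows the standard LoRA convention of multiplying the low-rank update by α/r.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    learning_rate: float = 1e-4  # Adam
    lora_r: int = 4
    lora_alpha: int = 16
    epochs: int = 3
    batch_size: int = 16

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    def optimizer_steps(self, num_samples: int) -> int:
        # Assumes no gradient accumulation and that a final partial batch is kept.
        return math.ceil(num_samples / self.batch_size) * self.epochs


cfg = FineTuneConfig()
aspect_steps = cfg.optimizer_steps(1600)    # SemEval-2014 Task 4 samples
relation_steps = cfg.optimizer_steps(1000)  # synthetic relation samples
```

Under these assumptions the aspect extraction run takes 300 optimizer steps and the relation extraction run 189, with a LoRA scaling factor of 4.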

Experimental Results

Main Results

Term Quality Evaluation

Product Category	Proposed Method (Full)	Proposed Method (Shortened)	BERT Baseline
Video Games	4.00	4.18	3.92
Television	4.06	4.05	3.95
Necklace	4.50	4.57	3.86
Watch	4.13	4.37	4.10
Stand Mixer	4.36	4.40	3.31

Relation Quality Evaluation

Product Category	Proposed Method (Full)	Proposed Method (Shortened)	BERT Baseline
Video Games	3.89	3.82	3.43
Television	3.99	4.56	3.21
Necklace	3.65	3.79	3.29
Watch	3.75	4.06	2.68
Stand Mixer	3.30	3.40	2.47
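
The two tables above can be summarised with a quick calculation. The sketch below uses the reported scores for the full proposed method and the BERT baseline; the unweighted average over the five categories is an assumption made here for illustration and is not a figure reported in the paper.

```python
from statistics import mean

# Scores copied from the tables: (proposed method, full; BERT baseline).
term_quality = {
    "Video Games": (4.00, 3.92), "Television": (4.06, 3.95),
    "Necklace": (4.50, 3.86), "Watch": (4.13, 4.10),
    "Stand Mixer": (4.36, 3.31),
}
relation_quality = {
    "Video Games": (3.89, 3.43), "Television": (3.99, 3.21),
    "Necklace": (3.65, 3.29), "Watch": (3.75, 2.68),
    "Stand Mixer": (3.30, 2.47),
}


def summarise(table: dict[str, tuple[float, float]]) -> tuple[float, float, bool]:
    """Return (mean proposed, mean baseline, proposed wins in every category)."""
    proposed = mean(p for p, _ in table.values())
    baseline = mean(b for _, b in table.values())
    wins_everywhere = all(p > b for p, b in table.values())
    return round(proposed, 3), round(baseline, 3), wins_everywhere


term_summary = summarise(term_quality)          # about 4.21 vs 3.83
relation_summary = summarise(relation_quality)  # about 3.72 vs 3.02
```

On this unweighted view the gap is larger for relations (about 0.70) than for terms (about 0.38), and the full method scores higher than the baseline in every category in both tables.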

Ablation Studies

Aspect Extraction Method Comparison

Method	Average Score
Method A1 (Prompt Only)	1.960 ± 0.006
Method A2 (Prompt + Sentiment)	2.259 ± 0.002
Method A3 (Fine-tuning)	2.662 ± 0.006

Relation Extraction Method Comparison

Method	Video Games	Television	Necklace	Watch	Mixer
Full Reviews	3.811	4.155	3.397	3.570	3.080
Excerpts	3.727	3.726	3.481	3.398	2.493
Excerpts + Fine-tuning	3.893	3.987	3.646	3.747	3.303

Efficiency Analysis

Proposed Method Processing Time

Stage	Average Time (minutes)
Aspect Extraction	32.05
Synset Extraction	0.78
Concept Extraction	1.52
Relation Extraction	4.53
Total	38.89

BERT Baseline Processing Time

Stage	Average Time (minutes)
Entity Extraction	1.66
Aspect Extraction	2.79
Synset Extraction	0.82
Ontology Extraction	1.36
Total	6.62
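
The timing tables can be compared with a short calculation. This sketch takes the reported totals at face value; the per-stage figures sum to values 0.01 away from each reported total, presumably because the averages were rounded independently, so the totals are used as given rather than recomputed.

```python
llm_stages = {"Aspect Extraction": 32.05, "Synset Extraction": 0.78,
              "Concept Extraction": 1.52, "Relation Extraction": 4.53}
bert_stages = {"Entity Extraction": 1.66, "Aspect Extraction": 2.79,
               "Synset Extraction": 0.82, "Ontology Extraction": 1.36}
LLM_TOTAL, BERT_TOTAL = 38.89, 6.62  # totals as reported, in minutes

# Stage sums differ from the reported totals by 0.01 (likely rounding).
llm_sum = round(sum(llm_stages.values()), 2)
bert_sum = round(sum(bert_stages.values()), 2)

# How many times longer the LLM pipeline takes, and where the time goes.
slowdown = LLM_TOTAL / BERT_TOTAL
aspect_share = llm_stages["Aspect Extraction"] / LLM_TOTAL
```

The ratio comes out at roughly 5.9, and aspect extraction alone accounts for about 82% of the LLM pipeline's time, so that stage is the natural target for any speed-up.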

Experimental Findings

Quality Improvement: The LLM method outperforms the BERT baseline in both term and relation quality in all five product categories
Fine-tuning Importance: Fine-tuning brings clear performance improvements over pure prompting methods (average aspect extraction score of 2.662 versus 1.960 for prompting only)
Computational Cost: The LLM method achieves higher quality but takes approximately 6 times as long as the BERT baseline (38.89 versus 6.62 minutes on average)
Clustering Algorithm Selection: ENC produces more precise synsets compared to K-means

Related Work

Ontology Learning

Traditional ontology learning relies primarily on deep learning methods, most of which focus on extracting taxonomic relations rather than part-whole relationships.

LLM Applications in Ontology Construction

Recent research has begun exploring the application of LLMs in key ontology learning tasks such as term and relation extraction, but primarily focuses on taxonomic relations.

Evaluation Methods

Due to the lack of standard benchmarks, ontology quality evaluation has been a persistent challenge. The LLM-as-a-judge method proposed in this paper provides a novel solution to this problem.

Conclusions and Discussion

Main Conclusions

LLM method significantly outperforms existing BERT methods in part-whole ontology extraction tasks
Fine-tuning and structured output constraints are key factors for performance improvement
LLM-as-a-judge provides a viable solution for ontology quality assessment

Limitations

Evaluation Dependency: Primarily relies on LLM-as-a-judge, lacking user study validation
Computational Cost: Significantly increased computational cost compared to BERT methods
Hallucination Issues: LLMs still exhibit hallucination problems in generating irrelevant aspects
Benchmark Absence: Lack of standard benchmark datasets in the product ontology domain

Future Directions

Standard Benchmark Construction: Establish standard benchmark datasets for this task
User Study Validation: Verify the practical utility of ontologies through user studies
Method Generalization: Explore application of the method to other ontology types (e.g., taxonomic ontologies)
Hallucination Mitigation: Research methods integrating multiple LLMs to reduce single-model hallucinations

In-Depth Evaluation

Strengths

Strong Innovation: First systematic application of LLMs to part-whole ontology extraction
Complete Methodology: Provides an end-to-end complete pipeline solution
Evaluation Innovation: Proposes the LLM-as-a-judge evaluation framework
Comprehensive Experiments: Includes detailed ablation studies and efficiency analysis
Open-Source Contribution: Provides complete open-source implementation

Weaknesses

Evaluation Limitations: Over-reliance on LLM evaluation, lacking human evaluation validation
Cost Considerations: Significantly increased computational cost but insufficient discussion of cost-benefit tradeoffs
Generalization: Validation on only 5 product categories; generalization requires further verification
Baseline Comparison: Compared against only a single existing method (the BERT-based baseline); comparison with further methods is missing

Impact

Academic Value: Provides important reference for LLM applications in ontology construction
Practical Value: Direct application potential in e-commerce and related domains
Methodological Contribution: LLM-as-a-judge evaluation framework has broad applicability
Reproducibility: Provides detailed implementation details and open-source code

Applicable Scenarios

E-commerce Platforms: Product categorization and recommendation systems
Knowledge Graph Construction: Automated ontology construction
Information Extraction: Extracting structured relationships from unstructured text
Review Analysis: Product feature and component identification

References

This paper cites important works in related fields, including:

Oksanen et al. (2021): BERT-based product ontology extraction method
Devlin et al. (2019): BERT model
Jiang et al. (2023): Mistral model
Pontiki et al. (2014): SemEval-2014 Task 4 dataset

Overall Assessment: This paper is an important contribution to part-whole ontology extraction. The method is innovative, the experimental design is reasonable, and the results are convincing. Although the evaluation methodology and computational cost have limitations, the paper provides valuable insights and tools for the development of this field.