2025-11-12T18:16:10.275762

A Novel Approach using CapsNet and Deep Belief Network for Detection and Identification of Oral Leukopenia

GV, M, S
Oral cancer constitutes a significant global health concern, resulting in 277,484 fatalities in 2023, with the highest prevalence observed in low- and middle-income nations. Automating the detection of potentially malignant and malignant lesions in the oral cavity could enable cost-effective, early diagnosis. Establishing an extensive repository of meticulously annotated oral lesions is essential. In this research, photos are collected from clinical experts worldwide, who have been equipped with an annotation tool to generate comprehensive labelling. This research presents a novel approach for integrating bounding box annotations from multiple doctors. Additionally, a Deep Belief Network combined with CapsNet is employed to build automated systems that extract intricate patterns to address this challenging problem. This study evaluated two deep learning-based computer vision methodologies for the automated detection and classification of oral lesions to facilitate the early detection of oral cancer: image classification utilizing CapsNet, and object detection. Image classification attained an F1 score of 94.23% for detecting photos with lesions and 93.46% for identifying images necessitating referral. Object detection attained an F1 score of 89.34% for identifying lesions for referral. Further results are reported for classification by type of referral decision. Our preliminary findings indicate that deep learning possesses the capability to address this complex problem.

A Novel Approach using CapsNet and Deep Belief Network for Detection and Identification of Oral Cancer

Basic Information

  • Paper ID: 2501.00876
  • Title: Enhanced Classification of Oral Cancer Using Deep Learning Techniques
  • Authors: Dr. Senthil Pandi S, Hirthik Mathesh GV, Kavin Chakravarthy M (Rajalakshmi Engineering College, Chennai, India)
  • Classification: eess.IV cs.CV cs.LG
  • Research Domain: Medical Image Processing, Deep Learning, Computer Vision
  • Paper Link: https://arxiv.org/abs/2501.00876

Abstract

Oral cancer represents a significant global health problem, resulting in 277,484 deaths in 2023, with the highest incidence in low- and middle-income countries. This study proposes a novel approach combining CapsNet and Deep Belief Network (DBN) for automated detection and classification of oral lesions. The research collected image data from clinical experts worldwide, who were equipped with an annotation tool for comprehensive labeling. The method achieved an F1 score of 94.23% for lesion detection in image classification tasks, an F1 score of 93.46% for identifying images requiring referral, and an F1 score of 89.34% in object detection tasks.

Research Background and Motivation

Problem Significance

  1. Global Health Burden: Oral cancer is a major health problem worldwide, with GLOBOCAN 2021 predicting 387,864 new cases and 234,384 deaths
  2. Geographic Disparities: Three-quarters of cases occur in low-income countries, with Africa and India accounting for half of global cases
  3. Delayed Diagnosis: In low- and middle-income countries (LMICs), more than two-thirds of cases are discovered at advanced stages, resulting in lower survival rates
  4. Economic Burden: Cancer treatment costs are extremely high, particularly in cases of late-stage diagnosis

Limitations of Existing Methods

  1. Professional Shortage: Lack of specialists and medical resources, particularly in LMIC regions
  2. Diagnostic Subjectivity: Traditional diagnosis relies on clinician experience, lacking standardized approaches
  3. Equipment Requirements: Existing deep learning methods require expensive equipment or specially designed screening platforms
  4. Accessibility Issues: Requirements for high-magnification examination of regions of interest (ROI) limit widespread application

Research Motivation

  1. Develop cost-effective automated early diagnosis systems
  2. Utilize mobile device images for telemedicine screening
  3. Improve referral accuracy in screening programs
  4. Reduce dependence on specialized equipment and personnel

Core Contributions

  1. Innovative Architecture: Proposes a hybrid deep learning framework combining CapsNet and Deep Belief Network (DBN)
  2. Multi-Physician Annotation Fusion: Develops a novel method integrating bounding box annotations from multiple physicians
  3. High-Performance Detection: Achieves excellent performance in oral lesion detection and classification tasks
  4. Practical Design: Designed for real-world applications using mobile device images

Methodology Details

Task Definition

  • Input: Oral cavity images (from mobile devices or clinical equipment)
  • Output: Lesion detection results, classification labels, referral recommendations
  • Objective: Automatically identify oral lesions and classify malignancy severity

Model Architecture

1. Hybrid Architecture Design

The proposed hybrid model combines two core components:

  • CapsNet: For image classification tasks
  • Deep Belief Network (DBN): For feature extraction and pattern recognition

2. CapsNet Component

Core Concept: Groups neurons into "capsules", vector-valued processing units loosely inspired by biological neural organization

  • Capsule Structure: Each capsule represents a specific entity in the image, with neuron states encoding entity features
  • Vector Output: Output vector length indicates entity presence probability, direction reflects entity attributes
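  • Dynamic Routing: Replaces traditional max pooling with a "routing-by-agreement" mechanism, which sends each lower-level capsule's output mainly to the higher-level capsules whose outputs agree with its prediction
  • Squashing Function: Nonlinearly rescales each output vector so that short vectors shrink toward zero length and long vectors approach (but stay below) unit length, letting vector length act as a presence probability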
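The squashing function and routing-by-agreement described above can be sketched in NumPy as follows. This is a minimal illustration in the style of the original CapsNet formulation, not the paper's code; the capsule counts, the 16-dimensional vectors, the three routing iterations, and the random prediction vectors (which a trained network would compute with learned weight matrices) are all assumptions.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """Squashing nonlinearity: v = (|s|^2 / (1 + |s|^2)) * s / |s|.

    Short vectors shrink toward zero length; long vectors approach unit length.
    """
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement between two capsule layers.

    u_hat: prediction vectors, shape (num_in, num_out, dim_out), where
           u_hat[i, j] is input capsule i's prediction for output capsule j.
    Returns output capsule vectors v with shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, initially uniform
    for _ in range(num_iters):
        # Coupling coefficients: softmax of the logits over output capsules.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        # Weighted sum of predictions for each output capsule, then squash.
        s = np.einsum("ij,ijd->jd", c, u_hat)
        v = squash(s)
        # Increase logits where a prediction agrees (dot product) with the output.
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v


# Hypothetical example: 32 input capsules routed to 5 output capsules
# (e.g. one per image category), each with 16 dimensions.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(32, 5, 16))
v = dynamic_routing(u_hat)
class_scores = np.linalg.norm(v, axis=-1)  # vector lengths, each in [0, 1)
```

Because the squashed length is always below one, the length of each output capsule can be read directly as a class score, which is why no separate sigmoid or softmax layer is needed at the output.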

Technical Advantages:

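Traditional CNN: Layer-by-layer stacking with max pooling → Precise spatial relationships between features are discarded
CapsNet: Vector-valued capsules arranged hierarchically → Pose and part-whole spatial relationships are preserved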

3. Deep Belief Network (DBN)

Preprocessing Pipeline:

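  1. Image Whitening: Reduces correlation between adjacent pixels and normalizes the input to zero mean and unit variance
  2. Mini-batch Processing: Randomly partitions the training data into small batches; averaging each update over a batch reduces the noise in gradient estimates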
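The two preprocessing steps can be sketched as below. The summary does not name the whitening variant, so ZCA whitening is used here as one standard choice; the regularizer `eps`, the batch size, and the function names are illustrative assumptions.

```python
import numpy as np


def zca_whiten(X, eps=1e-5):
    """ZCA-whiten the rows of X (one flattened image or patch per row).

    After whitening, each feature has zero mean, features are decorrelated,
    and the covariance is approximately the identity (unit variance).
    eps regularizes small eigenvalues; its value here is an assumption.
    """
    X = X - X.mean(axis=0)                       # zero-mean each pixel
    cov = X.T @ X / X.shape[0]                   # pixel covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return X @ W


def minibatches(X, batch_size, rng):
    """Yield random, non-overlapping mini-batches of the rows of X."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        yield X[idx[start:start + batch_size]]


# Hypothetical data: 5000 samples of 4 "pixels", two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))
X[:, 1] += 0.9 * X[:, 0]
Xw = zca_whiten(X)
```

ZCA rotates back into the original pixel basis after scaling, so whitened images still look like images, which matters if convolutional layers follow.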

Network Structure:

  • Three-layer DBN Architecture: For feature extraction from neuroblastoma histological images
  • Stacked CRBM: Vertically stacked Convolutional Restricted Boltzmann Machines
  • Hierarchical Structure: Visible layer (RK×RK) → Hidden layers (N groups of MQ×MQ units) → Pooling layer
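A DBN is usually pretrained greedily, one restricted Boltzmann machine at a time, with contrastive divergence. The sketch below shows that idea under simplifying assumptions: it uses fully connected Bernoulli RBMs rather than the convolutional RBMs with pooling described above, performs one full-batch CD-1 update per epoch, and uses hypothetical layer sizes and learning rate.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng, lr=0.1):
        self.W = 0.01 * rng.normal(size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        """One CD-1 update on a batch v0; returns the reconstruction error."""
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
        v1 = self.visible_probs(h0)                             # mean-field reconstruction
        ph1 = self.hidden_probs(v1)
        n = len(v0)
        # Positive-phase statistics minus negative-phase statistics.
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))


def train_dbn(data, layer_sizes, epochs, rng):
    """Greedy layer-wise pretraining: each RBM models the previous layer's features."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(x)  # one full-batch update per epoch (mini-batching omitted)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)  # features passed to the next layer
    return rbms, x
```

Stacking three such layers would mirror the three-layer DBN mentioned above; in the convolutional variant, the weight matrix is replaced by shared filters applied across the image, and a pooling layer follows each hidden layer.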

Key Parameters:

  • Total number of neurons
  • Number of hidden layer groups
  • Mini-batch size

Technical Innovations

  1. CapsNet Application: Presented as the first application of CapsNet to oral cancer detection, preserving spatial hierarchical information
  2. Hybrid Architecture: Effective combination of DBN and CapsNet, leveraging respective strengths
  3. Multi-Physician Annotation: Innovative bounding box annotation fusion strategy
  4. End-to-End Learning: Complete pipeline from raw images to final diagnostic recommendations
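The summary does not describe how the multi-physician bounding boxes are fused. Purely as an illustration of what such a step involves, the sketch below uses a simple hypothetical rule, not the paper's method: boxes from all physicians are greedily clustered by intersection over union (IoU), a cluster is kept only if enough distinct physicians contributed to it, and the kept cluster's coordinates are averaged. The thresholds and function names are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_annotations(annotations: List[Tuple[str, Box]],
                     iou_thr: float = 0.5, min_votes: int = 2) -> List[Box]:
    """Hypothetical fusion rule for one image.

    annotations: (physician_id, box) pairs from all annotators.
    A cluster is kept only if at least min_votes distinct physicians drew a box in it.
    """
    clusters: List[List[Tuple[str, Box]]] = []
    for pid, box in annotations:
        for cluster in clusters:
            if iou(box, cluster[0][1]) >= iou_thr:  # compare with the cluster's first box
                cluster.append((pid, box))
                break
        else:
            clusters.append([(pid, box)])
    fused = []
    for cluster in clusters:
        if len({pid for pid, _ in cluster}) >= min_votes:
            boxes = [b for _, b in cluster]
            fused.append(tuple(sum(b[k] for b in boxes) / len(boxes) for k in range(4)))
    return fused


# Hypothetical example: physicians A and B agree on one lesion; C marks another alone.
ann = [("A", (10, 10, 50, 50)), ("B", (12, 12, 52, 52)), ("C", (200, 200, 220, 220))]
out = fuse_annotations(ann)
```

Counting distinct physicians rather than raw boxes prevents a single annotator who drew overlapping boxes from outvoting the others.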

Experimental Setup

Dataset

  • Data Source: Oral images collected by global clinical experts
  • Annotation Method: Multi-physician bounding box annotation
  • Data Augmentation: Applied rotation, flipping, and other techniques to expand training set
  • Preprocessing:
    • Color normalization to eliminate staining variations
    • Median filtering for noise reduction
    • Image enhancement to reduce overfitting
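The median filtering and flip/rotation augmentation listed above can be sketched with NumPy alone. This is a minimal illustration under assumptions: a 3×3 median window, reflection padding at the borders, and rotations restricted to multiples of 90 degrees to avoid interpolation; the summary does not give the actual parameters.

```python
import numpy as np


def median_filter3(img):
    """3x3 median filter, applied per channel; img has shape (H, W) or (H, W, C)."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="reflect")
    # Windows have shape (H, W[, C], 3, 3); take the median over the last two axes.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(0, 1))
    return np.median(windows, axis=(-2, -1))


def augment(img, rng):
    """Random horizontal/vertical flips plus a random multiple-of-90-degree rotation."""
    if rng.random() < 0.5:
        img = np.fliplr(img)
    if rng.random() < 0.5:
        img = np.flipud(img)
    return np.rot90(img, k=rng.integers(4))  # rotates in the (H, W) plane


# Hypothetical example: an RGB image is denoised, then randomly augmented.
rng = np.random.default_rng(0)
rgb = rng.random((6, 7, 3))
out = augment(median_filter3(rgb), rng)
```

Median filtering removes isolated "salt" pixels while keeping edges sharper than a mean filter would; flips and 90-degree rotations change orientation without altering pixel values.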

Evaluation Metrics

  • F1 Score: Harmonic mean of precision and recall
  • Precision: Proportion of correctly predicted positive cases among all predicted positive cases
  • Recall: Proportion of correctly predicted positive cases among all actual positive cases
  • Accuracy: Overall proportion of correct predictions
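The four metrics follow directly from confusion-matrix counts. The sketch below computes them for one binary task; for the multi-category results reported later, each category would be scored one-vs-rest in the same way. The counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; equivalently 2*tp / (2*tp + fp + fn).
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for a "lesion" vs "no lesion" task.
m = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

Because F1 is a harmonic mean, it always lies between precision and recall and is pulled toward the smaller of the two.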

Training Strategy

  • Training Epochs: Initial 10 epochs, extended to 30 epochs
  • Early Stopping: Halted at epoch 12 after achieving optimal validation accuracy of 97.1%
  • Loss Curves: Both training and validation losses decrease and then stabilize
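The summary reports that training halted at epoch 12 but does not state the stopping rule. A common choice is patience-based early stopping, sketched below; monitoring validation accuracy, the patience of 3 epochs, and the example accuracy curve are assumptions for illustration only.

```python
class EarlyStopping:
    """Stop when validation accuracy has not improved for `patience` epochs.

    The monitored quantity and the patience value are illustrative assumptions.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_acc):
        """Record one epoch's validation accuracy; return True if training should stop."""
        if val_acc > self.best + self.min_delta:
            # New best: remember it (a real loop would also checkpoint the weights here).
            self.best, self.best_epoch, self.bad_epochs = val_acc, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation-accuracy curve that rises and then flattens.
history = [0.70, 0.80, 0.86, 0.90, 0.91, 0.91, 0.905, 0.90]
stopper = EarlyStopping(patience=3)
for epoch, acc in enumerate(history, start=1):
    if stopper.step(epoch, acc):
        break
```

With this rule, training stops a few epochs after the best validation score, and the weights saved at the best epoch are the ones reported.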

Experimental Results

Main Results

Overall Performance Metrics

  • Image Classification:
    • Lesion Detection: F1 score 94.23%
    • Referral Identification: F1 score 93.46%
  • Object Detection:
    • Referral Lesion Identification: F1 score 89.34%

Detailed Classification Results

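| Image Category       | Precision (%) | Recall (%) | F1 Score (%) |
|----------------------|---------------|------------|--------------|
| No Lesion Detected   | 90.86         | 91.23      | 80.65        |
| No Referral Required | 93.26         | 90.21      | 94.52        |
| Other Visit Reasons  | 89.32         | 91.24      | 80.15        |
| Low Cancer Risk      | 90.88         | 89.23      | 87.21        |
| High Cancer Risk     | 94.24         | 90.21      | 84.21        |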
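Since F1 is the harmonic mean of precision and recall, it must lie between the two. The short check below recomputes F1 from the precision and recall columns of the table above (values copied verbatim) and tests whether each reported F1 falls within those bounds.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units, e.g. percent)."""
    return 2 * precision * recall / (precision + recall)


# (category, precision %, recall %, reported F1 %) copied from the table above.
rows = [
    ("No Lesion Detected", 90.86, 91.23, 80.65),
    ("No Referral Required", 93.26, 90.21, 94.52),
    ("Other Visit Reasons", 89.32, 91.24, 80.15),
    ("Low Cancer Risk", 90.88, 89.23, 87.21),
    ("High Cancer Risk", 94.24, 90.21, 84.21),
]

for name, p, r, reported in rows:
    recomputed = f1_from(p, r)
    # A harmonic mean always lies between min(p, r) and max(p, r).
    within_bounds = min(p, r) <= reported <= max(p, r)
    print(f"{name:22s} recomputed F1 = {recomputed:5.2f}  "
          f"reported = {reported:5.2f}  within [P, R] bounds: {within_bounds}")
```

For every row, the reported F1 falls outside the interval spanned by precision and recall (for example, 90.86% precision and 91.23% recall give an F1 of about 91.04%, not 80.65%), so the F1 column is not the harmonic mean of the listed precision and recall values.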

Training Process Analysis

  • Accuracy Progression: Rapid increase in the first 12 epochs, followed by a plateau
  • Final Training Accuracy: 94.28%
  • Final Validation Accuracy: 94.55%
  • Loss Values: Training loss 0.18432, validation loss 0.16543

Experimental Findings

  1. Convergence Characteristics: Model converges effectively within 30 epochs
  2. Generalization Ability: Consistent trends between training and validation curves, indicating good generalization
  3. Training Stability: Smooth loss function decrease, stable model training
  4. Performance Stratification: Detection performance varies across different risk levels

Evolution of Traditional Methods

  1. Texture Features: Early research focused on grayscale and texture features
  2. Advanced Techniques: Subsequent introduction of higher-order imaging techniques and Laws' texture energy measures
  3. Deep Learning: CNNs widely applied to medical imaging following ImageNet competition success

Existing Deep Learning Methods

  1. Multimodal Approaches: Multimodal deep learning frameworks incorporating patient metadata (87% accuracy)
  2. AdaBoost: Methods utilizing five color spaces (97.25% accuracy)
  3. Ensemble Learning: Pretrained CNN ensemble models (97.88% accuracy)
  4. Transfer Learning: Applications of pretrained models such as ResNet50

Advantages of This Work

  1. Low Equipment Requirements: Applicable to mobile device images, no specialized equipment needed
  2. Architectural Innovation: Unique combination of CapsNet and DBN
  3. Strong Practicality: Designed for real-world clinical application scenarios

Conclusions and Discussion

Main Conclusions

  1. Technical Feasibility: Deep learning demonstrates capability to address the complex problem of oral cancer detection
  2. Excellent Performance: Achieves performance exceeding 90% on multiple evaluation metrics
  3. Clinical Value: Supports early diagnosis and referral decision-making

Limitations

  1. Dataset Scale: Dataset size is not reported
  2. Cross-Racial Validation: Lacks validation results across different populations
  3. Real-Time Performance: Inference time and computational complexity not reported
  4. Title Inconsistency: Paper title mentions "Oral Leukopenia" but content primarily focuses on oral cancer

Future Directions

  1. Multimodal Fusion: Integration of additional clinical data types
  2. Population Expansion: Model validation across broader populations
  3. Real-Time Deployment: Model optimization for mobile device real-time inference
  4. Standardization: Establishment of unified evaluation standards and datasets

In-Depth Evaluation

Strengths

  1. Methodological Innovation: Novel combination of CapsNet and DBN
  2. Practical Relevance: Addresses important global health problems
  3. Excellent Performance: Achieves high levels across multiple metrics
  4. Practical Design: Considers feasibility of real-world deployment

Weaknesses

  1. Theoretical Analysis: Lacks in-depth theoretical analysis of hybrid architecture
  2. Comparative Experiments: Insufficient comparison with other state-of-the-art methods
  3. Ablation Studies: Insufficient validation of individual component contributions
  4. Generalization Verification: Lacks cross-dataset validation results

Impact

  1. Academic Value: Provides new technical pathways for medical image analysis
  2. Practical Value: Promising application in resource-limited regions for screening
  3. Reproducibility: More implementation details are needed to support reproduction

Applicable Scenarios

  1. Telemedicine: Applicable to regions lacking specialist physicians
  2. Preliminary Screening: Can serve as auxiliary tool for clinical examination
  3. Educational Training: Useful for medical student and general practitioner training
  4. Large-Scale Screening: Supports population-level oral cancer screening programs

References

The paper cites 15 relevant studies covering oral cancer detection, deep learning applications, multimodal methods, and other aspects, providing solid theoretical foundation and technical comparison for this research.


Overall Assessment: This study proposes an innovative hybrid deep learning framework for oral cancer detection with significant clinical application value. While there is room for improvement in theoretical analysis and experimental validation, its design approach addressing practical needs and excellent performance make it a valuable contribution to the field.