
Multi-Class Parkinson's Disease Detection Based on Finger Tapping Using Attention-Enhanced CNN-BiLSTM

Miah, Hassan, Hossain et al.
Effective clinical management and intervention development depend on accurate evaluation of Parkinson's disease (PD) severity. Many researchers have worked on developing gesture-based PD recognition systems; however, their accuracy is not satisfactory. In this study, we propose a multi-class Parkinson's disease detection system based on finger tapping using an attention-enhanced CNN-BiLSTM. We collected finger-tapping videos and derived temporal, frequency, and amplitude-based features from wrist and hand movements. We then propose a hybrid deep learning framework integrating CNN, BiLSTM, and attention mechanisms for multi-class PD severity classification from video-derived motion features. First, the input sequence is reshaped and passed through a Conv1D-MaxPooling block to capture local spatial dependencies. The resulting feature maps are fed into a BiLSTM layer to model temporal dynamics. An attention mechanism focuses on the most informative temporal features, producing a context vector that is further processed by a second BiLSTM layer. CNN-derived features and attention-enhanced BiLSTM outputs are concatenated, followed by dense and dropout layers, before the final softmax classifier outputs the predicted PD severity level. The model demonstrated strong performance in distinguishing between the five severity classes, suggesting that integrating spatio-temporal representations with attention mechanisms can improve automated PD severity detection, making it a promising non-invasive tool to support clinicians in PD monitoring and progression tracking.

Multi-Class Parkinson's Disease Detection Based on Finger Tapping Using Attention-Enhanced CNN-BiLSTM

Basic Information

  • Paper ID: 2510.10121
  • Title: Multi-Class Parkinson's Disease Detection Based on Finger Tapping Using Attention-Enhanced CNN-BiLSTM
  • Authors: Abu Saleh Musa Miah, Md Maruf Al Hossain, Najmul Hassan, Yuichi Okuyama, Jungpil Shin
  • Category: cs.CV (Computer Vision)
  • Publication Date: October 11, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.10121

Abstract

Effective clinical management and intervention development for Parkinson's Disease (PD) depend on accurate assessment of disease severity. This study proposes a multi-class PD detection system based on finger tapping, employing an attention-enhanced CNN-BiLSTM architecture. The research extracts temporal, frequency, and amplitude features from finger tapping videos and constructs a hybrid deep learning framework integrating CNN, BiLSTM, and attention mechanisms. The model captures local spatial dependencies through Conv1D-MaxPooling blocks, models temporal dynamics via BiLSTM layers, and focuses on the most informative temporal features through attention mechanisms. The approach achieves 93% classification accuracy and demonstrates excellent performance in distinguishing five severity levels.

Research Background and Motivation

Problem Definition

Parkinson's Disease is a progressive neurodegenerative disorder affecting over 10 million people worldwide, primarily manifesting as tremor, rigidity, bradykinesia, and postural instability. Traditional PD severity assessment relies primarily on clinical scales such as UPDRS (Unified Parkinson's Disease Rating Scale) and MDS-UPDRS.

Limitations of Existing Methods

  1. High Subjectivity: Traditional clinical assessment depends on subjective physician judgment, exhibiting inter-rater variability
  2. Time-Consuming: Clinical assessment procedures are complex and resource-intensive
  3. Poor Consistency: Lack of objective, standardized assessment methods affects disease progression tracking
  4. Insufficient Accuracy: Existing gesture-based PD recognition systems demonstrate suboptimal performance

Research Motivation

To develop a non-invasive, objective, and accessible automated PD severity assessment method based on video analysis, leveraging computer vision and machine learning techniques to achieve precise disease grading and provide reliable auxiliary diagnostic tools for clinicians.

Core Contributions

  1. Proposed an attention-enhanced CNN-BiLSTM hybrid architecture that effectively combines spatial feature extraction and temporal sequence modeling
  2. Implemented multi-class PD severity classification capable of distinguishing five different severity levels
  3. Integrated attention mechanisms to enhance the model's focus on critical temporal features
  4. Achieved 93% classification accuracy, outperforming the baseline method of Islam et al. [1]
  5. Provided a non-invasive PD monitoring tool supporting clinicians in disease progression tracking

Methodology

Task Definition

  • Input: 57-dimensional feature vectors derived from finger tapping videos, containing temporal, frequency, and amplitude features
  • Output: Five-class PD severity classification results (Class 0-4)
  • Constraints: Expert-annotated data based on MDS-UPDRS standards

Model Architecture

Overall Design

The model employs a multi-stage processing pipeline:

  1. Input Reshaping: Reshape 57-dimensional features into sequence format
  2. CNN Feature Extraction: Conv1D + MaxPooling1D capture local spatial patterns
  3. BiLSTM Temporal Modeling: Bidirectional LSTM models temporal dependencies
  4. Attention Mechanism: Focus on the most important temporal features
  5. Feature Fusion: Concatenate CNN and attention-enhanced BiLSTM features
  6. Classification Output: Fully connected layer + Softmax for five-class classification
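To make the first stages of the pipeline concrete, the sketch below traces how the sequence shape changes from the reshaped 57-dimensional input through the Conv1D, MaxPooling1D, and BiLSTM stages. The kernel size, pool size, filter count, and LSTM width are illustrative assumptions; the summary above does not state them, and only the 57 input features come from the text.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' (no) padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_out_len(length, pool_size):
    """Output length of non-overlapping max pooling (stride == pool_size)."""
    return length // pool_size


def trace_shapes(n_features=57, filters=64, kernel_size=3, pool_size=2, lstm_units=64):
    """Return the (steps, channels) shape after each stage.

    filters, kernel_size, pool_size and lstm_units are hypothetical values
    chosen for illustration, not hyperparameters reported by the paper.
    """
    shapes = {}
    steps = n_features
    shapes["reshape"] = (steps, 1)               # 57 features -> 57 steps, 1 channel
    steps = conv1d_out_len(steps, kernel_size)
    shapes["conv1d"] = (steps, filters)
    steps = maxpool1d_out_len(steps, pool_size)
    shapes["maxpool1d"] = (steps, filters)
    shapes["bilstm"] = (steps, 2 * lstm_units)   # forward and backward states concatenated
    return shapes


shapes = trace_shapes()
for stage, shape in shapes.items():
    print(f"{stage:10s} {shape}")
```

Under these assumed settings the BiLSTM and the attention mechanism operate on 27 steps rather than on the original 57 positions.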

Mathematical Formulation

Input Representation:

X = {x₁, x₂, ..., xₙ}, xᵢ ∈ R⁵⁷

Convolutional Processing:

X_reshaped = Reshape(X) ∈ R^(n×57×1)
X_conv = Conv1D(X_reshaped)
X_pool = MaxPooling1D(X_conv)

BiLSTM Modeling:

hₜ = BiLSTM(X_pool)

Attention Mechanism:

score(i,j) = tanh(W₁hᵢ + W₂hⱼ)
αᵢⱼ = softmax(V(score(i,j)))
cⱼ = Σᵢ αᵢⱼhᵢ

Feature Fusion and Output:

X_combined = [Flatten(X_conv), Flatten(h_final)]
ŷ = softmax(Dense(X_combined))
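The attention equations above can be sketched in plain Python as follows. This is a minimal illustration of the additive scoring form score(i,j) = tanh(W₁hᵢ + W₂hⱼ) followed by a softmax over i, not the authors' implementation; the hidden states and the weights W1, W2, and V below are arbitrary toy values.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(H, W1, W2, V):
    """H is a list of T hidden vectors.

    Returns (contexts, alphas), where alphas[j][i] is the weight α_ij and
    contexts[j] is c_j = Σ_i α_ij h_i.
    """
    proj1 = [matvec(W1, h) for h in H]  # W1 h_i for every i
    proj2 = [matvec(W2, h) for h in H]  # W2 h_j for every j
    contexts, alphas = [], []
    for j in range(len(H)):
        # Scalar energy for each i: V · tanh(W1 h_i + W2 h_j)
        energies = [
            sum(v * math.tanh(a + b) for v, a, b in zip(V, proj1[i], proj2[j]))
            for i in range(len(H))
        ]
        alpha = softmax(energies)
        context = [
            sum(alpha[i] * H[i][d] for i in range(len(H)))
            for d in range(len(H[0]))
        ]
        alphas.append(alpha)
        contexts.append(context)
    return contexts, alphas


# Toy inputs (hypothetical values, for illustration only)
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W1 = [[0.5, -0.2], [0.1, 0.3]]
W2 = [[-0.4, 0.6], [0.2, 0.1]]
V = [1.0, -1.0]
contexts, alphas = additive_attention(H, W1, W2, V)
```

Each row of `alphas` sums to one, so every context vector is a weighted average of the hidden states; setting V to zeros makes the weights uniform.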

Technical Innovations

  1. Hybrid Feature Fusion: Combines spatial features extracted by the CNN with temporal features modeled by the BiLSTM; both are computed from the same finger-tapping features rather than from separate modalities
  2. Dual-Layer BiLSTM Design: First layer models basic temporal dependencies; second layer processes attention-enhanced features
  3. Adaptive Attention Weights: Dynamically computes attention weights to automatically focus on critical temporal segments
  4. End-to-End Optimization: The entire architecture is trained end-to-end on the pre-extracted 57-dimensional features

Experimental Setup

Dataset

  • Data Source: ParkTest public dataset
  • Data Scale: Finger tapping videos from 250 global participants
  • Data Collection: Primarily collected at participants' homes via webcam; 48 participants completed assessment at clinics
  • Annotation Method: Annotated by expert neurologists and MDS-UPDRS certified assessors
  • Feature Dimension: 57-dimensional features including finger tapping speed, acceleration, frequency, periodicity, amplitude, and wrist displacement
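To illustrate what frequency, periodicity, and amplitude features of this kind can look like, the sketch below derives a few of them from a synthetic thumb-to-index distance signal. It is an assumption-laden toy, not the ParkTest feature extraction: the function names, the strict-local-maximum peak detector, and the synthetic 2 Hz signal are all hypothetical.

```python
import math
from statistics import mean, pstdev


def find_peaks(signal):
    """Indices of strict local maxima (a simple stand-in for a real peak detector)."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    ]


def tapping_features(distance, fps):
    """Toy tapping features from a per-frame finger-distance signal."""
    peaks = find_peaks(distance)
    # Time between consecutive maximum finger openings, in seconds
    intervals = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]
    return {
        "tap_count": len(peaks),
        "frequency_hz": 1.0 / mean(intervals) if intervals else 0.0,
        "interval_std_s": pstdev(intervals) if intervals else 0.0,  # periodicity proxy
        "mean_amplitude": mean([distance[i] for i in peaks]) if peaks else 0.0,
    }


# Synthetic signal: 3 seconds of perfectly regular tapping at 2 Hz, sampled at 40 fps
fps, tap_hz = 40, 2.0
signal = [0.5 * (1 - math.cos(2 * math.pi * tap_hz * i / fps)) for i in range(120)]
features = tapping_features(signal, fps)
print(features)
```

On real webcam recordings the distance signal would come from hand keypoints and would need smoothing before peak detection; irregular or slowing taps would show up as a larger interval spread and a falling amplitude.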

Evaluation Metrics

  • Accuracy: Overall classification accuracy
  • Precision: Precision of predictions for each class
  • Recall: Detection rate for each class
  • F1-Score: Harmonic mean of precision and recall
  • Macro-Average: Average of metrics across all classes
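The metrics listed above can all be derived from a confusion matrix. The following sketch shows one way to compute them; the 3-class matrix is a made-up example and does not reproduce any counts from the paper.

```python
def per_class_metrics(cm):
    """cm[t][p] = number of samples with true class t predicted as class p.

    Returns a list of (precision, recall, f1) tuples, one per class.
    """
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[t][k] for t in range(n))  # column sum: predicted as k
        actual = sum(cm[k])                          # row sum: truly k
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1))
    return rows


def macro_average(rows):
    """Unweighted mean of each metric across classes."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(map(sum, cm))


# Hypothetical 3-class confusion matrix, for illustration only
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
toy_rows = per_class_metrics(toy_cm)
toy_macro = macro_average(toy_rows)
toy_acc = accuracy(toy_cm)
```

Because the macro-average weights every class equally, a rare class with poor recall lowers it as much as a common one would, which is why it is reported alongside overall accuracy.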

Comparison Methods

  • Baseline Method: Original method proposed by Islam et al. [1]
  • Ablation Study: Analysis of contributions from CNN, BiLSTM, and attention mechanism components

Implementation Details

  • Optimizer: Adam optimizer
  • Loss Function: Sparse categorical cross-entropy
  • Training Epochs: 100
  • Dropout Rate: 0.2
  • Fully Connected Layer: 250 units
  • Training Time: 31.82 seconds (100 epochs)
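Combining the pipeline described earlier with the settings listed above, a model of this kind might be assembled in Keras roughly as follows. This is a sketch under stated assumptions, not the authors' code: the filter count, kernel size, pool size, LSTM width, and ReLU activations are hypothetical, and Keras' built-in `AdditiveAttention` layer is swapped in for the paper's attention block. Only the 57 inputs, five classes, 250-unit dense layer, dropout of 0.2, Adam, and sparse categorical cross-entropy come from the text.

```python
try:
    import tensorflow as tf
except ImportError:  # keep the sketch importable when TensorFlow is absent
    tf = None


def build_model(n_features=57, n_classes=5, filters=64, kernel_size=3, lstm_units=64):
    """Attention-enhanced CNN-BiLSTM sketch (layer sizes are assumptions)."""
    if tf is None:
        raise RuntimeError("TensorFlow is required to build the model")
    L = tf.keras.layers

    inputs = L.Input(shape=(n_features,))
    x = L.Reshape((n_features, 1))(inputs)                 # features -> sequence
    conv = L.Conv1D(filters, kernel_size, activation="relu")(x)
    pool = L.MaxPooling1D(pool_size=2)(conv)

    # First BiLSTM keeps the full sequence so attention can weight each step
    h = L.Bidirectional(L.LSTM(lstm_units, return_sequences=True))(pool)
    # Additive self-attention (stand-in for the paper's attention block)
    context = L.AdditiveAttention()([h, h])
    # Second BiLSTM summarizes the attention-weighted sequence
    h_final = L.Bidirectional(L.LSTM(lstm_units))(context)

    # Fuse CNN features with the attention-enhanced BiLSTM output
    merged = L.Concatenate()([L.Flatten()(conv), h_final])
    dense = L.Dense(250, activation="relu")(merged)
    dense = L.Dropout(0.2)(dense)
    outputs = L.Dense(n_classes, activation="softmax")(dense)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer labels 0-4
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as `build_model().fit(X, y, epochs=100)`, with `X` of shape (samples, 57) and integer labels `y` in 0-4.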

Experimental Results

Main Results

Class            Precision    Recall     F1-Score
0                95.00%       95.00%     95.00%
1                92.00%       92.00%     92.00%
2                90.00%       97.00%     93.00%
3                100.00%      83.00%     91.00%
4                100.00%      100.00%    100.00%
Macro-Average    95.40%       93.40%     94.20%

Overall Accuracy: 93.00%
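As a quick consistency check on the reported numbers, the short script below recomputes the macro-averages and the per-class F1-scores from the per-class precision and recall in the results table. The values are copied from that table; nothing else is assumed.

```python
# class: (precision %, recall %, F1 %), as reported in the results table
reported = {
    0: (95.0, 95.0, 95.0),
    1: (92.0, 92.0, 92.0),
    2: (90.0, 97.0, 93.0),
    3: (100.0, 83.0, 91.0),
    4: (100.0, 100.0, 100.0),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Macro-average: unweighted mean of each column across the five classes
macro = tuple(
    round(sum(row[i] for row in reported.values()) / len(reported), 1)
    for i in range(3)
)

# F1 recomputed from precision and recall, rounded to whole percent
recomputed_f1 = {k: round(f1_from(p, r)) for k, (p, r, _) in reported.items()}

print("macro (P, R, F1):", macro)
print("recomputed F1:", recomputed_f1)
```

The recomputed macro-averages (95.4, 93.4, 94.2) and rounded F1-scores agree with the reported values, so the table is internally consistent.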

Key Findings

  1. Excellent Overall Performance: 93% accuracy, exceeding the baseline method of Islam et al. [1]
  2. Severe Case Identification: Class 4 (severe) achieves 100% precision, recall, and F1-score
  3. Balanced Class Performance: Per-class F1-scores range from 91% to 100% across the five severity levels
  4. Efficient Training: Completes 100 epochs in only 31.82 seconds
  5. Confusion Matrix Analysis: High diagonal concentration with minimal misclassification

Model Performance Analysis

  • Class 2 Performance: Highest recall (97%), precision 90%, indicating strong model sensitivity to this class
  • Classes 3-4: The most severe classes are predicted with 100% precision, which matters clinically; Class 3 has the lowest recall (83%), so some Class 3 cases are missed
  • Attention Effectiveness: Successfully captures relevant temporal patterns in finger-tapping features
  • Architecture Advantages: CNN and BiLSTM combination effectively improves discrimination between adjacent severity levels

Related Work

Traditional Machine Learning Methods

  • Feature Engineering: SVM, decision trees, random forests with hand-crafted features
  • Multimodal Fusion: Combining imaging and clinical data to enhance diagnostic performance
  • Interpretability: EBM and similar methods provide transparent global and local explanations

Deep Learning Advances

  • CNN Applications: ResNet18 and similar architectures achieve 98.66% accuracy on MRI data
  • Attention Mechanisms: AttentionLUNet integrating LeNet and U-Net achieves 99.58% accuracy
  • Temporal Modeling: CNN-LSTM achieves 93.51% accuracy on speech data
  • 3D Attention: Multi-head attention residual networks for motion change recognition

Advantages of This Work

Compared to existing work, this paper is the first to comprehensively integrate CNN, BiLSTM, and attention mechanisms for multi-class PD severity classification, achieving superior performance on video-derived motor features.

Conclusions and Discussion

Main Conclusions

  1. Method Effectiveness: The attention-enhanced CNN-BiLSTM architecture effectively detects multi-class PD severity
  2. Feature Importance: The combination of temporal, frequency, and amplitude features is crucial for PD classification
  3. Clinical Value: Provides an objective, reproducible disease assessment tool
  4. Technical Advantages: Integration of spatial-temporal representation and attention mechanisms significantly improves automated PD severity detection performance

Limitations

  1. Dataset Scale: Data from 250 participants is relatively small and may limit model generalization
  2. Feature Dependency: Relies on pre-extracted hand-crafted features without end-to-end raw video processing
  3. Single Modality: Based solely on finger tapping without fusion of other motor modalities
  4. Cross-Dataset Validation: Lacks validation on other independent datasets

Future Directions

  1. Multimodal Fusion: Integrate multiple modalities including gait, speech, and facial expressions
  2. End-to-End Learning: Learn feature representations directly from raw video
  3. Large-Scale Validation: Validate on larger, multi-center datasets
  4. Real-Time Application: Develop real-time PD monitoring systems
  5. Interpretability Enhancement: Improve model interpretability and clinical credibility

In-Depth Evaluation

Strengths

  1. Architectural Innovation: First comprehensive integration of CNN, BiLSTM, and attention mechanisms for PD classification
  2. Excellent Performance: 93% accuracy across five severity classes
  3. Practical Value: Provides a non-invasive, objective PD assessment tool
  4. Technical Completeness: Complete technical pipeline from feature extraction to classification
  5. Clinical Relevance: Based on standard MDS-UPDRS assessment with clinical credibility

Weaknesses

  1. Data Scale Limitation: Data from 250 participants may be insufficient for adequately training deep models
  2. Feature Engineering Dependency: Still relies on manually designed features without end-to-end learning
  3. Single Task Focus: Addresses only finger tapping without considering other PD motor symptoms
  4. Lack of Ablation Studies: Insufficient detailed analysis of individual component contributions
  5. Generalization Verification: Lacks cross-dataset and cross-population validation

Impact

  1. Academic Contribution: Provides a new technical pathway for automated PD detection
  2. Clinical Application: Potential to become an auxiliary diagnostic tool for neurologists
  3. Technology Promotion: Attention-enhanced hybrid architecture applicable to other medical applications
  4. Social Value: Provides convenient self-monitoring means for PD patients

Applicable Scenarios

  1. Clinical Auxiliary Diagnosis: Supports neurologists in PD severity assessment
  2. Home Monitoring: Enables patients to conduct regular self-assessment at home
  3. Drug Efficacy Evaluation: Monitors disease changes during treatment
  4. Large-Scale Screening: Applicable to community or health examination center PD screening
  5. Telemedicine: Supports PD monitoring in remote healthcare settings

References

[1] Md Saiful Islam et al. Using AI to measure Parkinson's disease severity at home. npj Digital Medicine, 6(1):156, 2023.

[27] Daniel Deng et al. Interpretable video-based tracking and quantification of parkinsonism clinical motor states. npj Parkinson's Disease, 10(1):122, 2024.

[30] Umesh Kumar Lilhore et al. Hybrid CNN-LSTM model with efficient hyperparameter tuning for prediction of Parkinson's disease. Scientific Reports, 13(1):14605, 2023.


Overall Assessment: This is a technically solid research paper with clear application value. The authors' proposed attention-enhanced CNN-BiLSTM architecture achieves good results on the multi-class PD detection task, providing valuable technical contributions to the field. Despite limitations in data scale and generalization, the overall research quality is high with promising clinical application prospects.