Multi-Class Parkinson's Disease Detection Based on Finger Tapping Using Attention-Enhanced CNN-BiLSTM

Miah, Hassan, Hossain et al.

Effective clinical management and intervention development depend on accurate evaluation of Parkinson's disease (PD) severity. Many researchers have worked on developing gesture-based PD recognition systems; however, their accuracy is not satisfactory. In this study, we propose a multi-class Parkinson's disease detection system based on finger tapping using an attention-enhanced CNN-BiLSTM. We collected finger tapping videos and derived temporal, frequency, and amplitude-based features from wrist and hand movements. We then propose a hybrid deep learning framework integrating CNN, BiLSTM, and attention mechanisms for multi-class PD severity classification from video-derived motion features. First, the input sequence is reshaped and passed through a Conv1D-MaxPooling block to capture local spatial dependencies. The resulting feature maps are fed into a BiLSTM layer to model temporal dynamics. An attention mechanism focuses on the most informative temporal features, producing a context vector that is further processed by a second BiLSTM layer. CNN-derived features and attention-enhanced BiLSTM outputs are concatenated, followed by dense and dropout layers, before the final softmax classifier outputs the predicted PD severity level. The model demonstrated strong performance in distinguishing between the five severity classes, suggesting that integrating spatial-temporal representations with attention mechanisms can improve automated PD severity detection, making it a promising non-invasive tool to support clinicians in PD monitoring and progression tracking.

Basic Information

Paper ID: 2510.10121
Title: Multi-Class Parkinson's Disease Detection Based on Finger Tapping Using Attention-Enhanced CNN-BiLSTM
Authors: Abu Saleh Musa Miah, Md Maruf Al Hossain, Najmul Hassan, Yuichi Okuyama, Jungpil Shin
Category: cs.CV (Computer Vision)
Publication Date: October 11, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.10121

Abstract

Effective clinical management and intervention development for Parkinson's Disease (PD) depend on accurate assessment of disease severity. This study proposes a multi-class PD detection system based on finger tapping, employing an attention-enhanced CNN-BiLSTM architecture. The research extracts temporal, frequency, and amplitude features from finger tapping videos and constructs a hybrid deep learning framework integrating CNN, BiLSTM, and attention mechanisms. The model captures local spatial dependencies through Conv1D-MaxPooling blocks, models temporal dynamics via BiLSTM layers, and focuses on the most informative temporal features through attention mechanisms. The approach achieves 93% classification accuracy and demonstrates excellent performance in distinguishing five severity levels.

Research Background and Motivation

Problem Definition

Parkinson's Disease is a progressive neurodegenerative disorder affecting over 10 million people worldwide, primarily manifesting as tremor, rigidity, bradykinesia, and postural instability. Traditional PD severity assessment relies primarily on clinical scales such as UPDRS (Unified Parkinson's Disease Rating Scale) and MDS-UPDRS.

Limitations of Existing Methods

High Subjectivity: Traditional clinical assessment depends on subjective physician judgment, exhibiting inter-rater variability
Time-Consuming: Clinical assessment procedures are complex and resource-intensive
Poor Consistency: Lack of objective, standardized assessment methods affects disease progression tracking
Insufficient Accuracy: Existing gesture-based PD recognition systems demonstrate suboptimal performance

Research Motivation

The work aims to develop a non-invasive, objective, and accessible method for automated PD severity assessment from video. It applies computer vision and machine learning to grade disease severity and to give clinicians a reliable auxiliary diagnostic tool.

Core Contributions

Proposed an attention-enhanced CNN-BiLSTM hybrid architecture that effectively combines spatial feature extraction and temporal sequence modeling
Implemented multi-class PD severity classification capable of distinguishing five different severity levels
Integrated attention mechanisms to enhance the model's focus on critical temporal features
Achieved 93% classification accuracy, significantly outperforming baseline methods
Provided a non-invasive PD monitoring tool supporting clinicians in disease progression tracking

Methodology

Task Definition

Input: 57-dimensional feature vectors derived from finger tapping videos, containing temporal, frequency, and amplitude features
Output: Five-class PD severity classification results (Class 0-4)
Constraints: Expert-annotated data based on MDS-UPDRS standards

Model Architecture

Overall Design

The model employs a multi-stage processing pipeline:

Input Reshaping: Reshape each 57-dimensional feature vector into a 57-step, single-channel sequence (57×1)
CNN Feature Extraction: Conv1D + MaxPooling1D capture local spatial patterns
BiLSTM Temporal Modeling: Bidirectional LSTM models temporal dependencies
Attention Mechanism: Focus on the most important temporal features
Feature Fusion: Concatenate CNN and attention-enhanced BiLSTM features
Classification Output: Fully connected layer + Softmax for five-class classification
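
The stages listed above can be followed as a sequence of tensor shapes for a single sample. The sketch below is only illustrative: the summary does not report the number of filters, the kernel size, the pooling size, or the LSTM width, so `filters`, `kernel_size`, `pool_size`, and `lstm_units` are assumed values, as are the unpadded convolution and the choice to keep the full sequence after each BiLSTM. The 250-unit dense layer and the five output classes come from the text, and the fusion step follows the Feature Fusion formula (flattened convolution output plus flattened BiLSTM output).

```python
def trace_shapes(n_features=57, filters=64, kernel_size=3, pool_size=2,
                 lstm_units=64, dense_units=250, n_classes=5):
    """Return (stage, output shape) pairs for one sample; the batch axis is omitted.

    filters, kernel_size, pool_size and lstm_units are assumptions for
    illustration only; they are not reported in the summary.
    """
    conv_steps = n_features - kernel_size + 1   # unpadded convolution, stride 1
    pool_steps = conv_steps // pool_size        # non-overlapping max pooling
    bilstm_dim = 2 * lstm_units                 # forward + backward hidden states
    fused = conv_steps * filters + pool_steps * bilstm_dim
    return [
        ("input", (n_features,)),
        ("reshape", (n_features, 1)),
        ("conv1d", (conv_steps, filters)),
        ("max_pooling1d", (pool_steps, filters)),
        ("bilstm_1", (pool_steps, bilstm_dim)),
        ("attention", (pool_steps, bilstm_dim)),
        ("bilstm_2", (pool_steps, bilstm_dim)),
        ("concatenate", (fused,)),
        ("dense", (dense_units,)),
        ("softmax", (n_classes,)),
    ]


shapes = dict(trace_shapes())
for stage, shape in shapes.items():
    print(f"{stage:>14}: {shape}")
```

Under these assumed settings the fused vector is much wider than the 57 input features, which is one reason the dropout layer matters on a dataset of this size.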

Mathematical Formulation

Input Representation:

X = {x₁, x₂, ..., xₙ}, xᵢ ∈ R⁵⁷

Convolutional Processing:

X_reshaped = Reshape(X) ∈ R^(n×57×1)
X_conv = Conv1D(X_reshaped)
X_pool = MaxPooling1D(X_conv)

BiLSTM Modeling:

hₜ = BiLSTM(X_pool)

Attention Mechanism:

score(i,j) = tanh(W₁hᵢ + W₂hⱼ)
αᵢⱼ = softmax(V(score(i,j)))
cⱼ = Σᵢ αᵢⱼhᵢ
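
As a minimal sketch of the three attention equations above, the following pure-Python function computes the weights αᵢⱼ and the context vectors cⱼ for a short sequence of hidden states. It assumes that the softmax normalizes over i for each fixed j (which is what the sum in cⱼ suggests) and that V is a vector mapping each score to a scalar; the function name, the helper `matvec`, and the toy values are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_attention(h, W1, W2, v):
    """Additive attention over hidden states h (a list of T vectors).

    alpha[i][j] is the weight of h_i in context c_j; the softmax is taken
    over i for each j (an assumption about the normalization axis).
    """
    T, d = len(h), len(h[0])
    p1 = [matvec(W1, hi) for hi in h]   # W1 h_i
    p2 = [matvec(W2, hj) for hj in h]   # W2 h_j
    alpha = [[0.0] * T for _ in range(T)]
    contexts = []
    for j in range(T):
        # V applied to score(i, j) = tanh(W1 h_i + W2 h_j)
        scores = [sum(vk * math.tanh(a + b) for vk, a, b in zip(v, p1[i], p2[j]))
                  for i in range(T)]
        top = max(scores)                       # subtract max for stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for i in range(T):
            alpha[i][j] = exps[i] / total
        # c_j = sum_i alpha_ij * h_i
        contexts.append([sum(alpha[i][j] * h[i][k] for i in range(T))
                         for k in range(d)])
    return contexts, alpha


# Toy hidden states (hypothetical values, T = 3, d = 2).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
contexts, alpha = additive_attention(h, identity, identity, v=[1.0, 1.0])
uniform_contexts, uniform_alpha = additive_attention(h, identity, identity, v=[0.0, 0.0])
```

With `v` set to zero every score is equal, so the weights become uniform and each context vector reduces to the mean of the hidden states, which is a convenient sanity check.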

Feature Fusion and Output:

X_combined = [Flatten(X_conv), Flatten(h_final)]
ŷ = softmax(Dense(X_combined))
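
The fusion and output equations can be illustrated with a few lines of pure Python. This is a sketch of the formula as written, so it collapses the 250-unit dense layer and the dropout layer described elsewhere into a single dense projection; the helper names and the toy inputs are hypothetical.

```python
import math


def flatten(matrix):
    """Flatten a (steps x channels) list of lists into one vector."""
    return [value for row in matrix for value in row]


def dense(x, weights, bias):
    """Affine layer: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    top = max(z)                                # subtract max for stability
    exps = [math.exp(value - top) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x_conv, h_final, weights, bias):
    """X_combined = [Flatten(X_conv), Flatten(h_final)]; y_hat = softmax(Dense(X_combined))."""
    x_combined = flatten(x_conv) + flatten(h_final)
    return softmax(dense(x_combined, weights, bias))


# Toy feature maps (hypothetical values): 2x2 CNN output, 1x3 BiLSTM output.
x_conv = [[0.1, 0.2], [0.3, 0.4]]
h_final = [[0.5, 0.6, 0.7]]
weights = [[0.0] * 7 for _ in range(5)]         # 5 classes, 7 fused features
bias = [0.0] * 5
y_hat = classify(x_conv, h_final, weights, bias)
predicted_class = max(range(5), key=lambda c: y_hat[c])
```

With all-zero weights the classifier has no preference, so `y_hat` is uniform over the five severity classes; trained weights would concentrate the probability mass on one class.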

Technical Innovations

Dual-Branch Feature Fusion: Concatenates the local patterns extracted by the CNN with the temporal features modeled by the BiLSTM
Dual-Layer BiLSTM Design: First layer models basic temporal dependencies; second layer processes attention-enhanced features
Adaptive Attention Weights: Dynamically computes attention weights to automatically focus on critical temporal segments
End-to-End Optimization: The CNN, BiLSTM, attention, and classifier layers are trained jointly on the pre-extracted 57-dimensional features

Experimental Setup

Dataset

Data Source: ParkTest public dataset
Data Scale: Finger tapping videos from 250 participants worldwide
Data Collection: Primarily collected at participants' homes via webcam; 48 participants completed assessment at clinics
Annotation Method: Annotated by expert neurologists and MDS-UPDRS certified assessors
Feature Dimension: 57-dimensional features including finger tapping speed, acceleration, frequency, periodicity, amplitude, and wrist displacement
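
To make the feature categories concrete, the sketch below derives a handful of comparable quantities from a one-dimensional signal. It is not the dataset's extraction pipeline: the input is assumed to be a per-frame thumb-to-index distance (here a synthetic 3 Hz sine wave), and the function name, the returned keys, and the choice of a naive discrete Fourier transform are all hypothetical.

```python
import cmath
import math


def tapping_features(distance, fps):
    """Compute a few illustrative features from a per-frame finger distance signal."""
    n = len(distance)
    mean = sum(distance) / n
    centered = [d - mean for d in distance]

    # Frame-to-frame differences scaled by the frame rate.
    speed = [(distance[i + 1] - distance[i]) * fps for i in range(n - 1)]
    accel = [(speed[i + 1] - speed[i]) * fps for i in range(len(speed) - 1)]

    # Naive DFT: find the non-zero frequency bin with the largest magnitude.
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)

    return {
        "tap_frequency_hz": best_bin * fps / n,
        "amplitude": max(distance) - min(distance),
        "mean_speed": sum(abs(s) for s in speed) / len(speed),
        "mean_abs_acceleration": sum(abs(a) for a in accel) / len(accel),
    }


# Synthetic example: 10 seconds at 30 frames per second, tapping at 3 Hz.
fps = 30
distance = [0.5 * (1 + math.sin(2 * math.pi * 3.0 * i / fps)) for i in range(300)]
features = tapping_features(distance, fps)
print(features["tap_frequency_hz"])  # 3.0
```

A real pipeline would first need hand keypoints from each video frame and would handle noise, missing detections, and irregular tapping, none of which this sketch attempts.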

Evaluation Metrics

Accuracy: Overall classification accuracy
Precision: Precision of predictions for each class
Recall: Detection rate for each class
F1-Score: Harmonic mean of precision and recall
Macro-Average: Average of metrics across all classes
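
The metrics above can all be derived from a confusion matrix. The following sketch shows the computation in plain Python; the 3-class matrix is a made-up toy example, not data from the paper.

```python
def classification_report(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true other
        fn = sum(confusion[c]) - tp                        # true c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    # Macro-average: unweighted mean of each metric across classes.
    macro = {key: sum(m[key] for m in per_class) / n
             for key in ("precision", "recall", "f1")}
    accuracy = sum(confusion[c][c] for c in range(n)) / total
    return {"per_class": per_class, "macro": macro, "accuracy": accuracy}


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
report = classification_report(toy_confusion)
```

Because the macro-average weights every class equally, it can differ noticeably from overall accuracy when class sizes are unbalanced.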

Comparison Methods

Baseline Method: Original method proposed by Islam et al. [1]
Ablation Study: Analysis of contributions from CNN, BiLSTM, and attention mechanism components

Implementation Details

Optimizer: Adam optimizer
Loss Function: Sparse categorical cross-entropy
Training Epochs: 100
Dropout Rate: 0.2
Fully Connected Layer: 250 units
Training Time: 31.82 seconds (100 epochs)

Experimental Results

Main Results

Class	Precision	Recall	F1-Score
0	95.00%	95.00%	95.00%
1	92.00%	92.00%	92.00%
2	90.00%	97.00%	93.00%
3	100.00%	83.00%	91.00%
4	100.00%	100.00%	100.00%
Macro-Average	95.40%	93.40%	94.20%
Overall Accuracy			93.00%
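
As a quick arithmetic check, the macro-averages in the table can be recomputed from the per-class rows, and each reported F1-score can be compared with the harmonic mean of its precision and recall. The dictionary below simply transcribes the table values; the variable names are illustrative.

```python
# Per-class (precision, recall, F1) in percent, transcribed from the table.
reported = {
    0: (95.0, 95.0, 95.0),
    1: (92.0, 92.0, 92.0),
    2: (90.0, 97.0, 93.0),
    3: (100.0, 83.0, 91.0),
    4: (100.0, 100.0, 100.0),
}

# Macro-average: unweighted mean of each column.
macro = tuple(round(sum(column) / len(column), 2)
              for column in zip(*reported.values()))
print(macro)  # (95.4, 93.4, 94.2)

# Each reported F1 should equal the rounded harmonic mean of precision and recall.
f1_consistent = all(round(2 * p * r / (p + r)) == f1
                    for p, r, f1 in reported.values())
print(f1_consistent)  # True
```

Both checks agree with the table, so the per-class rows and the macro-average row are internally consistent.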

Key Findings

Excellent Overall Performance: 93% overall accuracy with a macro-averaged F1-score of 94.20%, reported as outperforming the baseline of Islam et al. [1]
Severe Case Identification: Class 4 (severe) achieves 100% precision, recall, and F1-score
Balanced Class Performance: Per-class F1-scores range from 91% (Class 3) to 100% (Class 4)
Efficient Training: Completes 100 epochs in only 31.82 seconds
Confusion Matrix Analysis: High diagonal concentration with minimal misclassification

Model Performance Analysis

Class 2 Performance: Recall of 97% with precision of 90%, indicating that the model rarely misses Class 2 cases but sometimes assigns other cases to this class
Classes 3-4: Class 4 is identified perfectly, while Class 3 reaches 100% precision but the lowest recall (83%), so some Class 3 cases are assigned to other classes
Attention Effectiveness: Successfully captures relevant temporal patterns in the finger tapping features
Architecture Advantages: CNN and BiLSTM combination effectively improves discrimination between adjacent severity levels

Related Work

Traditional Machine Learning Methods

Feature Engineering: SVM, decision trees, random forests with hand-crafted features
Multimodal Fusion: Combining imaging and clinical data to enhance diagnostic performance
Interpretability: Explainable Boosting Machines (EBM) and similar methods provide transparent global and local explanations

Deep Learning Advances

CNN Applications: ResNet18 and similar architectures achieve 98.66% accuracy on MRI data
Attention Mechanisms: AttentionLUNet integrating LeNet and U-Net achieves 99.58% accuracy
Temporal Modeling: CNN-LSTM achieves 93.51% accuracy on speech data
3D Attention: Multi-head attention residual networks for motion change recognition

Advantages of This Work

Compared to existing work, this paper is the first to comprehensively integrate CNN, BiLSTM, and attention mechanisms for multi-class PD severity classification, achieving superior performance on video-derived motor features.

Conclusions and Discussion

Main Conclusions

Method Effectiveness: The attention-enhanced CNN-BiLSTM architecture effectively detects multi-class PD severity
Feature Importance: The combination of temporal, frequency, and amplitude features is crucial for PD classification
Clinical Value: Provides an objective, reproducible disease assessment tool
Technical Advantages: Integration of spatial-temporal representation and attention mechanisms significantly improves automated PD severity detection performance

Limitations

Dataset Scale: Data from 250 participants is relatively small and may limit model generalization
Feature Dependency: Relies on pre-extracted hand-crafted features without end-to-end raw video processing
Single Modality: Based solely on finger tapping without fusion of other motor modalities
Cross-Dataset Validation: Lacks validation on other independent datasets

Future Directions

Multimodal Fusion: Integrate multiple modalities including gait, speech, and facial expressions
End-to-End Learning: Learn feature representations directly from raw video
Large-Scale Validation: Validate on larger, multi-center datasets
Real-Time Application: Develop real-time PD monitoring systems
Interpretability Enhancement: Improve model interpretability and clinical credibility

In-Depth Evaluation

Strengths

Architectural Innovation: First comprehensive integration of CNN, BiLSTM, and attention mechanisms for PD classification
Excellent Performance: 93% accuracy represents a high level in this field
Practical Value: Provides a non-invasive, objective PD assessment tool
Technical Completeness: Complete technical pipeline from feature extraction to classification
Clinical Relevance: Based on standard MDS-UPDRS assessment with clinical credibility

Weaknesses

Data Scale Limitation: Data from 250 participants may be insufficient for adequately training deep models
Feature Engineering Dependency: Still relies on manually designed features without end-to-end learning
Single Task Focus: Addresses only finger tapping without considering other PD motor symptoms
Lack of Ablation Studies: Insufficient detailed analysis of individual component contributions
Generalization Verification: Lacks cross-dataset and cross-population validation

Impact

Academic Contribution: Provides a new technical pathway for automated PD detection
Clinical Application: Potential to become an auxiliary diagnostic tool for neurologists
Technology Promotion: Attention-enhanced hybrid architecture applicable to other medical applications
Social Value: Provides convenient self-monitoring means for PD patients

Applicable Scenarios

Clinical Auxiliary Diagnosis: Supports neurologists in PD severity assessment
Home Monitoring: Enables patients to conduct regular self-assessment at home
Drug Efficacy Evaluation: Monitors disease changes during treatment
Large-Scale Screening: Applicable to community or health examination center PD screening
Telemedicine: Supports PD monitoring in remote healthcare settings

References

[1] Md Saiful Islam et al. Using AI to measure Parkinson's disease severity at home. npj Digital Medicine, 6(1):156, 2023.

[27] Daniel Deng et al. Interpretable video-based tracking and quantification of parkinsonism clinical motor states. npj Parkinson's Disease, 10(1):122, 2024.

[30] Umesh Kumar Lilhore et al. Hybrid CNN-LSTM model with efficient hyperparameter tuning for prediction of Parkinson's disease. Scientific Reports, 13(1):14605, 2023.

Overall Assessment: This is a technically solid research paper with clear application value. The authors' proposed attention-enhanced CNN-BiLSTM architecture achieves good results on the multi-class PD detection task, providing valuable technical contributions to the field. Despite limitations in data scale and generalization, the overall research quality is high with promising clinical application prospects.