scellop: A Scalable Redesign of Cell Population Plots for Single-Cell Data

Smits, Akhmetov, Liaw et al.
Summary: Cell population plots are visualizations showing cell population distributions in biological samples with single-cell data, traditionally shown with stacked bar charts. Here, we address issues with this approach, particularly its limited scalability with an increasing number of cell types and samples, and present scellop, a novel interactive cell population viewer combining visual encodings optimized for common user tasks in studying populations of cells across samples or conditions.

Availability and Implementation: scellop is available under the MIT license at https://github.com/hms-dbmi/scellop, and is distributed on PyPI (https://pypi.org/project/cellpop/) and NPM (https://www.npmjs.com/package/cellpop). A demo is available at https://scellop.netlify.app/.

Basic Information

  • Paper ID: 2510.09554
  • Title: scellop: A Scalable Redesign of Cell Population Plots for Single-Cell Data
  • Authors: Thomas C. Smits, Nikolay Akhmetov, Tiffany S. Liaw, Mark S. Keller, Eric Mörth, Nils Gehlenborg
  • Institution: Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
  • Classification: cs.HC (Human-Computer Interaction), q-bio.QM (Quantitative Methods)
  • License: MIT License
  • Paper Link: https://arxiv.org/abs/2510.09554

Abstract

Cell population plots are visualizations that display the distribution of cell populations in single-cell data, traditionally presented as stacked bar charts. The paper addresses the limitations of this approach, particularly its poor scalability as the number of cell types and samples increases. It proposes scellop, a novel interactive cell population viewer that combines visual encodings optimized for common user tasks in studying cell populations across samples or conditions.

Research Background and Motivation

Problem Definition

  1. Limitations of Traditional Methods: Cell population plots traditionally use stacked bar charts, which suffer from severe scalability issues
  2. Perceptual Issues: Research by Cleveland & McGill (1984) demonstrates that humans judge positions along a common scale more accurately than lengths, so the offset segments in stacked bar charts, which do not share a baseline, are particularly difficult to compare
  3. Modern Challenges: Large-scale single-cell atlas studies can detect more and rarer cell types, making visual comparison increasingly difficult
  4. Color Limitations: Using seven or more colors for categorical encoding reduces readability, with identification accuracy declining as the number of colors increases
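
To make the perceptual issue concrete, the following minimal sketch computes where each segment of a stacked bar starts. The sample names and counts are made up for illustration and do not come from the paper.

```python
from itertools import accumulate

# Hypothetical counts per cell type for two samples (illustrative only).
cell_types = ["T cell", "B cell", "Macrophage"]
samples = {
    "sample_A": [500, 300, 200],
    "sample_B": [100, 300, 600],
}


def segment_baselines(counts):
    """Return the starting offset (baseline) of each stacked segment."""
    return [0] + list(accumulate(counts))[:-1]


baselines = {name: segment_baselines(counts) for name, counts in samples.items()}

for name, offsets in baselines.items():
    for cell_type, start, count in zip(cell_types, offsets, samples[name]):
        print(f"{name}: {cell_type} starts at {start}, length {count}")
```

Both samples contain 300 B cells, yet the two segments start at different offsets, so a reader has to compare lengths on unaligned baselines instead of positions on a shared axis.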

Research Significance

  • Data Scale Growth: HuBMAP-annotated RNAseq datasets contain an average of 33 cell types, and the Human Lung Cell Atlas data used in the use case contains 51 cell types
  • Practical Needs: Support for heterogeneity analysis, cell type comparison, cell count comparison, and other analytical tasks
  • Cross-Domain Applications: Applicable not only to single-cell analysis but also to other fields such as metagenomics

Core Contributions

  1. User Requirements Analysis: Systematic analysis of user tasks and requirements for cell population visualization through user studies with 14 participants
  2. Novel Visualization Design: Proposes an interactive visualization solution based on heatmaps combined with expandable bar charts supporting multi-level analysis
  3. Complete Software Implementation: Develops a cross-platform tool supporting Python (PyPI) and JavaScript (NPM) environments
  4. Practical Deployment: Integrated into the HuBMAP data portal, providing real-world application validation

Methodology Details

Task Definition

Based on user research, three main categories of user tasks were identified:

  1. Single-Sample Structure Viewing: Most common cell types, proportions of specific cell types, comparison of multiple cell type proportions within a sample
  2. Multi-Sample Structure Comparison: Comparison of specific cell type proportions across samples, identification of cell types across samples, contribution percentage of specific cell types to total cells across all samples
  3. Metadata-Associated Comparison: Most common cell types in specific organs, correlation between cell type proportions and sample metadata
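
The first two task categories reduce to simple queries over a table of cell counts. The sketch below assumes a nested dictionary of counts; the sample names, cell types, and function names are hypothetical and are not part of scellop.

```python
# Hypothetical cell counts: sample -> {cell type -> count} (illustrative only).
counts = {
    "kidney_1": {"Podocyte": 50, "Proximal tubule": 400, "T cell": 50},
    "kidney_2": {"Podocyte": 20, "Proximal tubule": 150, "T cell": 30},
}


def most_common_cell_type(sample):
    """Single-sample task: the most common cell type within one sample."""
    return max(counts[sample], key=counts[sample].get)


def proportion(sample, cell_type):
    """Share of one cell type within a sample (comparable across samples)."""
    total = sum(counts[sample].values())
    return counts[sample].get(cell_type, 0) / total


def overall_contribution(cell_type):
    """Multi-sample task: share of one cell type among all cells in all samples."""
    grand_total = sum(sum(row.values()) for row in counts.values())
    cell_total = sum(row.get(cell_type, 0) for row in counts.values())
    return cell_total / grand_total
```

For example, `proportion("kidney_1", "T cell")` and `proportion("kidney_2", "T cell")` answer the cross-sample comparison of one cell type, while `overall_contribution("T cell")` answers the contribution question.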

Architecture Design

Core Components

  1. Central Heatmap: Uses samples and cell types as rows and columns, encoding cell counts or proportions
  2. Expandable Bar Charts: Each heatmap row can be expanded into detailed bar charts supporting within-sample analysis
  3. Side Panels: Display bar charts and violin plots showing cell counts and distributions
  4. Interactive Controls: Support for normalization, grouping, filtering, and sorting operations
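
The difference between encoding counts and proportions in the heatmap can be sketched as follows. Row-wise and column-wise normalization are common choices for such a matrix; that they correspond exactly to scellop's normalization options is an assumption, and the data is illustrative.

```python
# Hypothetical count matrix: rows are samples, columns are cell types.
row_names = ["sample_A", "sample_B"]
col_names = ["T cell", "B cell"]
matrix = [
    [30, 10],
    [20, 40],
]


def normalize_rows(m):
    """Divide each value by its row total: proportions within a sample.

    Assumes every row has a nonzero total.
    """
    return [[v / sum(row) for v in row] for row in m]


def normalize_columns(m):
    """Divide each value by its column total: how a cell type is spread over samples.

    Assumes every column has a nonzero total.
    """
    col_totals = [sum(col) for col in zip(*m)]
    return [[v / t for v, t in zip(row, col_totals)] for row in m]


by_row = normalize_rows(matrix)
by_col = normalize_columns(matrix)
```

Row normalization makes samples of different sizes comparable, while column normalization highlights which samples contribute most cells of a given type.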

Technical Implementation

  • Frontend: React + visx (D3-based) for visualization
  • State Management: Zustand + zundo middleware supporting undo/redo
  • Python Integration: Jupyter widget based on anywidget
  • Data Support: Compatible with AnnData format, supporting the scverse ecosystem
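
AnnData keeps per-cell annotations in its `obs` table, so a cell population matrix can be derived by counting cells per (sample, cell type) pair. The sketch below uses a plain list of tuples as a stand-in for two `obs` columns; the column contents are hypothetical, and this is not scellop's loading code.

```python
from collections import Counter

# Stand-in for two per-cell annotation columns (sample id, cell type label).
# In AnnData these would be columns of `obs`; the values here are made up.
cells = [
    ("donor_1", "T cell"),
    ("donor_1", "T cell"),
    ("donor_1", "B cell"),
    ("donor_2", "B cell"),
    ("donor_2", "NK cell"),
]

pair_counts = Counter(cells)  # (sample, cell type) -> number of cells

sample_ids = sorted({s for s, _ in cells})
cell_type_names = sorted({c for _, c in cells})

# Dense matrix with explicit zeros for cell types absent from a sample.
count_matrix = [[pair_counts[(s, c)] for c in cell_type_names] for s in sample_ids]
```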

Design Innovations

  1. Multi-View Integration: Combines heatmap overview and bar chart details, supporting analysis at different granularities
  2. Hierarchical Structure Support: Supports grouping and filtering of cell type hierarchies
  3. Flexible Configuration: Supports multiple normalization, transformation, and color schemes
  4. Backward Compatibility: Configurable as traditional stacked bar chart view
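
Grouping by a cell type hierarchy amounts to summing fine-grained counts into their parent categories. The following is a minimal sketch assuming a two-level hierarchy; the mapping, labels, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-level hierarchy: fine cell type -> parent group.
parent_of = {
    "CD4 T cell": "Lymphoid",
    "CD8 T cell": "Lymphoid",
    "B cell": "Lymphoid",
    "Macrophage": "Myeloid",
}

fine_counts = {"CD4 T cell": 120, "CD8 T cell": 80, "B cell": 50, "Macrophage": 30}


def group_counts(counts, hierarchy):
    """Sum fine-grained counts into their parent groups."""
    grouped = defaultdict(int)
    for cell_type, n in counts.items():
        # Cell types without a known parent are kept under their own name.
        grouped[hierarchy.get(cell_type, cell_type)] += n
    return dict(grouped)


coarse_counts = group_counts(fine_counts, parent_of)
```

Collapsing to the parent level reduces the number of columns to display, which is one way to handle excessive cell type granularity.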

Experimental Setup

User Study

  • Participants: 14 domain experts. Reported roles were 12 experimental biologists, 5 computational biologists, 5 educators, and 1 clinician; these counts sum to 23, so roles evidently overlap across participants
  • Research Method: 30-minute semi-structured interviews
  • Testing Platform: Cell population plots in the HuBMAP data portal

Dataset Validation

  1. HuBMAP Data: 162 datasets with an average of 33 cell types
  2. Human Lung Cell Atlas: 484 datasets with 51 cell types
  3. Kidney RNAseq Dataset: Used for online demonstration

Evaluation Methods

  • Qualitative user feedback analysis
  • Task completion efficiency comparison
  • Visualization accuracy assessment

Experimental Results

User Requirements Discovery

Primary interactive features expected by users (ranked by the number of participants, N, who requested them):

  • Normalization options (N=10)
  • Grouping by cell type hierarchy (N=9)
  • Overview-to-detail navigation (N=9)
  • Visualization manipulation capability (N=8)
  • Additional contextual information (N=5)

Primary Issues:

  • Color scheme problems (N=6)
  • Excessive cell type granularity
  • Difficulty identifying missing and ubiquitous cell types

Use Case Analysis

Analysis using Human Lung Cell Atlas data demonstrates:

  1. Disease Difference Discovery: Cystic fibrosis patients show different cell type populations, particularly in immune cells
  2. COVID Impact: Certain COVID patient datasets show different population distributions
  3. Traditional Method Limitations: Stacked bar charts are difficult to compare when handling large numbers of datasets, with missing cell types and small proportions difficult to observe directly

Performance Advantages

Compared to traditional stacked bar charts:

  • Better pattern detection capability (heatmap overview)
  • Higher population comparison accuracy (expandable bar charts)
  • Support for hierarchical structure display
  • Better scalability

Related Work

Visualization Perception Research

  • Cleveland & McGill (1984): Graphical perception theory
  • Talbot et al. (2014): Bar chart perception experiments
  • Nobre et al. (2024): Accuracy and time studies comparing stacked bar charts with other chart types

Heatmap Tools

  • Bertifier: Flexible encoding heatmap views
  • Clustergrammer: Heatmap visualization for high-dimensional biological data
  • Funkyheatmap: Data frame visualization for mixed data types

Advantages of This Work

Compared to existing heatmap tools, scellop specifically supports:

  • Individual sample structure inspection
  • Multiple normalization and transformation operations
  • Cell type hierarchy manipulation

Conclusions and Discussion

Main Conclusions

  1. scellop successfully addresses scalability issues of traditional stacked bar charts in large-scale single-cell data visualization
  2. Design based on user research effectively supports all identified user tasks
  3. The combination of a heatmap overview and expandable bar charts supports analysis at multiple levels, from cross-sample patterns down to the composition of a single sample

Limitations

  1. Currently primarily supports AnnData format with limited data loading options
  2. Lacks network graph representation for hierarchical cell types
  3. Comparison of datasets with different cell type granularities still has room for improvement

Future Directions

  1. Hierarchical Visualization: Integrate network graph representations such as Collapsible Trees for hierarchical cell types
  2. Data Format Extension: Support more alternative file formats
  3. Cross-Domain Applications: Extend to other fields using stacked bar charts such as metagenomics

In-Depth Evaluation

Strengths

  1. User-Centered Design: The design is grounded in systematic user research, so it is driven by practical needs
  2. Complete Technical Implementation: Provides cross-platform support and integration into actual production environments
  3. Solid Theoretical Foundation: Based on mature visual perception research theory
  4. High Practical Value: Already deployed on important platforms such as HuBMAP

Weaknesses

  1. Evaluation Methods: The paper lacks a quantitative, comparative user study
  2. Scalability Verification: Although the paper claims scalability, it lacks performance testing on extremely large-scale data
  3. Learning Cost: The new interaction patterns may require an adaptation period for users

Impact

  1. Disciplinary Contribution: Provides important methodological contributions to single-cell data visualization
  2. Practical Value: Open-source tool already deployed on important research platforms
  3. Reproducibility: Provides complete implementation and demonstration for easy reproduction and adoption

Applicable Scenarios

  1. Single-Cell Data Analysis: Primary target application domain
  2. Metagenomics: Extended application mentioned in the paper
  3. Any Scenario Requiring Categorical Data Distribution Comparison: General visualization problem

Technical Details

Implementation Architecture

  • Visualization Library: visx (D3-based)
  • UI Framework: React
  • State Management: Zustand + zundo
  • Python Integration: anywidget
  • Data Format: AnnData (zarr-indexed)
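
zundo adds undo/redo to a Zustand store in JavaScript. The general idea behind such a history, past, present, and future snapshots of the state, can be sketched in a language-neutral way; the `History` class below is a hypothetical illustration in Python and is not the zundo API or scellop's code.

```python
class History:
    """Minimal undo/redo over state snapshots (hypothetical helper)."""

    def __init__(self, initial):
        self.past = []
        self.present = initial
        self.future = []

    def set(self, new_state):
        # A new change invalidates anything that could have been redone.
        self.past.append(self.present)
        self.present = new_state
        self.future.clear()

    def undo(self):
        # Step back one snapshot, if there is one.
        if self.past:
            self.future.append(self.present)
            self.present = self.past.pop()

    def redo(self):
        # Reapply the most recently undone snapshot, if there is one.
        if self.future:
            self.past.append(self.present)
            self.present = self.future.pop()


view = History({"normalization": "none", "sort": "alphabetical"})
view.set({"normalization": "row", "sort": "alphabetical"})
view.set({"normalization": "row", "sort": "count"})
view.undo()
```

After the final `undo()`, the view is back to row normalization with alphabetical sorting, and a `redo()` would restore sorting by count.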

Interactive Features

  • Zooming and resizing
  • Multiple sorting methods (count, alphabetical, metadata)
  • Data filtering and grouping
  • Color scheme customization
  • High-resolution PNG export
  • Undo/redo operations
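
The three sorting methods listed above can be illustrated with a small sketch over per-sample records. The field names (`total_cells`, `organ`) and values are hypothetical and chosen only for illustration.

```python
# Hypothetical per-sample totals and metadata (illustrative only).
rows = [
    {"sample": "s2", "total_cells": 1200, "organ": "lung"},
    {"sample": "s1", "total_cells": 300, "organ": "kidney"},
    {"sample": "s3", "total_cells": 800, "organ": "kidney"},
]

# Sort by count, largest sample first.
by_count = sorted(rows, key=lambda r: r["total_cells"], reverse=True)
# Sort alphabetically by sample name.
by_name = sorted(rows, key=lambda r: r["sample"])
# Sort by a metadata field, then by sample name so ties have a readable order.
by_organ = sorted(rows, key=lambda r: (r["organ"], r["sample"]))

order_by_count = [r["sample"] for r in by_count]
order_by_name = [r["sample"] for r in by_name]
order_by_organ = [r["sample"] for r in by_organ]
```

Sorting by metadata places samples from the same organ next to each other, which supports the metadata-associated comparison tasks.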

References

The paper cites 42 references spanning visual perception, bioinformatics, and visualization tools, which ground its design decisions.

Overall Assessment: This is a high-quality interdisciplinary paper combining human-computer interaction and bioinformatics. It addresses practical research needs with a complete solution that is already deployed in a real-world environment, and its user-centered design methodology is exemplary.