Readability and Understandability of Snippets Recommended by General-purpose Web Search Engines: a Comparative Study

Dantas, Maia
Developers often search for reusable code snippets on general-purpose web search engines like Google, Yahoo! or Microsoft Bing. But some of these code snippets may have poor quality in terms of readability or understandability. In this paper, we present an empirical analysis of the readability and understandability scores of snippets extracted from the web using three independent variables: ranking, general-purpose web search engine, and recommended site. We collected the top-5 recommended sites and their respective code snippet recommendations using Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. We found that some recommended sites have significantly better readability and understandability scores than others. The better-ranked code snippet is not necessarily more readable or understandable than a lower-ranked code snippet for all general-purpose web search engines. Moreover, considering the readability score, Google has better-ranked code snippets compared to Yahoo! or Microsoft Bing.

Basic Information

  • Paper ID: 2110.07087
  • Title: Readability and Understandability of Snippets Recommended by General-purpose Web Search Engines: a Comparative Study
  • Authors: Carlos Eduardo C. Dantas, Marcelo A. Maia
  • Classification: cs.SE (Software Engineering)
  • Publication Date/Conference: AeSIR '21, November 15–11, 2021
  • Paper Link: https://arxiv.org/abs/2110.07087

Abstract

Developers frequently search for reusable code snippets on general-purpose search engines such as Google, Yahoo!, or Microsoft Bing. However, these code snippets may be of poor quality in terms of readability or understandability. This paper presents an empirical analysis that examines the readability and understandability scores of code snippets extracted from the web using three independent variables (ranking, general-purpose search engine, and recommended website). The study collected the top 5 recommended websites and their corresponding code snippet recommendations from Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. The research found that certain recommended websites significantly outperform others in readability and understandability scores. Higher-ranked code snippets are not necessarily more readable or understandable than lower-ranked ones across all general-purpose search engines. Furthermore, in terms of readability scores, Google ranks code snippets better than Yahoo! or Microsoft Bing.

Research Background and Motivation

Problem Definition

  1. Core Issue: Code snippets recommended by general-purpose search engines exhibit quality variations in readability and understandability; higher-ranked snippets are not necessarily of higher quality
  2. Practical Need: Developers widely use general-purpose search engines to find code examples, but lack systematic evaluation of the quality of these code snippets
  3. Search Engine Limitations: Although Google employs over 200 ranking factors, top-ranked pages may contain poorly-written code examples

Research Significance

  • Code snippet reuse can reduce programming task time and accelerate development processes
  • Google holds over 90% of the search engine market, but how other search engines rank code snippets with respect to quality remains unknown
  • The relationship between readability and understandability needs to be understood: readability concerns syntactic comprehension, while understandability concerns semantic aspects

Motivating Example

The paper cites a case study from Hora's research: when searching "File.mkdirs examples" on Google, Tutorialspoint's code snippet ranks higher despite having lower readability and reusability metrics, because it includes natural language explanations similar to the query.

Core Contributions

  1. First Systematic Comparative Study: Large-scale comparative analysis of code snippet readability and understandability recommended by three major search engines: Google, Yahoo!, and Microsoft Bing
  2. Large-Scale Dataset Construction: Collected 47,400 web links from 9,480 queries, covering 5,355 distinct websites
  3. Multi-Dimensional Analysis Framework: Proposed an analysis method based on three independent variables: ranking, search engine, and recommended website
  4. Empirical Findings: Confirmed two important hypotheses: higher-ranked code snippets do not necessarily possess higher readability/understandability; significant quality differences exist among different recommended websites
  5. Standardized Understandability Metrics: Proposed a standardization method for converting cognitive complexity to the [0, 1] interval

Methodology Details

Task Definition

  • Input: Programming-related queries
  • Output: Readability and understandability scores of code snippets
  • Constraints: Only Java code snippets are analyzed, considering the top 5 search results

Research Design Architecture

The study employs a five-step approach:

  1. Select Input Queries: Collect 10,000 user queries from the CROKAGE tool
  2. Collect Top-n Web Pages: Obtain top 5 recommended web pages from Google, Yahoo!, and Bing
  3. Extract Code Snippets: Extract Java code snippets from selected websites
  4. Calculate Metrics: Compute readability and understandability scores
  5. Analysis Method: Use ANOVA and Tukey tests for statistical analysis

Key Technical Implementation

Data Collection Strategy

Code Extraction Method

  • StackOverflow: Extract Java code snippets from accepted answers
  • Other Websites: Use regular expressions to search for source code in HTML tags containing "example" and "Java"
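The regular-expression extraction described above can be illustrated with a minimal sketch. The pattern, the choice of `<pre>`/`<code>` elements, and the function name `first_java_snippet` are assumptions made for illustration; the summary does not give the authors' actual expressions. Returning only the first match mirrors the limitation noted later that only the first snippet per page is extracted.

```python
import html
import re
from typing import Optional

# Hypothetical pattern: a <pre> or <code> element whose attributes mention
# "java" or "example". Real pages vary widely; this is only an illustration.
SNIPPET_RE = re.compile(
    r"<(pre|code)\b[^>]*\b(?:java|example)\b[^>]*>(.*?)</\1>",
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def first_java_snippet(page_html: str) -> Optional[str]:
    """Return the text of the first matching code element, or None."""
    match = SNIPPET_RE.search(page_html)
    if match is None:
        return None
    inner = TAG_RE.sub("", match.group(2))  # drop nested markup such as <span>
    return html.unescape(inner).strip()     # decode entities like &quot;
```

Regular expressions are fragile on real-world HTML, so a production extractor would more likely rely on an HTML parser; the sketch only shows the general idea.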

Evaluation Metrics

Readability Measurement:

  • Uses predictive model proposed by Scalabrino et al.
  • Includes metrics such as comments, identifier consistency, text coherence, quantity of meanings, and concepts
  • Output range: [0, 1], where 0 indicates low readability and 1 indicates high readability

Understandability Measurement:

  • Based on cognitive complexity proposed by Campbell
  • Standardization formula:
understandability(cs_i) = {
    1 - #cc/#mcc  if #cc < 15
    0.0           otherwise
}

where #cc is the cognitive complexity value and #mcc=15 is the maximum recommended value
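The standardization formula can be written as a small function. This is a sketch of the formula as stated in this summary; the function and constant names are illustrative, and the rejection of negative input is an added assumption.

```python
MAX_RECOMMENDED_CC = 15  # #mcc in the formula above


def understandability(cognitive_complexity: int) -> float:
    """Map a cognitive complexity value to [0, 1]; higher means easier to understand."""
    if cognitive_complexity < 0:
        # Assumption: cognitive complexity is a non-negative count.
        raise ValueError("cognitive complexity cannot be negative")
    if cognitive_complexity < MAX_RECOMMENDED_CC:
        return 1 - cognitive_complexity / MAX_RECOMMENDED_CC
    return 0.0
```

A snippet with no control flow has a cognitive complexity of 0 and therefore receives the maximum score of 1.0, while any snippet at or above the threshold of 15 receives 0.0.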

Experimental Setup

Dataset Details

  • Query Source: User queries from the CROKAGE tool, sourced from over 80 countries
  • Data Scale: 9,480 valid queries, 47,400 web links
  • Website Coverage: 5,355 distinct websites
  • Language Restriction: Java programming language only

Evaluation Method

  • Statistical Analysis: Analysis of Variance (ANOVA), significance level 5% (p-value<0.05)
  • Multiple Comparisons: Tukey test to identify significant differences between groups
  • Grouping Design:
    • Search Engines: 3 groups (Google, Bing, Yahoo!)
    • Ranking: 5 groups (top-1 to top-5)
    • Websites: 5 groups (selected 5 websites)
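To make the grouping design concrete, the following sketch computes the one-way ANOVA F statistic for a set of groups using only the standard library. It illustrates the textbook computation and is not the authors' analysis script; the function name is hypothetical. Turning F into a p-value needs the F distribution (for example via `scipy.stats.f_oneway`), and the Tukey test is a separate post-hoc step that is not shown here.

```python
from statistics import mean
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F statistic for two or more groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

In the study's design, each group would hold the readability (or understandability) scores for one search engine, one rank position, or one website. A large F value indicates that the group means differ by more than the within-group spread would suggest.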

Data Preprocessing

  • Remove duplicate queries and manually marked inapplicable queries
  • Filter queries with fewer than 5 web page recommendations
  • Use regular expressions to extract links from HTML tags
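The filtering rules above might look roughly like this in code. The input shape (pairs of query text and ranked links), the whitespace and case normalization used to detect duplicates, and the name `filter_queries` are all assumptions for illustration; the summary does not specify how the authors implemented these steps.

```python
from typing import Dict, Iterable, List, Set, Tuple

TOP_N = 5  # the study keeps the top 5 recommended pages per query


def filter_queries(
    rows: Iterable[Tuple[str, List[str]]],
    excluded: Set[str],
) -> Dict[str, List[str]]:
    """Drop duplicate, excluded, and under-populated queries; keep the top-N links."""
    kept: Dict[str, List[str]] = {}
    for query, links in rows:
        # Assumed normalization: lower-case and collapse whitespace.
        key = " ".join(query.lower().split())
        if key in kept or key in excluded:
            continue  # duplicate query, or manually marked as inapplicable
        if len(links) < TOP_N:
            continue  # fewer than 5 recommended pages
        kept[key] = links[:TOP_N]
    return kept
```

Here `excluded` stands in for the set of queries that were manually marked as inapplicable, given in the same normalized form.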

Experimental Results

Main Findings

RQ1: Relationship Between Search Engine Ranking and Code Quality

  • ANOVA Results: Readability p-value=0.0034, Understandability p-value=0.0003
  • Key Finding: Top-2 code snippets generally outperform Top-1, Top-4, and Top-5 in readability and understandability
  • Effect Size: Small (-0.02 to 0.01 for readability, -0.01 to 0.02 for understandability)
  • Conclusion: Confirms hypothesis H1; higher-ranked code snippets are not necessarily more readable or understandable

RQ2: Comparison Between Search Engines

  • ANOVA Results: Readability p-value=1.207e-12, Understandability p-value=0.0364
  • Readability Ranking: Google > Microsoft Bing > Yahoo!
  • Understandability: Google slightly outperforms Microsoft Bing
  • Effect Size: Small effect (-0.02 to 0.02 for readability, -0.01 to 0.005 for understandability)

RQ3: Comparison Between Recommended Websites

  • ANOVA Results: Both readability and understandability p-value<2.2e-16
  • Best Readability: GeeksforGeeks
  • Best Understandability: Tutorialspoint
  • Effect Size: Medium effect for readability (-0.15 to 0.10), small effect for understandability (-0.04 to 0.08)

Detailed Analysis Results

Readability Analysis

Reasons for GeeksforGeeks' superior performance:

  • Each line of code accompanied by a comment
  • High cohesion, with each concept independent
  • Example: Query "How to append to a string?"
    • GeeksforGeeks: Readability score 0.94
    • Tutorialspoint: Readability score 0.44

Understandability Analysis Limitations

  • 58.3% of code snippets achieve maximum understandability score
  • Most code snippets are simple API calls lacking complex control structures
  • The authors suggest that this metric is better suited to complete source files, such as those found in Git repositories

Related Work

Code Readability Research

  • Hora (2021): Investigates how Google ranks code snippets based on readability and reusability characteristics
  • Scalabrino et al.: Propose a code readability prediction model
  • Buse and Weimer: Propose a learned metric for code readability

Code Search and Recommendation

  • API Sonar Tool: Uses readability features to rank code snippets
  • Muse Method: Uses readability features to rank code examples
  • CROKAGE: Code search engine extracting code snippets and explanations from StackOverflow

Code Quality Assessment

  • Treude and Robillard: Found that only 49% of StackOverflow code snippets are completely self-explanatory
  • Cognitive Complexity: Understandability measurement method proposed by Campbell

Conclusions and Discussion

Main Conclusions

  1. Ranking Paradox: Search engine ranking is not completely correlated with code quality; Top-2 and Top-3 code snippets may have higher quality
  2. Search Engine Differences: Google performs best in readability, but advantages are limited
  3. Website Quality Differentiation: Significant quality differences exist among recommended websites; tutorial websites (GeeksforGeeks) have better readability
  4. Understandability Limitations: Current understandability metrics have limited discriminative power for simple code snippets

Practical Implications

  • Developer Guidance: Recommend prioritizing code snippets from tutorial websites such as GeeksforGeeks
  • Search Strategy: Should not rely solely on ranking when selecting code snippets; need to comprehensively consider quality metrics
  • Tool Improvement: Provide reference standards for quality assessment in code search engines

Limitations

  1. Limited Website Coverage: Only 5 websites analyzed, accounting for 34%-38.1% of recommended websites
  2. Extraction Strategy: Only the first code snippet extracted from each web page
  3. Query Modification Impact: Adding "example in java" may affect search results
  4. Metric Precision: Readability and understandability tools may contain errors

Future Directions

  1. Qualitative Research: Deeply understand the reasons for differences in readability and understandability scores
  2. Extended Research: Include more websites or develop universal code extraction methods
  3. Multi-Language Support: Extend to other programming languages
  4. Multiple Code Snippet Handling: Develop heuristic methods for handling multiple code snippets on a single page

In-Depth Evaluation

Strengths

  1. Research Novelty: First systematic comparative study of code snippet quality across mainstream search engines
  2. Data Scale: Large-scale empirical study with sufficient data volume and credible conclusions
  3. Methodological Rigor: Uses mature statistical analysis methods with statistically significant results
  4. Practical Value: Provides empirical guidance for developers selecting code snippets
  5. Reproducibility: Provides complete reproduction package and detailed methodology description

Weaknesses

  1. Metric Limitations: Understandability metric has limited discriminative power for simple code snippets
  2. Website Selection Bias: Only 5 mainstream websites analyzed, potential selection bias
  3. Language Restriction: Only Java language considered, limited generalizability
  4. Timeliness: Search results are time-sensitive; conclusions may change over time

Impact

  1. Academic Contribution: Provides new perspective for code search and software engineering research
  2. Practical Guidance: Can inform how developers choose among searched code snippets
  3. Tool Improvement: Provides basis for optimizing search engine and code recommendation system ranking algorithms
  4. Subsequent Research: Establishes foundation for related field research

Applicable Scenarios

  • Quality assessment when software developers perform code searches
  • Optimization of ranking algorithms in code search engines
  • Code example quality control in programming education
  • Code quality analysis in software engineering research

References

The paper cites 23 related references, primarily including:

  • Code readability and understandability measurement methods
  • Code search and recommendation system research
  • StackOverflow code quality analysis
  • Search engine ranking mechanism research

Overall Assessment: This is a high-quality empirical software engineering research paper that fills a research gap in code search quality assessment, possessing significant theoretical value and practical significance. The research methodology is scientifically rigorous, the data scale is sufficient, and the conclusions are highly credible, providing valuable insights for both developers and researchers.