Readability and Understandability of Snippets Recommended by General-purpose Web Search Engines: a Comparative Study

Dantas, Maia

Developers often search for reusable code snippets on general-purpose web search engines like Google, Yahoo! or Microsoft Bing. But some of these code snippets may have poor quality in terms of readability or understandability. In this paper, we propose an empirical analysis to analyze the readability and understandability score from snippets extracted from the web using three independent variables: ranking, general-purpose web search engine, and recommended site. We collected the top-5 recommended sites and their respective code snippet recommendations using Google, Yahoo!, and Bing for 9,480 queries, and evaluate their readability and understandability scores. We found that some recommended sites have significantly better readability and understandability scores than others. The better-ranked code snippet is not necessarily more readable or understandable than a lower-ranked code snippet for all general-purpose web search engines. Moreover, considering the readability score, Google has better-ranked code snippets compared to Yahoo! or Microsoft Bing.

Basic Information

Paper ID: 2110.07087
Title: Readability and Understandability of Snippets Recommended by General-purpose Web Search Engines: a Comparative Study
Authors: Carlos Eduardo C. Dantas, Marcelo A. Maia
Classification: cs.SE (Software Engineering)
Publication Date/Conference: AeSIR '21, November 15–11, 2021
Paper Link: https://arxiv.org/abs/2110.07087

Abstract

Developers frequently search for reusable code snippets on general-purpose search engines such as Google, Yahoo!, or Microsoft Bing. However, these code snippets may be of poor quality in terms of readability or understandability. This paper presents an empirical analysis that examines the readability and understandability scores of code snippets extracted from the web using three independent variables (ranking, general-purpose search engine, and recommended website). The study collected the top 5 recommended websites and their corresponding code snippet recommendations from Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. The research found that certain recommended websites significantly outperform others in readability and understandability scores. Higher-ranked code snippets are not necessarily more readable or understandable than lower-ranked ones across all general-purpose search engines. Furthermore, in terms of readability scores, Google ranks code snippets better than Yahoo! or Microsoft Bing.

Research Background and Motivation

Problem Definition

Core Issue: Code snippets recommended by general-purpose search engines exhibit quality variations in readability and understandability; higher-ranked snippets are not necessarily of higher quality
Practical Need: Developers widely use general-purpose search engines to find code examples, but lack systematic evaluation of the quality of these code snippets
Search Engine Limitations: Although Google employs over 200 ranking factors, top-ranked pages may still contain poorly written code examples

Research Significance

Code snippet reuse can reduce programming task time and accelerate development processes
Google holds over 90% of the search engine market, but how well other search engines rank code snippets by quality remains unknown
Readability and understandability are related but distinct, and their interrelationship needs to be understood: readability concerns syntactic comprehension, while understandability concerns semantic aspects

Motivating Example

The paper cites a case study from Hora's research: when searching "File.mkdirs examples" on Google, Tutorialspoint's code snippet ranks higher despite having lower readability and reusability metrics, because it includes natural language explanations similar to the query.

Core Contributions

First Systematic Comparative Study: Large-scale comparative analysis of code snippet readability and understandability recommended by three major search engines: Google, Yahoo!, and Microsoft Bing
Large-Scale Dataset Construction: Collected 47,400 web links from 9,480 queries, covering 5,355 distinct websites
Multi-Dimensional Analysis Framework: Proposed an analysis method based on three independent variables: ranking, search engine, and recommended website
Empirical Findings: Confirmed two important hypotheses: higher-ranked code snippets do not necessarily possess higher readability/understandability; significant quality differences exist among different recommended websites
Standardized Understandability Metrics: Proposed a standardization method for converting cognitive complexity to the [0, 1] interval

Methodology Details

Task Definition

Input: Programming-related query statements
Output: Readability and understandability scores of code snippets
Constraints: Only Java code snippets are analyzed, considering the top 5 search results

Research Design Architecture

The study employs a five-step approach:

Select Input Queries: Collect 10,000 user queries from the CROKAGE tool
Collect Top-n Web Pages: Obtain top 5 recommended web pages from Google, Yahoo!, and Bing
Extract Code Snippets: Extract Java code snippets from selected websites
Calculate Metrics: Compute readability and understandability scores
Analysis Method: Use ANOVA and Tukey tests for statistical analysis

Key Technical Implementation

Data Collection Strategy

Query Modification: Append the tag "example in java" to each query so that the results favor Java code examples
Website Selection: Focus the analysis on the 5 most popular websites:
- stackoverflow.com
- www.geeksforgeeks.org
- www.javatpoint.com
- www.tutorialspoint.com
- www.codegrepper.com
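
The query modification and website selection above can be sketched as follows. This is a minimal illustration, assuming the tag is appended as a suffix and that hosts are compared after dropping a leading "www."; the names `build_search_query` and `is_target_site` are hypothetical and not taken from the paper.

```python
from urllib.parse import urlparse

JAVA_TAG = "example in java"  # tag named in the study

# The five websites listed above, stored without the "www." prefix (an assumption).
TARGET_SITES = frozenset({
    "stackoverflow.com",
    "geeksforgeeks.org",
    "javatpoint.com",
    "tutorialspoint.com",
    "codegrepper.com",
})


def build_search_query(user_query: str, tag: str = JAVA_TAG) -> str:
    """Collapse stray whitespace and append the Java tag to a raw query."""
    cleaned = " ".join(user_query.split())
    return f"{cleaned} {tag}"


def is_target_site(url: str) -> bool:
    """Return True if the URL's host is one of the five analyzed websites."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host in TARGET_SITES
```

For example, `build_search_query("How to append to a string?")` yields a query ending in "example in java", and `is_target_site` can be used to discard links outside the five selected websites.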

Code Extraction Method

StackOverflow: Extract Java code snippets from accepted answers
Other Websites: Use regular expressions to search for source code in HTML tags containing "example" and "Java"
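
A regex-based extraction of this kind might look like the sketch below. The exact patterns used in the study are not given here, so the pattern is an assumption: it looks for the first `<pre>` or `<code>` element whose opening tag mentions "java" or "example". The function name `first_java_snippet` is hypothetical.

```python
import html
import re

# Assumed pattern: a <pre> or <code> element whose attributes mention
# "java" or "example". Note that "javascript" would also match "java".
BLOCK_RE = re.compile(
    r"<(pre|code)\b[^>]*?(?:java|example)[^>]*>(.*?)</\1>",
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")  # inline markup such as <b> or <span ...>


def first_java_snippet(page_html: str):
    """Return the text of the first matching code block, or None."""
    match = BLOCK_RE.search(page_html)
    if match is None:
        return None
    body = TAG_RE.sub("", match.group(2))  # drop nested highlighting tags
    return html.unescape(body).strip()     # decode &quot;, &lt;, and so on
```

Regular expressions are fragile on real-world HTML; an HTML parser would be more robust, but the sketch stays close to the regex-based approach described above.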

Evaluation Metrics

Readability Measurement:

Uses the predictive model proposed by Scalabrino et al.
Includes metrics such as comments, identifier consistency, text coherence, number of meanings, and number of concepts
Output range: [0, 1], where 0 indicates low readability and 1 indicates high readability

Understandability Measurement:

Based on cognitive complexity proposed by Campbell
Standardization formula:

understandability(cs_i) = {
    1 - #cc/#mcc  if #cc < 15
    0.0           otherwise
}

where #cc is the cognitive complexity value and #mcc=15 is the maximum recommended value
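
The formula above translates directly into a small function. This is a sketch of the normalization only; computing the cognitive complexity value itself requires a separate analysis tool, and the function name is illustrative.

```python
MAX_RECOMMENDED_CC = 15  # #mcc in the formula above


def understandability(cognitive_complexity: int,
                      max_cc: int = MAX_RECOMMENDED_CC) -> float:
    """Map a cognitive complexity value (#cc) to a score in [0, 1]."""
    if cognitive_complexity < 0:
        raise ValueError("cognitive complexity cannot be negative")
    if cognitive_complexity < max_cc:
        return 1.0 - cognitive_complexity / max_cc
    return 0.0  # at or above the recommended maximum
```

A snippet with no control-flow structures has a cognitive complexity of 0 and therefore receives the maximum score of 1.0, while any snippet at or above the threshold of 15 receives 0.0.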

Experimental Setup

Dataset Details

Query Source: User queries from the CROKAGE tool, sourced from over 80 countries
Data Scale: 9,480 valid queries, 47,400 web links
Website Coverage: 5,355 distinct websites
Language Restriction: Java programming language only

Evaluation Method

Statistical Analysis: Analysis of Variance (ANOVA) at a 5% significance level (p-value<0.05)
Multiple Comparisons: Tukey test to identify significant differences between groups
Grouping Design:
- Search Engines: 3 groups (Google, Bing, Yahoo!)
- Ranking: 5 groups (top-1 to top-5)
- Websites: 5 groups (selected 5 websites)
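
To illustrate what the ANOVA step computes for one grouping, the sketch below derives the one-way F statistic from per-group score lists using only the standard library. The function name is hypothetical, and it stops at the F statistic: obtaining the p-value needs the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total observations
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With the search-engine grouping, `groups` would hold three lists of readability (or understandability) scores, one per engine. A large F value indicates that the group means differ by more than the within-group variation would explain, which is what the Tukey test then examines pair by pair.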

Data Preprocessing

Remove duplicate queries and manually marked inapplicable queries
Filter queries with fewer than 5 web page recommendations
Use regular expressions to extract links from HTML tags
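
The preprocessing steps above can be sketched as follows. The link pattern, the case-insensitive notion of a duplicate query, and the names `extract_links` and `filter_queries` are assumptions made for illustration; the manually marked inapplicable queries are represented by a caller-supplied `excluded` set.

```python
import re

# Assumed pattern: absolute http(s) links inside <a ... href="..."> tags.
LINK_RE = re.compile(r'<a\b[^>]*?\bhref="(https?://[^"]+)"', re.IGNORECASE)


def extract_links(results_html: str):
    """Return absolute links in the order they appear on a results page."""
    return LINK_RE.findall(results_html)


def filter_queries(results, min_links=5, excluded=frozenset()):
    """Keep the first occurrence of each query that has enough links.

    results: iterable of (query, links) pairs.
    excluded: normalized (lowercase, single-spaced) queries to drop.
    """
    seen = set()
    kept = []
    for query, links in results:
        key = " ".join(query.lower().split())  # normalize case and spacing
        if key in seen or key in excluded:
            continue
        seen.add(key)
        if len(links) >= min_links:
            kept.append((query, links[:min_links]))
    return kept
```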

Experimental Results

Main Findings

RQ1: Relationship Between Search Engine Ranking and Code Quality

ANOVA Results: Readability p-value=0.0034, Understandability p-value=0.0003
Key Finding: Top-2 code snippets generally outperform Top-1, Top-4, and Top-5 in readability and understandability
Effect Size: Small (-0.02 to 0.01 for readability, -0.01 to 0.02 for understandability)
Conclusion: Confirms hypothesis H1; higher-ranked code snippets are not necessarily more readable or understandable

RQ2: Comparison Between Search Engines

ANOVA Results: Readability p-value=1.207e-12, Understandability p-value=0.0364
Readability Ranking: Google > Microsoft Bing > Yahoo!
Understandability: Google slightly outperforms Microsoft Bing
Effect Size: Small effect (-0.02 to 0.02 for readability, -0.01 to 0.005 for understandability)

RQ3: Comparison Between Recommended Websites

ANOVA Results: Both readability and understandability p-value<2.2e-16
Best Readability: geeksforgeeks
Best Understandability: tutorialspoint
Effect Size: Medium effect for readability (-0.15 to 0.10), small effect for understandability (-0.04 to 0.08)

Detailed Analysis Results

Readability Analysis

Reasons for GeeksforGeeks' superior performance:

Each line of code accompanied by a comment
High cohesion, with each concept independent
Example: Query "How to append to a string?"
- GeeksforGeeks: Readability score 0.94
- Tutorialspoint: Readability score 0.44

Understandability Analysis Limitations

58.3% of code snippets achieve maximum understandability score
Most code snippets are simple API calls lacking complex control structures
The authors therefore suggest that this metric is better suited to complete source files, such as those found in Git repositories

Related Work

Code Readability Research

Hora (2021): Investigates how Google ranks code snippets based on readability and reusability characteristics
Scalabrino et al.: Proposes code readability prediction model
Buse and Weimer: Learn a metric for code readability

Code Search and Recommendation

API Sonar Tool: Uses readability features to rank code snippets
Muse Method: Uses readability features to rank code examples
CROKAGE: Code search engine extracting code snippets and explanations from StackOverflow

Code Quality Assessment

Treude and Robillard: Found that only 49% of StackOverflow code snippets are completely self-explanatory
Cognitive Complexity: Understandability measurement method proposed by Campbell

Conclusions and Discussion

Main Conclusions

Ranking Paradox: Search engine ranking is not completely correlated with code quality; Top-2 and Top-3 code snippets may have higher quality
Search Engine Differences: Google performs best in readability, but advantages are limited
Website Quality Differentiation: Significant quality differences exist among recommended websites; tutorial websites (GeeksforGeeks) have better readability
Understandability Limitations: Current understandability metrics have limited discriminative power for simple code snippets

Practical Implications

Developer Guidance: Recommend prioritizing code snippets from tutorial websites such as GeeksforGeeks
Search Strategy: Should not rely solely on ranking when selecting code snippets; need to comprehensively consider quality metrics
Tool Improvement: Provide reference standards for quality assessment in code search engines

Limitations

Limited Website Coverage: Only 5 websites analyzed, accounting for 34%-38.1% of recommended websites
Extraction Strategy: Only the first code snippet extracted from each web page
Query Modification Impact: Adding "example in java" may affect search results
Metric Precision: Readability and understandability tools may contain errors

Future Directions

Qualitative Research: Deeply understand the reasons for differences in readability and understandability scores
Extended Research: Include more websites or develop universal code extraction methods
Multi-Language Support: Extend to other programming languages
Multiple Code Snippet Handling: Develop heuristic methods for handling multiple code snippets on a single page

In-Depth Evaluation

Strengths

Research Novelty: First systematic comparative study of code snippet quality across mainstream search engines
Data Scale: Large-scale empirical study with sufficient data volume and credible conclusions
Methodological Rigor: Uses mature statistical analysis methods with statistically significant results
Practical Value: Provides empirical guidance for developers selecting code snippets
Reproducibility: Provides complete reproduction package and detailed methodology description

Weaknesses

Metric Limitations: Understandability metric has limited discriminative power for simple code snippets
Website Selection Bias: Only 5 mainstream websites analyzed, potential selection bias
Language Restriction: Only Java language considered, limited generalizability
Timeliness: Search results are time-sensitive; conclusions may change over time

Impact

Academic Contribution: Provides new perspective for code search and software engineering research
Practical Guidance: Can inform how developers search for and select code snippets
Tool Improvement: Provides basis for optimizing search engine and code recommendation system ranking algorithms
Subsequent Research: Establishes foundation for related field research

Applicable Scenarios

Quality assessment when software developers perform code searches
Optimization of ranking algorithms in code search engines
Code example quality control in programming education
Code quality analysis in software engineering research

References

The paper cites 23 related references, primarily including:

Code readability and understandability measurement methods
Code search and recommendation system research
StackOverflow code quality analysis
Search engine ranking mechanism research

Overall Assessment: This is a high-quality empirical software engineering research paper that fills a research gap in code search quality assessment, possessing significant theoretical value and practical significance. The research methodology is scientifically rigorous, the data scale is sufficient, and the conclusions are highly credible, providing valuable insights for both developers and researchers.