Readability and Understandability of Snippets Recommended by General-purpose Web Search Engines: a Comparative Study
Dantas, Maia
Developers often search for reusable code snippets on general-purpose web search engines like Google, Yahoo!, or Microsoft Bing. But some of these code snippets may have poor quality in terms of readability or understandability. In this paper, we present an empirical analysis of the readability and understandability scores of snippets extracted from the web, using three independent variables: ranking, general-purpose web search engine, and recommended site. We collected the top-5 recommended sites and their respective code snippet recommendations using Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. We found that some recommended sites have significantly better readability and understandability scores than others. The better-ranked code snippet is not necessarily more readable or understandable than a lower-ranked code snippet for all general-purpose web search engines. Moreover, considering the readability score, Google has better-ranked code snippets compared to Yahoo! or Microsoft Bing.
Developers frequently search for reusable code snippets on general-purpose search engines such as Google, Yahoo!, or Microsoft Bing. However, these code snippets may be of poor quality in terms of readability or understandability. This paper presents an empirical analysis that examines the readability and understandability scores of code snippets extracted from the web using three independent variables (ranking, general-purpose search engine, and recommended website). The study collected the top 5 recommended websites and their corresponding code snippet recommendations from Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. The research found that certain recommended websites significantly outperform others in readability and understandability scores. Higher-ranked code snippets are not necessarily more readable or understandable than lower-ranked ones across all general-purpose search engines. Furthermore, in terms of readability scores, Google ranks code snippets better than Yahoo! or Microsoft Bing.
Core Issue: Code snippets recommended by general-purpose search engines exhibit quality variations in readability and understandability; higher-ranked snippets are not necessarily of higher quality
Practical Need: Developers widely use general-purpose search engines to find code examples, but lack systematic evaluation of the quality of these code snippets
Search Engine Limitations: Although Google employs over 200 ranking factors, top-ranked pages may still contain poorly written code examples
Code snippet reuse can reduce programming task time and accelerate development processes
Google holds over 90% of the search engine market, but how well other search engines rank code snippets by quality remains unknown
The relationship between readability and understandability needs to be understood: readability concerns syntactic comprehension, while understandability concerns semantic aspects
The paper cites a case study from Hora's research: when searching "File.mkdirs examples" on Google, Tutorialspoint's code snippet ranks higher despite having lower readability and reusability metrics, because it includes natural language explanations similar to the query.
First Systematic Comparative Study: Large-scale comparative analysis of code snippet readability and understandability recommended by three major search engines: Google, Yahoo!, and Microsoft Bing
Large-Scale Dataset Construction: Collected 47,400 web links from 9,480 queries, covering 5,355 distinct websites
Multi-Dimensional Analysis Framework: Proposed an analysis method based on three independent variables: ranking, search engine, and recommended website
Empirical Findings: Confirmed two important hypotheses: higher-ranked code snippets do not necessarily possess higher readability/understandability; significant quality differences exist among different recommended websites
Standardized Understandability Metrics: Proposed a standardization method for converting cognitive complexity to the [0, 1] interval
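This summary does not reproduce the paper's standardization formula, so the following is only a minimal sketch of one plausible way to map cognitive complexity onto the [0, 1] interval. It assumes a min-max scheme and assumes the scale is inverted so that lower complexity yields a higher understandability score; the function name normalize_understandability is hypothetical and the input values are made up for illustration.

```python
def normalize_understandability(complexities):
    """Map raw cognitive-complexity values to [0, 1].

    ASSUMPTION: min-max normalization, inverted so that the least complex
    snippet scores 1.0 and the most complex scores 0.0. This is not taken
    from the paper; it only illustrates the idea of standardization.
    """
    if not complexities:
        return []
    lo, hi = min(complexities), max(complexities)
    if hi == lo:
        # All snippets are equally complex, so there is no spread to scale.
        return [1.0 for _ in complexities]
    return [1.0 - (c - lo) / (hi - lo) for c in complexities]


# Illustrative (made-up) cognitive-complexity values for four snippets.
scores = normalize_understandability([0, 5, 10, 20])
print(scores)  # [1.0, 0.75, 0.5, 0.0]
```

A shared [0, 1] scale of this kind would make understandability scores directly comparable with readability scores that are already expressed as values between 0 and 1.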
Input: Programming-related query statements
Output: Readability and understandability scores of code snippets
Constraints: Only Java code snippets are analyzed, and only the top 5 search results are considered
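The input, output, and three independent variables described above can be sketched as a simple record plus a grouping helper. This is an illustrative layout only, not the authors' tooling: the SnippetScore class, the mean_by function, the site names, and all score values are hypothetical, and the paper's statistical tests are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SnippetScore:
    """One scored Java snippet (hypothetical record layout)."""
    query: str
    engine: str              # independent variable: search engine
    rank: int                # independent variable: position 1..5
    site: str                # independent variable: recommended site
    readability: float       # score in [0, 1]
    understandability: float  # score in [0, 1]


def mean_by(records, variable, metric):
    """Average one metric for each value of one independent variable."""
    groups = defaultdict(list)
    for record in records:
        groups[getattr(record, variable)].append(getattr(record, metric))
    return {value: mean(scores) for value, scores in groups.items()}


# Made-up scores for illustration; these are not results from the paper.
records = [
    SnippetScore("File.mkdirs examples", "Google", 1, "site-a.example", 0.5, 0.75),
    SnippetScore("File.mkdirs examples", "Google", 2, "site-b.example", 1.0, 0.25),
    SnippetScore("File.mkdirs examples", "Bing", 1, "site-b.example", 1.0, 0.25),
]

by_rank = mean_by(records, "rank", "readability")
by_engine = mean_by(records, "engine", "readability")
by_site = mean_by(records, "site", "understandability")
```

Grouping by rank, engine, and site in turn mirrors the three comparisons the study makes; a real analysis would follow the group summaries with significance tests rather than comparing means alone.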
The paper cites 23 related references, primarily including:
Code readability and understandability measurement methods
Code search and recommendation system research
StackOverflow code quality analysis
Search engine ranking mechanism research
Overall Assessment: This is a high-quality empirical software engineering research paper that fills a research gap in code search quality assessment, possessing significant theoretical value and practical significance. The research methodology is scientifically rigorous, the data scale is sufficient, and the conclusions are highly credible, providing valuable insights for both developers and researchers.