Position: The Artificial Intelligence and Machine Learning Community Should Adopt a More Transparent and Regulated Peer Review Process

Basic Information

  • Paper ID: 2502.00874
  • Title: Position: The Artificial Intelligence and Machine Learning Community Should Adopt a More Transparent and Regulated Peer Review Process
  • Author: Jing Yang (University of Southern California, papercopilot.com)
  • Classification: cs.DL cs.AI cs.CV cs.CY
  • Publication Time/Conference: Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025
  • Paper Link: https://arxiv.org/abs/2502.00874

Abstract

With the rapid growth in submission volumes to top-tier artificial intelligence (AI) and machine learning (ML) conferences, many conferences have transitioned from closed review platforms to open review platforms. Some conferences have fully adopted open peer review, allowing public visibility throughout the entire process, while others employ hybrid approaches, such as publishing reviews only after final decisions or maintaining review privacy despite using open review systems. This paper analyzes the advantages and limitations of these models, highlighting the community's growing interest in transparent peer review. To support this discussion, we examine insights from Paper Copilot, a website launched two years ago to aggregate and analyze AI/ML conference data and attract a global audience. The website has attracted over 200,000 early-career researchers from 177 countries, particularly those aged 18-34, many of whom actively participate in the peer review process. Based on our findings, this position paper advocates for more transparent, open, and regulated peer review, aiming to promote greater community participation and advance the field.

Research Background and Motivation

Problem Definition

The core issue addressed in this paper is the insufficient transparency and standardization of the peer review process in the AI/ML academic community. Specifically, this includes:

  1. Explosive growth in submissions to top-tier AI/ML conferences (exceeding 10,000 papers), placing enormous pressure on traditional review practices regarding fairness, efficiency, and quality maintenance
  2. Different conferences adopting different review transparency models (fully open, partially open, completely closed), lacking unified standards
  3. Increasing proportion of junior reviewers lacking experience, potentially affecting review quality
  4. Lack of regulation in the use of AI tools in reviews, posing ethical risks

Significance

This issue matters for four reasons:

  1. Maintenance of Academic Integrity: Transparent review processes help detect and prevent academic misconduct
  2. Promotion of Community Participation: Open review can enhance the engagement and collaboration of community members
  3. Improvement of Review Quality: Public oversight can increase the objectivity and constructiveness of reviews
  4. Acceleration of Knowledge Dissemination: Transparent review processes facilitate rapid dissemination of academic knowledge

Limitations of Existing Approaches

  1. Completely Closed Review: Lacks oversight and accountability mechanisms, prone to inconsistencies and bias
  2. Partially Open Review: Publishes reviews only after decisions are made, which limits real-time community participation
  3. Completely Open Review: May cause reviewers to be overly cautious, affecting candid feedback

Research Motivation

Through the Paper Copilot platform, the author collected substantial data revealing:

  • Over 200,000 active users from 177 countries demonstrate strong interest in transparent review
  • Young researchers aged 18-34 constitute the primary user demographic
  • Pages for open-review conferences draw higher community engagement than pages for closed-review conferences

Core Contributions

  1. Provision of Open Statistical Data: Provides visualized statistics through Paper Copilot including review score distributions, review timelines, and author/institution analyses
  2. Quantitative Evidence of Community Interest: Based on two years of engagement data, shows growing community interest in review transparency
  3. Critical Analysis: Systematically analyzes the advantages and disadvantages of various review models
  4. Policy Recommendations: Advocates for adoption of more transparent, open, and regulated peer review processes

Methodology Details

Data Collection Methodology

Automated Data Acquisition

  1. Public APIs and Web Scraping:
    • Retrieves scores, confidence levels, and review comments from open-review conferences such as ICLR via the OpenReview API
    • Deploys custom scrapers for daily data acquisition, creating time-series archives
    • Obtains author identity and institutional information from official websites
  2. Community Submissions:
    • Collects anonymous review information from partially open or closed review conferences via Google Forms
    • Collected 3,876 valid responses over the past year
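
The collection steps above could be sketched roughly as follows. This is a minimal illustration, not Paper Copilot's actual scraper: the record layout (score fields stored as strings such as "6: marginally above the acceptance threshold"), the sample values, and the helper names `leading_int` and `snapshot_scores` are all assumptions, and no network request is made.

```python
import re
from datetime import date


def leading_int(field):
    """Return the leading integer of a score field, or None if there is none."""
    match = re.match(r"\s*(\d+)", str(field))
    return int(match.group(1)) if match else None


def snapshot_scores(paper_id, reviews, day):
    """Build one dated row per paper so that daily runs form a time series."""
    ratings = [leading_int(r.get("rating")) for r in reviews]
    confidences = [leading_int(r.get("confidence")) for r in reviews]
    return {
        "paper_id": paper_id,
        "date": day.isoformat(),
        # Drop fields that could not be parsed instead of guessing a value.
        "ratings": [x for x in ratings if x is not None],
        "confidences": [x for x in confidences if x is not None],
    }


# Hypothetical reviews for one submission (made-up values).
reviews = [
    {"rating": "6: marginally above the acceptance threshold", "confidence": "4"},
    {"rating": 8, "confidence": "3: fairly confident"},
    {"rating": None, "confidence": "5"},
]
row = snapshot_scores("paper-001", reviews, date(2024, 1, 15))
print(row)
```

Storing the date with every row is what would let repeated daily runs be compared later, for example to see how scores move during a rebuttal period.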

Data Processing Pipeline

  • Standardized data cleaning, merging, and storage pipeline
  • Open-source dataset
  • Interactive frontend visualization interface
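
A cleaning-and-merging step of this kind might look like the sketch below. The field names, the deduplication key (conference, year, paper ID), and the rule that API records take precedence over community-submitted ones are illustrative assumptions; the source does not describe the pipeline at this level of detail.

```python
def normalize(record):
    """Normalize types and casing so that keys from both sources compare reliably."""
    return {
        "conference": record["conference"].strip().lower(),
        "year": int(record["year"]),
        "paper_id": str(record["paper_id"]).strip(),
        "scores": [float(s) for s in record.get("scores", [])],
        "source": record["source"],
    }


def merge_records(api_records, community_records):
    """Merge both sources into one list with a single row per paper."""
    merged = {}
    # Community rows go in first so that API rows overwrite them (assumed rule).
    for record in list(community_records) + list(api_records):
        clean = normalize(record)
        key = (clean["conference"], clean["year"], clean["paper_id"])
        merged[key] = clean
    return sorted(
        merged.values(),
        key=lambda r: (r["conference"], r["year"], r["paper_id"]),
    )


# Hypothetical inputs (made-up values).
api_records = [
    {"conference": "ICLR ", "year": "2024", "paper_id": "a1",
     "scores": [6, 8], "source": "api"},
]
community_records = [
    {"conference": "iclr", "year": 2024, "paper_id": "a1",
     "scores": [6], "source": "form"},
    {"conference": "CVPR", "year": 2024, "paper_id": "c7",
     "scores": [3, 4, 4], "source": "form"},
]
merged = merge_records(api_records, community_records)
for row in merged:
    print(row)
```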

Analysis Framework

Review Transparency Classification

  1. Completely Open: All reviews and discussions are publicly visible in real-time (e.g., ICLR)
  2. Partially Open: Reviews and discussions are published only after the decision phase concludes (e.g., NeurIPS, CoRL)
  3. Completely Closed: Reviews and discussions remain permanently private (e.g., ICML, CVPR)
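
The three-way classification above can be encoded as a small lookup table. The sketch below includes only the venues named in the list; the enum and function names are hypothetical.

```python
from enum import Enum


class Transparency(Enum):
    FULLY_OPEN = "reviews and discussions public in real time"
    PARTIALLY_OPEN = "reviews and discussions public after decisions"
    CLOSED = "reviews and discussions permanently private"


# Venue assignments as given in the classification above.
VENUE_MODEL = {
    "ICLR": Transparency.FULLY_OPEN,
    "NeurIPS": Transparency.PARTIALLY_OPEN,
    "CoRL": Transparency.PARTIALLY_OPEN,
    "ICML": Transparency.CLOSED,
    "CVPR": Transparency.CLOSED,
}


def venues_with(model):
    """Return the venues that follow the given transparency model, sorted by name."""
    return sorted(v for v, m in VENUE_MODEL.items() if m is model)


for model in Transparency:
    print(model.name, venues_with(model))
```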

User Analysis Dimensions

  • Age and gender distribution
  • Geographic distribution (177 countries)
  • Engagement time and click-through rate analysis
  • Search engine ranking performance

Experimental Setup

Dataset Scale

  • Time Span: 10 years of available data
  • Conference Coverage: 24 conferences spanning 9 AI/ML sub-domains
  • User Data: Over 200,000 active users from 177 countries
  • Website Statistics: 6 million impressions, 1 million website visits, 4 million user-triggered events

Evaluation Metrics

  1. User Engagement: Page views, active users, average engagement time
  2. Search Performance: Google click-through rate (CTR), page ranking position
  3. Review Quality: Confidence scores, number of discussion replies
  4. Community Interest: Voluntary data submission rate, survey response rate
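
For readers unfamiliar with these metrics, the sketch below shows how click-through rate and average engagement time are commonly computed. The definitions (clicks divided by impressions; total engaged seconds divided by active users) are assumed standard analytics conventions rather than details confirmed by the source, and the input numbers are made up.

```python
def click_through_rate(clicks, impressions):
    """CTR: the fraction of search impressions that led to a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def average_engagement_seconds(total_engaged_seconds, active_users):
    """Average engagement time per active user, in seconds."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return total_engaged_seconds / active_users


def format_duration(seconds):
    """Render a duration in seconds as 'M min SS s'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs:02d} s"


# Made-up inputs for illustration only.
ctr = click_through_rate(clicks=330, impressions=500)
avg = average_engagement_seconds(total_engaged_seconds=2300, active_users=10)
print(f"CTR: {ctr:.2%}")
print(f"Average engagement: {format_duration(avg)}")
```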

Comparative Analysis

  • User engagement comparison across conferences with different transparency levels
  • Detailed comparison between ICLR (completely open) and NeurIPS (partially open)
  • Engagement analysis of closed review conferences

Experimental Results

Main Findings

Significant Differences in User Engagement

  • ICLR (Completely Open): 414,096 page views, 88,220 active users, average engagement time 3 minutes 50 seconds
  • NeurIPS (Partially Open): Significantly lower engagement than ICLR
  • Closed Conferences (CVPR, ECCV): Page views below 35,000, average engagement time less than 1.5 minutes

Search Engine Performance

  • Google CTR stays within a range of 66.08% to 86.49%
  • Open review-related pages rank higher in search results
  • Google search alone generated 50,000 organic clicks over a 28-day period

Review Quality Analysis

  1. Confidence Scores:
    • ICLR: 3.53 ± 0.48 (2024)
    • NeurIPS: 3.58 ± 0.54 (2024)
    • Completely open reviews show slightly lower concentration of high confidence scores
  2. Discussion Activity:
    • ICLR shows broader reply distribution (maximum 76 replies vs. NeurIPS's 49)
    • ICLR's discussion variance is significantly larger, reflecting a more dynamic review environment
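
Summary statistics of the kind reported above (a mean with a spread, plus the maximum and variance of reply counts) can be computed as in this sketch. It assumes the "±" values are standard deviations and uses the population form; the source does not say which form was used, and all input values are made up.

```python
from statistics import mean, pstdev, pvariance


def summarize(values):
    """Return the mean, population standard deviation, variance, and maximum."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "variance": pvariance(values),
        "max": max(values),
    }


def mean_pm_std(values):
    """Format a sample as 'mean ± std' with two decimals."""
    s = summarize(values)
    return f"{s['mean']:.2f} ± {s['std']:.2f}"


# Made-up reviewer confidence scores and per-paper reply counts.
confidences = [3, 4, 4, 3, 4]
replies_venue_a = [2, 5, 9, 30]
replies_venue_b = [4, 5, 6, 7]

print("confidence:", mean_pm_std(confidences))
stats_a = summarize(replies_venue_a)
stats_b = summarize(replies_venue_b)
print("venue A max/variance:", stats_a["max"], stats_a["variance"])
print("venue B max/variance:", stats_b["max"], stats_b["variance"])
```

A larger variance combined with a larger maximum, as for venue A here, is the pattern the summary describes as a broader reply distribution.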

User Profile Analysis

Age and Gender Distribution

  • Primary User Group: Ages 18-24 represent the largest proportion
  • Engagement Time: Young male users show the longest average engagement time (4 minutes 15 seconds)
  • Female Users: Relatively consistent engagement time across age groups

Geographic Distribution

  • Primary Countries: United States (60,648 users), China (59,269 users)
  • High Engagement Regions: Singapore and Australia show average engagement time exceeding 3 minutes
  • Engagement Variation: United Kingdom and Germany show relatively shorter engagement time (below 2 minutes)

Related Work

Open Peer Review Research

  • Theoretical Foundation: Ross-Hellauer (2017) and others established theoretical frameworks for OPR
  • Practical Exploration: OpenReview platform promoted OPR application in AI/ML domains
  • Quality Research: Church et al. (2024) investigated the impact of open review on feedback quality

Standardization Research

  • Ethical Considerations: Research on privacy and harassment risks of public review
  • AI-Assisted Review: Discussion of AI tool applications in review and regulatory needs
  • Bias and Fairness: Analysis of systemic bias issues in review processes

Conclusions and Discussion

Main Conclusions

  1. Clear Community Needs: High engagement of over 200,000 global users demonstrates strong demand for transparent review
  2. Significant Advantages of Open Review: Completely open review processes promote greater community participation and richer academic discussion
  3. Young Researchers Leading: Researchers aged 18-34 are the primary drivers of interest in transparent review
  4. Quality and Transparency Compatible: Open review does not compromise review quality; rather, it promotes more careful evaluation

Problems with Closed Review

  1. Junior Reviewer Challenges: Inexperienced reviewers struggle to receive guidance in closed environments
  2. Lack of AI Use Regulation: Closed environments make it difficult to monitor and regulate AI tool usage
  3. Insufficient Accountability Mechanisms: Problems such as author information inconsistencies are difficult to correct promptly

Policy Recommendations

  1. Gradual Transition to Openness: Recommend more conferences adopt at least partially open review models
  2. Establish Standardized Guidelines: Develop guidelines for AI-assisted review usage
  3. Strengthen Training Support: Provide more training and guidance for junior reviewers
  4. Improve Oversight Mechanisms: Establish more effective quality control and accountability systems

In-Depth Evaluation

Strengths

Methodological Innovation

  1. Large-Scale Empirical Research: First analysis of review transparency needs based on real behavioral data from over 200,000 users
  2. Multi-Dimensional Analysis: Combines user behavior, search data, review quality, and other dimensions
  3. Real-Time Data Collection: Continuously collects and analyzes data through the Paper Copilot platform
  4. Global Perspective: Covers 177 countries, providing a truly global perspective

Experimental Sufficiency

  1. Large Data Scale: 10 years of historical data, 24 conferences, 9 sub-domains
  2. Multi-Source Verification: Combines API data, website data, and community submission data
  3. Quantitative and Qualitative Integration: Includes both statistical data and user research
  4. Time Series Analysis: Tracks dynamic changes in the review process

Persuasiveness of Results

  1. Consistent Findings: Multiple metrics consistently point to open review advantages
  2. Statistical Significance: User engagement differences are clear and consistent
  3. Practical Impact: Paper Copilot itself demonstrates successful transparency practice

Limitations

Methodological Limitations

  1. Selection Bias: Voluntary data submission may introduce selection bias
  2. Causality: Cannot fully establish causal relationships between transparency and engagement
  3. Cultural Differences: Different countries may have varying acceptance levels for transparency
  4. Time Effects: The impact of review model changes may require longer periods to manifest

Analysis Depth

  1. Limited Quality Assessment: Primarily focuses on engagement; assessment of actual review quality is relatively limited
  2. Insufficient Negative Impact Analysis: Potential negative effects of open review receive little discussion
  3. Lack of Implementation Details: Offers little operational guidance on how to implement transparent review

Generalizability Issues

  1. Domain Specificity: Primarily based on AI/ML domains; applicability to other fields is unknown
  2. Cultural Background: Different academic cultures have varying acceptance levels for transparency
  3. Technical Barriers: Open review requires certain technical infrastructure support

Impact Assessment

Academic Contribution

  1. Fills Research Gap: First large-scale quantitative analysis of community demand for review transparency
  2. Policy Reference Value: Provides data-driven decision references for conference organizers
  3. Methodological Contribution: Establishes new methodological frameworks for review process analysis

Practical Value

  1. Direct Application: Paper Copilot platform is widely used
  2. Policy Impact: May influence future conference review policy formulation
  3. Tool Value: Provided data and analysis tools have sustained value

Reproducibility

  1. Open-Source Data: Commits to open-sourcing collected datasets
  2. Transparent Methodology: Provides detailed descriptions of data collection and analysis methods
  3. Platform Accessibility: Paper Copilot platform continues to operate; results are verifiable

Applicable Scenarios

Direct Application

  1. AI/ML Conferences: Can be directly applied to various AI/ML conference types
  2. Computer Science: Can be extended to other computer science sub-domains
  3. Technology-Driven Fields: Applicable to other rapidly developing technical fields

Requiring Adjustment

  1. Traditional Disciplines: Humanities and social sciences require consideration of cultural factors
  2. Sensitive Fields: Research involving trade secrets or national security requires special consideration
  3. Small-Scale Conferences: Small specialized conferences may require adjusted implementation approaches

Future Research Directions

  1. Cross-Domain Validation: Validate research conclusions in other academic disciplines
  2. Long-Term Impact Research: Track long-term impacts of review model changes
  3. Review Quality Assessment Methods: Develop more precise review quality assessment methods
  4. Implementation Guideline Development: Formulate specific guidelines for transparent review implementation
  5. Cultural Adaptability Research: Study adaptive adjustments for different cultural backgrounds

References

The paper cites a broad body of related work, including:

  • Ross-Hellauer, T. (2017). What is open peer review? A systematic review.
  • Wang, G., et al. (2023). What have we learned from OpenReview?
  • Cortes, C. & Lawrence, N. D. (2021). Inconsistency in conference peer review.
  • Beygelzimer, A., et al. (2023). Has the machine learning review process become more arbitrary?

Overall Assessment: This position paper has clear practical importance. It offers a systematic analysis of review transparency in the AI/ML academic community, backed by large-scale real-world data, together with concrete recommendations. Its primary value lies in providing quantitative evidence supporting the need for transparent review and in showing, through the Paper Copilot platform, how such transparency works in practice. The methodology and depth of analysis leave room for improvement, but the paper makes a noteworthy contribution to the reform of academic peer review.