Every year, the European Union and its member states allocate millions of euros to fund various development initiatives. However, the increasing number of applications received for these programs often creates significant bottlenecks in evaluation processes, due to limited human capacity. In this work, we detail the real-world deployment of AI-assisted evaluation within the pipeline of two government initiatives: (i) corporate applications aimed at international business expansion, and (ii) citizen reimbursement claims for investments in energy-efficient home improvements. While these two cases involve distinct evaluation procedures, our findings confirm that AI effectively enhanced processing efficiency and reduced workload across both types of applications. Specifically, in the citizen reimbursement claims initiative, our solution increased reviewer productivity by 20.1%, while keeping a negligible false-positive rate based on our test set observations. These improvements resulted in an overall reduction of more than 2 months in the total evaluation time, illustrating the impact of AI-driven automation in large-scale evaluation workflows.
- Paper ID: 2510.09674
- Title: Leveraging LLMs to Streamline the Review of Public Funding Applications
- Authors: João D.S. Marques, André V. Duarte, André Carvalho, Gil Rocha, Bruno Martins, Arlindo L. Oliveira
- Classification: cs.CY cs.AI
- Publication Date: October 8, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.09674
Each year, the European Union and its member states invest millions of euros in funding various development initiatives. However, the volume of applications received by these programs continues to grow, and limited human resources often create significant bottlenecks in the evaluation process. This study details the practical deployment of AI-assisted assessment in the pipelines of two government initiatives: (i) corporate applications for international business expansion, and (ii) citizen reimbursement claims for investments in energy-efficient home improvements. Although the two scenarios involve different assessment procedures, the research reports that AI improved processing efficiency and reduced workload for both application types. Specifically, in the citizen reimbursement initiative, the solution increased reviewer productivity by 20.1% while maintaining a negligible false-positive rate based on test set observations. These improvements reduced total assessment time by more than two months, demonstrating the impact of AI-driven automation in large-scale evaluation workflows.
The core problem addressed by this research is the efficiency bottleneck in evaluating European Union public funding projects. With the surge in application volumes, traditional manual evaluation methods can no longer meet processing demands, resulting in prolonged evaluation cycles, decreased applicant satisfaction, and ultimately undermining public confidence in the efficiency of these initiatives.
Public funding projects are crucial tools for driving economic growth, sustainable development, and innovation. Low evaluation efficiency not only affects the timeliness of fund allocation but may also cause high-quality projects to miss opportunities, impacting the achievement of overall policy objectives.
Traditional document review relies on rule-based natural language processing and optical character recognition technologies, which perform well in controlled environments but are highly sensitive to changes in document structure and content, making them difficult to maintain and scale to broader applications.
The emergence of Large Language Models (LLMs) provides unprecedented flexibility and adaptability for automating document processing. This research aims to explore how to leverage LLMs to improve the efficiency and consistency of public funding application evaluation while ensuring human oversight.
- Real-World Deployment Experience Report: First report of successful deployment of two AI-assisted document evaluation systems, demonstrating how automation can accelerate application analysis while ensuring decision integrity through human oversight.
- Practical Effectiveness Verification: Achieved 20.1% reviewer productivity improvement in the ReClaim initiative, reducing total assessment time by over two months.
- Best Practices Summary: Based on real-world deployment experience, provides best practices and key lessons learned for integrating AI models into similar environments.
- Dual-Scenario Validation: Validates the generalizability of AI-assisted evaluation through two different types of government initiatives (business internationalization applications and citizen energy-efficiency renovation reimbursements).
The research involves two distinct tasks:
- IExp Task: Comprehensive evaluation of business internationalization applications, including summary generation, internal consistency checks, and preliminary scoring
- ReClaim Task: Document verification for citizen energy-efficiency renovation reimbursement applications, primarily checking consistency between application information and supporting documents
- Input: Business application documents averaging 30,000 tokens (over 50 pages)
- Core Model: GPT-4o
- Processing Pipeline:
  - Document segmentation and filtering to avoid LLM context overload
  - Identification of key fields for each task based on evaluation team expertise
  - Automation of the six most time-consuming evaluation tasks
- Output: Application summaries, consistency reports, preliminary scores, and justifications
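The segmentation and filtering step described above can be illustrated with a minimal sketch. The summary does not say how the authors split or filter documents, so everything here is an assumption: the function names `split_into_chunks` and `filter_chunks` are hypothetical, word counts stand in for LLM tokens, and keyword matching stands in for whatever relevance criterion the evaluation team actually used.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Group paragraphs into chunks of roughly max_words words.

    Word count is a crude proxy for LLM tokens (assumption). A single
    paragraph longer than max_words becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # Close the current chunk if this paragraph would exceed the budget.
        if current and count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def filter_chunks(chunks: List[str], keywords: List[str]) -> List[str]:
    """Keep only chunks mentioning at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [c for c in chunks if any(k in c.lower() for k in lowered)]
```

Only the chunks that survive filtering would then be passed to the model for a given task, which keeps each prompt focused on the fields that task needs.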
- Input: Approximately 80,000 applications, each with an average of 11 supporting documents
- Hybrid Processing Pipeline:
  - Document Standardization: Supporting only widely-used file formats (PDF, ZIP, PNG, etc.)
  - XML Conversion: Converting user form fields into structured XML format
  - VLM Information Extraction: Using GPT-4o to parse unstructured supporting documents
  - Automated Consistency Checking: Comparing extracted information with applicant-reported values
- Output: Pre-populated verification checklists flagging items requiring manual review
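The consistency-checking step can be sketched as follows. This is an illustration under stated assumptions, not the deployed system: the XML layout, the field names, and the helper functions are hypothetical, and the `extracted` dictionary stands in for values a VLM such as GPT-4o would return from the supporting documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML form; the real schema is not given in the source.
FORM_XML = """<application id="A-001">
  <field name="applicant_name">Maria Silva</field>
  <field name="invoice_total">1250.00</field>
  <field name="tax_id">123456789</field>
</application>"""

NUMERIC_FIELDS = {"invoice_total"}  # assumption: compared with a tolerance


def parse_form(xml_text):
    """Return {field name: reported value} from the form XML."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): (f.text or "").strip() for f in root.iter("field")}


def values_match(reported, extracted, numeric, tolerance=0.01):
    """Compare numbers within a tolerance, text after case/space normalization."""
    if numeric:
        try:
            return abs(float(reported) - float(extracted)) <= tolerance
        except ValueError:
            return False
    return " ".join(reported.lower().split()) == " ".join(extracted.lower().split())


def build_checklist(reported, extracted):
    """Mark each field as verified or flag it for manual review."""
    checklist = {}
    for name, value in reported.items():
        found = extracted.get(name)
        if found is None:
            checklist[name] = "manual review: not found in documents"
        elif values_match(value, found, name in NUMERIC_FIELDS):
            checklist[name] = "verified"
        else:
            checklist[name] = "manual review: mismatch"
    return checklist
```

Fields that cannot be matched are flagged rather than rejected, which mirrors the design in which the system only pre-populates the checklist and the reviewer makes the decision.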
- Human-Machine Collaboration Design: System outputs serve only as recommendations, ensuring human reviewers always maintain oversight and accountability.
- Task-Specific Optimization: Customized solutions for different types of evaluation tasks.
- Cost-Benefit Balance: Cost control achieved through targeted inputs and task prioritization.
- GDPR Compliance: Data processing entirely within EU boundaries, stored on encrypted local disks.
- IExp Dataset:
  - Proof of concept: 50 applications from previous calls
  - Current evaluation: 11 applications using AI tool support
  - Activity classification: 764 historical applications
- ReClaim Dataset:
  - Total applications: Approximately 80,000
  - Test set: 200 samples uniformly distributed across types
  - Total documents: Approximately 880,000 documents
- IExp Metrics:
  - Summary alignment: Cosine similarity, ROUGE-L, BLEU, METEOR
  - Activity classification consistency: Level of agreement between reviewers and the LLM
- ReClaim Metrics:
  - Productivity improvement: Percentage reduction in processing time
  - Automated verification rate: Proportion of fields requiring no manual verification
  - Accuracy: Proportions of correct results, minor errors, false positives, false negatives, and reading errors
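Two of the metrics listed above are simple enough to sketch directly. The code below is a minimal illustration, not the authors' evaluation script: it computes ROUGE-L as the F1 score over the longest common subsequence of whitespace tokens (the F1 weighting and the tokenization are assumptions) and a plain percent-agreement score between two label lists. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over lowercased whitespace tokens (simplified variant)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def agreement_rate(labels_a, labels_b):
    """Share of items on which two annotators (e.g. reviewer and LLM) agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic such as Cohen's kappa would be a stricter alternative, though the summary does not say which one the paper used.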
- Model Selection: Blind comparison of GPT-4o vs Gemini-1.5 Pro
- Processing Approach: Comparison of AI-assisted vs purely manual processing
- Significant Summary Alignment Improvement:
  - Cosine similarity improved from 0.77 to 0.99
  - ROUGE-L, BLEU, and METEOR metrics all improved from below 0.35 to above 0.9
- Activity Classification Consistency:
  - LLM-reviewer consistency was approximately 70%
  - LLM-applicant consistency was higher than LLM-reviewer consistency
- Productivity Improvement: Reviewer productivity increased by 20.1%
- Automated Verification Performance:
  - Overall automated verification rate: 76%
  - Verification rates by section: Eligibility check 84%, public core 76%, type review 67%
- Accuracy Analysis:
  - Correct: 88%
  - Minor errors: 5%
  - False positives: 0%
  - False negatives: 3%
  - Reading errors: 4%
Positive impacts following AI system deployment:
- Clarification requests per application: Decreased from 2.13 to 2.05
- Applicant appeal rate: Decreased from 25.8% to 20.4%
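To put the two changes above on a common scale, the relative change can be computed from the reported before and after values. This is plain arithmetic on the figures quoted above; the helper name `relative_change` is hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = decrease)."""
    return (after - before) / before * 100


# Figures as reported above.
clarifications = relative_change(2.13, 2.05)   # requests per application
appeals = relative_change(25.8, 20.4)          # appeal rate, in percent

print(f"Clarification requests: {clarifications:.1f}%")  # about -3.8%
print(f"Appeal rate: {appeals:.1f}%")                    # about -20.9%
```

The appeal rate therefore fell by 5.4 percentage points, or roughly a fifth in relative terms, while the drop in clarification requests is comparatively small.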
- IExp Task: Evaluators estimated AI assistance could accelerate the review process by up to 30%
- ReClaim Task: Feedback was polarized
  - Reviewers involved in development expressed strong appreciation
  - Experienced reviewers estimated time savings up to 40%
  - Some reviewers lost confidence after encountering errors
Traditional automated document review relies on rule-based NLP and OCR techniques, performing well in controlled environments but sensitive to document structure variations and difficult to maintain.
- Legal Domain: LLM tools capable of rapidly reviewing and extracting various legal texts
- Human Resources: Evolution from basic keyword analysis to complex candidate-role matching
- Public Administration: Transition from traditional machine learning solutions to generative AI and LLM integration
Due to failure cases caused by bias, insufficient transparency, or over-reliance on unsupervised automation, most organizations now embed explicit human-machine collaborative review at critical decision points.
- Technical Feasibility: LLMs have reached sufficient maturity to significantly support application review processes.
- Significant Efficiency Gains: In appropriately integrated human-machine collaboration pipelines, LLMs can substantially accelerate evaluation workflows.
- Improved Consistency: AI assistance helps improve uniformity in reviewer outputs.
- Bureaucracy is often the primary cause of delays and reduced solution quality
- Third-party platform ownership limitations restrict system modification capabilities
- Strict GDPR requirements narrow the range of viable models
- Complex multi-step authorization workflows delay data access
- Reviewers tend to split into two groups: those who embrace the tools and focus on their advantages, and those who become very cautious or critical once the system makes errors
- Effective change management is critical for successful implementation
- Large-scale deployment speed far exceeds manual evaluation
- ReClaim system processed approximately 80,000 applications in less than three weeks
- As models continue to improve, fully automated evaluation becomes increasingly feasible
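The scale of the ReClaim run can be made concrete with a back-of-the-envelope throughput estimate. The sketch assumes 21 calendar days as the upper bound implied by "less than three weeks" and uses the reported average of 11 supporting documents per application, so the results are lower bounds on the average daily rate rather than measured figures.

```python
APPLICATIONS = 80_000        # approximate, as reported
DOCS_PER_APPLICATION = 11    # reported average
MAX_DAYS = 21                # assumption: upper bound for "less than three weeks"


def min_daily_throughput(items: int, max_days: int) -> float:
    """Lower bound on the average number of items processed per day."""
    return items / max_days


apps_per_day = min_daily_throughput(APPLICATIONS, MAX_DAYS)
docs_per_day = min_daily_throughput(APPLICATIONS * DOCS_PER_APPLICATION, MAX_DAYS)

print(f"At least {apps_per_day:.0f} applications per day")  # about 3810
print(f"At least {docs_per_day:.0f} documents per day")     # about 41905
```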
- IExp System: Limited by inability to access historical applications or external databases
- ReClaim System: Faces challenges with inconsistent document formats and low-quality file submissions
- Applicable Scope: Approximately 10% of documents excluded from automated parsing due to unsupported formats
- Practical Deployment Value: One of the few studies reporting real-world LLM deployment experience, offering important practical guidance
- Comprehensive Evaluation Framework: Evaluation covers technical metrics, user feedback, efficiency improvements, and downstream system impacts
- Dual-Scenario Validation: Validates method generalizability through two different application scenarios
- Honest Experience Sharing: Objectively reports challenges and failures encountered during deployment
- Limited Technical Innovation: Primarily application of existing LLM technology, lacking algorithmic-level innovation
- Limited Evaluation Scale: Relatively small test sets, notably only 11 applications for the IExp task
- Unknown Long-Term Effects: Only three months of deployment time; long-term effects and stability require further verification
- Insufficient Cost-Benefit Analysis: Lacks detailed cost-benefit analysis and ROI calculations
- Policy-Making Reference: Provides important reference for government agencies adopting AI technology
- Practical Guidance Value: Offers valuable experience for AI deployment in similar scenarios
- Cross-Domain Application: Methods generalizable to other fields requiring large-scale document processing
- Government Agencies: Various application approvals and document review processes
- Financial Institutions: Loan applications and compliance reviews
- Educational Institutions: Application material review and academic evaluation
- Corporate Organizations: Internal document review and supplier evaluation
The paper cites multiple important references, including:
- OpenAI GPT-4o system card (2024)
- EU Artificial Intelligence Act related documents
- Research on LLM applications across various domains
- Best practices research on human-machine collaboration and responsible AI deployment
Overall Assessment: This is an applied research paper with significant practical value. While relatively limited in technical innovation, its real-world deployment experience and comprehensive effectiveness evaluation provide valuable reference for AI applications in the public sector. The paper's honesty and practicality make it an important contribution to the field.