VayuChat: An LLM-Powered Conversational Interface for Air Quality Data Analytics
Acharya, Pisharodi, Mondal et al.
Air pollution causes about 1.6 million premature deaths each year in India, yet decision makers struggle to turn dispersed data into decisions. Existing tools require expertise and provide static dashboards, leaving key policy questions unresolved. We present VayuChat, a conversational system that answers natural language questions on air quality, meteorology, and policy programs, and responds with both executable Python code and interactive visualizations. VayuChat integrates data from Central Pollution Control Board (CPCB) monitoring stations, state-level demographics, and National Clean Air Programme (NCAP) funding records into a unified interface powered by large language models. Our live demonstration will show how users can perform complex environmental analytics through simple conversations, making data science accessible to policymakers, researchers, and citizens. The platform is publicly deployed at https://huggingface.co/spaces/SustainabilityLabIITGN/VayuChat. A demonstration video is available at https://www.youtube.com/watch?v=d6rklL05cs4.
Approximately 1.6 million premature deaths occur annually in India due to air pollution, yet policymakers struggle to transform dispersed data into actionable insights. Existing tools require specialized expertise and offer only static dashboards, leaving critical policy questions unanswered. This paper presents VayuChat, a conversational system that answers natural language questions about air quality, meteorology, and policy initiatives and returns executable Python code together with interactive visualizations. VayuChat integrates Central Pollution Control Board (CPCB) monitoring station data, state-level demographic data, and National Clean Air Programme (NCAP) funding records through a unified interface powered by large language models. The platform enables policymakers, researchers, and citizens to conduct complex environmental analyses through simple conversation.
Severe Public Health Crisis: Air pollution in India causes 1.6 million premature deaths annually, with PM2.5 exposure reducing life expectancy by over 5 years
Data Utilization Barriers: Despite continuous collection of nationwide pollutant measurements by CPCB, converting raw data into timely policy-relevant insights remains challenging
High Technical Barriers: Existing tools require specialized knowledge, offer limited visualization capabilities, or address only narrow task scopes
Developed the first LLM-driven conversational system for air quality analysis: VayuChat processes natural language queries and generates executable Python code and visualization results
Integrated multi-source environmental data: Incorporates CPCB air quality and meteorological observations (2017-2024), state-level population and area data, and NCAP funding allocation records
Provided transparent code generation mechanisms: Reduces hallucinations by generating Python code rather than direct outputs, ensuring result verifiability and reproducibility
Supports multiple analysis types: Including direct queries, plot generation, correlation analysis, and policy impact assessment
Validated through practical case studies: Demonstrates system utility through in-depth analysis of Delhi's December 2024 air pollution crisis
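The transparent code generation idea listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not VayuChat's actual implementation: the helper `run_generated_code`, the convention that generated code stores its result in a variable named `answer`, the column names, and the toy data are all hypothetical, and a hard-coded string stands in for the LLM output.

```python
import pandas as pd


def run_generated_code(code: str, df: pd.DataFrame) -> dict:
    """Run generated analysis code on a copy of df and return code plus result.

    Hypothetical helper: assumes the generated code assigns its result
    to a variable called `answer`.
    """
    namespace = {"pd": pd, "df": df.copy()}
    # A real deployment would sandbox this call; exec is used here only
    # to keep the sketch short.
    exec(code, namespace)
    return {"code": code, "answer": namespace.get("answer")}


# Stand-in for LLM output (assumption: no model is called in this sketch).
generated_code = 'answer = df.groupby("city")["pm25"].mean().idxmax()'

# Toy data, not CPCB measurements.
df = pd.DataFrame(
    {
        "city": ["A", "A", "B", "B"],
        "pm25": [120.0, 180.0, 60.0, 90.0],
    }
)

result = run_generated_code(generated_code, df)
print(result["code"])    # the code shown to the user for verification
print(result["answer"])  # the value computed by running that code
```

Returning the code alongside the answer is what lets a user re-run or audit the analysis instead of trusting free-form model text.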
The paper demonstrates VayuChat's practical value through a collaboration with air quality analysts investigating the causes of the severe pollution surge in Delhi in December 2024.
Query: "Use time series plots to compare pollution levels and wind speed during Delhi's most polluted week in December 2024 with the 15 days before and after"
Key Findings:
Clear negative correlation between wind speed and PM2.5
PM2.5 exceeds 300 μg/m³ when wind speed drops below 1.0 m/s
Even modest wind speed decreases (0.6 m/s) can rapidly degrade air quality from "very poor" to "severe"
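The kind of analysis behind the query above can be sketched in pandas. This is an illustrative sketch, not the code VayuChat generated: the data are synthetic (PM2.5 is constructed as a decreasing function of wind speed, so the negative correlation is built in), and the column names `wind_speed` and `pm25` are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic daily series, NOT CPCB measurements: wind speed dips in
# mid-December and PM2.5 is constructed to rise as wind speed falls.
days = pd.date_range("2024-11-15", "2025-01-15", freq="D")
t = np.arange(len(days))
wind = 2.0 - 1.5 * np.exp(-((t - 33) ** 2) / 30.0)
pm25 = 400.0 - 120.0 * wind
df = pd.DataFrame({"wind_speed": wind, "pm25": pm25}, index=days)

# Most polluted week: the 7-day window with the highest mean PM2.5.
weekly_mean = df["pm25"].rolling(7).mean()
week_end = weekly_mean.idxmax()
week_start = week_end - pd.Timedelta(days=6)

# Comparison window: that week plus 15 days before and after.
window = df.loc[
    week_start - pd.Timedelta(days=15) : week_end + pd.Timedelta(days=15)
]

# Pearson correlation between PM2.5 and wind speed over the window.
r = window["pm25"].corr(window["wind_speed"])
print(week_start.date(), week_end.date(), round(r, 2))
```

The `window` frame can then be drawn as stacked time series, for example with `window.plot(subplots=True)`, which requires matplotlib.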
Query: "Analyze the correlation between CO, NO2, and PM2.5 in Delhi during December since 2017"
Correlation Matrix:
Pollutant   CO     NO2    PM2.5
CO          1.00   0.30   0.47
NO2         0.30   1.00   0.34
PM2.5       0.47   0.34   1.00
Insights: PM2.5 correlates most strongly with CO (r=0.47), suggesting that common sources such as vehicular emissions, stubble burning, and industrial emissions drive synchronized pollution events.
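A correlation matrix like the one above can be computed with pandas. The sketch below runs on synthetic data with a shared random component, so its numbers will not match the reported values; the column names, the seed, and the December filter on a daily index are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily data, NOT CPCB measurements. A shared component
# mimics a common emission source driving all three pollutants.
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", "2024-12-31", freq="D")
n = len(dates)
shared = rng.normal(size=n)
df = pd.DataFrame(
    {
        "CO": shared + rng.normal(size=n),
        "NO2": 0.5 * shared + rng.normal(size=n),
        "PM2.5": shared + rng.normal(size=n),
    },
    index=dates,
)

# Keep only December observations across all years, as in the query.
december = df[df.index.month == 12]

# Pairwise Pearson correlation coefficients, rounded for display.
corr = december.corr(method="pearson").round(2)
print(corr)
```

Pearson's r measures linear association only, so a moderate value such as 0.47 is consistent with shared sources but does not by itself establish them.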
Technical Feasibility: LLMs can effectively handle complex environmental data analysis queries, with code generation making results verifiable and reproducible
Practical Value: System successfully supported in-depth analysis of Delhi's air pollution crisis, demonstrating real-world application potential
Improved Accessibility: Significantly reduces technical barriers to environmental data analysis, enabling non-technical users to conduct complex analyses
The paper cites 15 relevant references covering LLM foundational technologies, environmental data analysis tools, and health impacts of air pollution, providing sufficient theoretical foundation and comparative references.
Overall Assessment: This is an excellent paper that combines technical innovation with practical application and pioneers the use of LLMs in environmental science. The system design is sound, the case study analysis is thorough, and the work holds important value for addressing environmental data utilization challenges in developing countries such as India. Although the evaluation and technical details leave room for improvement, the overall contribution is substantial, with excellent prospects for wider adoption.