StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
Kim, Jang, Chiang et al.
Traditionally, neighborhood studies have used interviews, surveys, and manual image annotation guided by detailed protocols to identify environmental characteristics, including physical disorder, decay, street safety, and sociocultural symbols, and to examine their impact on developmental and health outcomes. Although these methods yield rich insights, they are time-consuming and require intensive expert intervention. Recent technological advances, including vision language models (VLMs), have begun to automate parts of this process; however, existing efforts are often ad hoc and lack adaptability across research designs and geographic contexts. In this paper, we present StreetLens, a user-configurable human-centered workflow that integrates relevant social science expertise into a VLM for scalable neighborhood environmental assessments. StreetLens mimics the process of trained human coders by focusing the analysis on questions derived from established interview protocols, retrieving relevant street view imagery (SVI), and generating a wide spectrum of semantic annotations from objective features (e.g., the number of cars) to subjective perceptions (e.g., the sense of disorder in an image). By enabling researchers to define the VLM's role through domain-informed prompting, StreetLens places domain knowledge at the core of the analysis process. It also supports the integration of prior survey data to enhance robustness and expand the range of characteristics assessed in diverse settings. StreetLens represents a shift toward flexible and agentic AI systems that work closely with researchers to accelerate and scale neighborhood studies. StreetLens is publicly available at https://knowledge-computing.github.io/projects/streetlens.
Traditional neighborhood research relies on interviews, surveys, and manual image annotation, guided by detailed protocols, to identify environmental characteristics, including physical disorder, decay, street safety, and sociocultural symbols, and to examine how they affect developmental and health outcomes. These methods yield rich insights but are time-consuming and demand intensive expert involvement. The paper proposes StreetLens, a user-configurable, human-centered workflow that integrates relevant social science expertise into vision language models (VLMs) for scalable neighborhood environmental assessment.
StreetLens Workflow: Presented by the authors as the first end-to-end, researcher-centered workflow for systematic social observation, modeled on the way human coders are trained
Human-Machine Collaboration Framework: Incorporates domain knowledge as a core component of the analysis process through role prompting
Automated Prompt Tuning: Automatically generates domain-specific prompts based on relevant research literature and coding manuals
Enhanced Interpretability: Explains the VLM's decisions and offers feedback mechanisms
Open-Source Accessibility: Provides Google Colab notebooks to lower technical barriers
Role Generation: Generates VLM professional role descriptions based on relevant paper abstracts
Prompt Template:
"You are an expert in the following fields and the author of the paper abstracts provided here: [paper abstracts]. Based on the expertise demonstrated, generate a general professional role description of yourself in one to two sentences, starting with 'You are' written in the second person."
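The role-generation step can be illustrated with a minimal sketch that fills the template above with a set of abstracts. The function name `build_role_prompt`, the bracketed numbering of abstracts, and the whitespace cleanup are assumptions made for illustration; the paper's own implementation may assemble the prompt differently, and the call to the model itself is left out.

```python
# Template text copied from the prompt quoted above; only the placeholder
# syntax ({abstracts}) is an assumption of this sketch.
ROLE_PROMPT_TEMPLATE = (
    "You are an expert in the following fields and the author of the paper "
    "abstracts provided here: {abstracts}. Based on the expertise demonstrated, "
    "generate a general professional role description of yourself in one to two "
    "sentences, starting with 'You are' written in the second person."
)


def build_role_prompt(abstracts: list[str]) -> str:
    """Return the role-generation prompt for a list of paper abstracts.

    Hypothetical helper: blank entries are dropped, internal whitespace is
    collapsed, and each abstract is numbered so the model can tell them apart.
    """
    cleaned = [" ".join(text.split()) for text in abstracts if text.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty abstract is required")
    joined = " ".join(f"[{i}] {text}" for i, text in enumerate(cleaned, start=1))
    return ROLE_PROMPT_TEMPLATE.format(abstracts=joined)


if __name__ == "__main__":
    print(build_role_prompt(["Abstract of a neighborhood-effects study."]))
```

The returned string would then be sent to a language model, and the one-to-two-sentence reply reused as the role description in later prompts.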
Task Classification: Distinguishes subjective perception tasks from objective detection tasks
Classification Prompt:
"You are a classifier of annotation tasks... If it asks to rate/assess overall condition or quality, label as perception. If it asks to detect, count, or verify specific objects, label as object_detection."
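To make the labeling rule concrete, the sketch below applies it with simple keyword matching. This is a rule-based stand-in, not the paper's method: StreetLens delegates the decision to a model through the prompt above, whose full text is elided here. The cue lists are taken from the wording of the quoted rule, and the `unclear` fallback label is an assumption added for questions that match neither side.

```python
import re

# Cue words taken from the quoted rule; matching on whole words only.
PERCEPTION_PATTERN = re.compile(
    r"\b(rate|rating|assess|assessment|condition|quality)\b", re.IGNORECASE
)
DETECTION_PATTERN = re.compile(r"\b(detect|count|verify)\b", re.IGNORECASE)


def classify_task(question: str) -> str:
    """Label an annotation question as 'perception' or 'object_detection'.

    Hypothetical heuristic: whichever cue family appears more often wins.
    Ties and questions with no cues return 'unclear' (a label that is not
    part of the original prompt) so they can be reviewed by a person.
    """
    perception_hits = len(PERCEPTION_PATTERN.findall(question))
    detection_hits = len(DETECTION_PATTERN.findall(question))
    if perception_hits > detection_hits:
        return "perception"
    if detection_hits > perception_hits:
        return "object_detection"
    return "unclear"


if __name__ == "__main__":
    for q in (
        "Rate the overall condition of the buildings on this block.",
        "Count the cars parked on the street.",
    ):
        print(q, "->", classify_task(q))
```

A keyword rule like this is brittle compared with a model-based classifier, but it shows how the two labels partition the questions in a coding manual.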
Coding Manual Processing: Converts question-answer pairs into structured prompts
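One way to picture this conversion is the sketch below, which turns a single coding-manual entry into a prompt with numbered answer options. The `CodebookItem` structure, the prompt layout, and the closing instruction asking for a one-sentence explanation are all assumptions for illustration; the summary does not specify the format StreetLens actually produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookItem:
    """One coding-manual entry: a question and its allowed answers (hypothetical)."""

    question: str
    answers: tuple[str, ...]


def build_annotation_prompt(role: str, item: CodebookItem) -> str:
    """Combine a role description and a codebook item into one structured prompt."""
    if not item.answers:
        raise ValueError("a codebook item needs at least one answer option")
    # Number the options so the reply can be parsed as a single code.
    option_lines = [
        f"{code}. {label}" for code, label in enumerate(item.answers, start=1)
    ]
    return "\n".join(
        [
            role.strip(),
            f"Question: {item.question.strip()}",
            "Answer options:",
            *option_lines,
            "Reply with the number of one option, followed by a one-sentence explanation.",
        ]
    )


if __name__ == "__main__":
    item = CodebookItem(
        question="How many cars are visible?",
        answers=("None", "One to five", "More than five"),
    )
    print(build_annotation_prompt("You are a neighborhood researcher.", item))
```

Numbering the options keeps the model's reply easy to map back to the codes used by human coders, which would matter when comparing its output with prior survey data.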
The paper explicitly acknowledges potential social bias in machine learning models, particularly when interpreting sociocultural contexts in diverse neighborhoods. The authors plan to evaluate potential bias sources in future work and collaborate with domain experts using participatory design methods to ensure StreetLens functions as a responsible, human-centered tool.
The paper cites key works from several related areas, including:
Classical research on neighborhood environmental assessment (Sampson & Raudenbush, 1999)
Development of virtual audit methods (Odgers et al., 2012; Clarke et al., 2010)
Street view imagery in urban analytics (Biljecki & Ito, 2021)
Prompt engineering techniques (Schulhoff et al., 2025)
Summary: StreetLens brings VLMs into social science research methodology, using a systematic, configurable workflow to automate and scale neighborhood environmental assessment. Validation of its assessments and handling of model bias still need further work, but its human-machine collaboration design and openly available implementation give related research fields a practical tool and a methodological reference.