2025-11-18T18:43:13.867270

StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery

Kim, Jang, Chiang et al.
Traditionally, neighborhood studies have used interviews, surveys, and manual image annotation guided by detailed protocols to identify environmental characteristics, including physical disorder, decay, street safety, and sociocultural symbols, and to examine their impact on developmental and health outcomes. Although these methods yield rich insights, they are time-consuming and require intensive expert intervention. Recent technological advances, including vision language models (VLMs), have begun to automate parts of this process; however, existing efforts are often ad hoc and lack adaptability across research designs and geographic contexts. In this paper, we present StreetLens, a user-configurable human-centered workflow that integrates relevant social science expertise into a VLM for scalable neighborhood environmental assessments. StreetLens mimics the process of trained human coders by focusing the analysis on questions derived from established interview protocols, retrieving relevant street view imagery (SVI), and generating a wide spectrum of semantic annotations from objective features (e.g., the number of cars) to subjective perceptions (e.g., the sense of disorder in an image). By enabling researchers to define the VLM's role through domain-informed prompting, StreetLens places domain knowledge at the core of the analysis process. It also supports the integration of prior survey data to enhance robustness and expand the range of characteristics assessed in diverse settings. StreetLens represents a shift toward flexible and agentic AI systems that work closely with researchers to accelerate and scale neighborhood studies. StreetLens is publicly available at https://knowledge-computing.github.io/projects/streetlens.
academic

StreetLens: ๊ฑฐ๋ฆฌ ๋ทฐ ์ด๋ฏธ์ง€๋ฅผ ํ†ตํ•œ ์ธ๊ฐ„ ์ค‘์‹ฌ AI ์—์ด์ „ํŠธ ๊ธฐ๋ฐ˜ ๊ทผ๋ฆฐ์ง€์—ญ ํ‰๊ฐ€

๊ธฐ๋ณธ ์ •๋ณด

  • ๋…ผ๋ฌธ ID: 2506.14670
  • ์ œ๋ชฉ: StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
  • ์ €์ž: Jina Kim, Leeje Jang, Yao-Yi Chiang, Guanyu Wang, Michelle C. Pasco (๋ฏธ๋„ค์†Œํƒ€ ๋Œ€ํ•™๊ต)
  • ๋ถ„๋ฅ˜: cs.HC (์ธ๊ฐ„-์ปดํ“จํ„ฐ ์ƒํ˜ธ์ž‘์šฉ), cs.AI (์ธ๊ณต์ง€๋Šฅ)
  • ๋ฐœํ‘œ ํ•™ํšŒ: The 1st ACM SIGSPATIAL International Workshop on Human-Centered Geospatial Computing (GeoHCC '25)
  • ๋…ผ๋ฌธ ๋งํฌ: https://arxiv.org/abs/2506.14670
  • ํ”„๋กœ์ ํŠธ ๋งํฌ: https://knowledge-computing.github.io/projects/streetlens

์ดˆ๋ก

์ „ํ†ต์ ์ธ ๊ทผ๋ฆฐ์ง€์—ญ ์—ฐ๊ตฌ๋Š” ์ธํ„ฐ๋ทฐ, ์„ค๋ฌธ์กฐ์‚ฌ, ์ƒ์„ธํ•œ ํ”„๋กœํ† ์ฝœ ๊ธฐ๋ฐ˜์˜ ์ˆ˜๋™ ์ด๋ฏธ์ง€ ์ฃผ์„์„ ํ†ตํ•ด ๋ฌผ๋ฆฌ์  ํ˜ผ๋ž€, ์‡ ํ‡ด, ๊ฑฐ๋ฆฌ ์•ˆ์ „์„ฑ, ์‚ฌํšŒ๋ฌธํ™”์  ์ƒ์ง•์„ ํฌํ•จํ•œ ํ™˜๊ฒฝ ํŠน์„ฑ์„ ํŒŒ์•…ํ•˜๊ณ , ์ด๋Ÿฌํ•œ ํŠน์„ฑ์ด ๋ฐœ์ „ ๋ฐ ๊ฑด๊ฐ• ๊ฒฐ๊ณผ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ์—ฐ๊ตฌํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฐฉ๋ฒ•๋“ค์€ ํ’๋ถ€ํ•œ ํ†ต์ฐฐ๋ ฅ์„ ์ œ๊ณตํ•˜์ง€๋งŒ ์‹œ๊ฐ„์ด ๋งŽ์ด ๊ฑธ๋ฆฌ๊ณ  ์ „๋ฌธ๊ฐ€์˜ ์ง‘์•ฝ์ ์ธ ๊ฐœ์ž…์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ ์‚ฌ์šฉ์ž ๊ตฌ์„ฑ ๊ฐ€๋Šฅํ•œ ์ธ๊ฐ„ ์ค‘์‹ฌ ์›Œํฌํ”Œ๋กœ์šฐ์ธ StreetLens๋ฅผ ์ œ์•ˆํ•˜๋ฉฐ, ์ด๋Š” ๊ด€๋ จ ์‚ฌํšŒ๊ณผํ•™ ์ „๋ฌธ ์ง€์‹์„ ์‹œ๊ฐ ์–ธ์–ด ๋ชจ๋ธ(VLM)์— ํ†ตํ•ฉํ•˜์—ฌ ํ™•์žฅ ๊ฐ€๋Šฅํ•œ ๊ทผ๋ฆฐ์ง€์—ญ ํ™˜๊ฒฝ ํ‰๊ฐ€๋ฅผ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

์—ฐ๊ตฌ ๋ฐฐ๊ฒฝ ๋ฐ ๋™๊ธฐ

๋ฌธ์ œ ์ •์˜

๊ทผ๋ฆฐ์ง€์—ญ ํ™˜๊ฒฝ ํ‰๊ฐ€๋Š” ์ „ํ†ต์ ์œผ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ณผ์ œ์— ์ง๋ฉดํ•ด ์žˆ์Šต๋‹ˆ๋‹ค:

  1. ๋…ธ๋™ ์ง‘์•ฝ์„ฑ: ์ฒด๊ณ„์  ์‚ฌํšŒ ๊ด€์ฐฐ(SSO)์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ํ›ˆ๋ จ๋œ ์ฝ”๋”๊ฐ€ ํ•„์š”ํ•˜๋ฉฐ, ์‹ ๋ขฐ์„ฑ์„ ๋ณด์žฅํ•˜๊ธฐ ์œ„ํ•ด ์—ฌ๋Ÿฌ ์ฝ”๋”๊ฐ€ ๋™์ผ ์ด๋ฏธ์ง€์— ์ฃผ์„์„ ๋‹ฌ์•„์•ผ ํ•จ
  2. ํ™•์žฅ์„ฑ ์ œํ•œ: ์ˆ˜๋™ ๋ฐฉ๋ฒ•์€ ๊ด‘๋ฒ”์œ„ํ•œ ์ง€๋ฆฌ์  ์˜์—ญ ๋ฐ ๋‹ค์–‘ํ•œ ์—ฐ๊ตฌ ์ƒํ™ฉ์œผ๋กœ์˜ ํ™•์žฅ์ด ์–ด๋ ค์›€
  3. ์ „๋ฌธ๊ฐ€ ์˜์กด์„ฑ: ์˜์—ญ ์ „๋ฌธ๊ฐ€์˜ ์ง€์†์ ์ธ ์ฐธ์—ฌ ๋ฐ ๊ฐ๋… ํ•„์š”
  4. ํ‘œ์ค€ํ™”์˜ ์–ด๋ ค์›€: ์—ฐ๊ตฌ ์„ค๊ณ„ ๋ฐ ์ง€๋ฆฌ์  ๋ฐฐ๊ฒฝ ์ „๋ฐ˜์— ๊ฑธ์นœ ์ ์‘ํ˜• ์‹œ์Šคํ…œ ๋ฐฉ๋ฒ•์˜ ๋ถ€์žฌ

์—ฐ๊ตฌ์˜ ์ค‘์š”์„ฑ

๊ทผ๋ฆฐ์ง€์—ญ ํ™˜๊ฒฝ ํŠน์„ฑ ํ‰๊ฐ€๋Š” ํ™˜๊ฒฝ์ด ๋‹ค์Œ ์‚ฌํ•ญ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ์ดํ•ดํ•˜๋Š” ๋ฐ ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค:

  • ์ฒญ์†Œ๋…„ ๋ฐœ๋‹ฌ
  • ์ •์‹  ๊ฑด๊ฐ•
  • ์‚ฌํšŒ์  ๊ฒฐ์ง‘๋ ฅ
  • ๊ณต์ค‘๋ณด๊ฑด ๊ฒฐ๊ณผ

๊ธฐ์กด ๋ฐฉ๋ฒ•์˜ ํ•œ๊ณ„

  1. ์ „ํ†ต์  ๋ฐฉ๋ฒ•: ๊ฐ€์น˜ ์žˆ๋Š” ํ†ต์ฐฐ๋ ฅ์„ ์ œ๊ณตํ•˜์ง€๋งŒ ๊ณผ์ •์ด ๋ฒˆ๊ฑฐ๋กญ๊ณ  ์ „๋ฌธ๊ฐ€์— ์˜์กดํ•˜๋ฉฐ ๊ทœ๋ชจ ํ™•๋Œ€๊ฐ€ ์–ด๋ ค์›€
  2. ๊ธฐ์กด VLM ์‘์šฉ: ๋Œ€๋ถ€๋ถ„ ์ž„์‹œ์  ์‘์šฉ์ด๋ฉฐ ๊ตฌ์กฐํ™”๋œ ํ”„๋ ˆ์ž„์›Œํฌ๊ฐ€ ๋ถ€์กฑํ•˜๊ณ , VLM์„ ์ธ๊ฐ„ ์ฝ”๋”์ฒ˜๋Ÿผ ์ž‘๋™ํ•˜๋„๋ก ์ฒด๊ณ„์ ์œผ๋กœ "ํ›ˆ๋ จ"ํ•  ์ˆ˜ ์—†์Œ
  3. ํ”ผ๋“œ๋ฐฑ ๋ฉ”์ปค๋‹ˆ์ฆ˜ ๋ถ€์žฌ: ๊ธฐ์กด ๋ฐฉ๋ฒ•์€ ์ผ๋ฐ˜์ ์œผ๋กœ VLM ๊ฒฐ๊ณผ๋ฅผ ์ง์ ‘ ์ˆ˜์šฉํ•˜๋ฉฐ ์—ฐ๊ตฌ์ž ํ”ผ๋“œ๋ฐฑ์„ ์ œ๊ณตํ•˜์ง€ ์•Š์Œ

ํ•ต์‹ฌ ๊ธฐ์—ฌ

  1. StreetLens ์›Œํฌํ”Œ๋กœ์šฐ ์ œ์•ˆ: ์ธ๊ฐ„ ์ฝ”๋” ํ›ˆ๋ จ ๊ณผ์ •์„ ๋ชจ๋ฐฉํ•˜๋Š” ์ตœ์ดˆ์˜ ์—”๋“œ-ํˆฌ-์—”๋“œ, ์—ฐ๊ตฌ์ž ์ค‘์‹ฌ์˜ ์ฒด๊ณ„์  ์‚ฌํšŒ ๊ด€์ฐฐ ์›Œํฌํ”Œ๋กœ์šฐ
  2. ์ธ๊ฐ„-๊ธฐ๊ณ„ ํ˜‘๋ ฅ ํ”„๋ ˆ์ž„์›Œํฌ: ์—ญํ•  ํ”„๋กฌํ”„ํŒ…(role prompting)์„ ํ†ตํ•ด ์˜์—ญ ์ง€์‹์„ ๋ถ„์„ ๊ณผ์ •์˜ ํ•ต์‹ฌ ์š”์†Œ๋กœ ํ†ตํ•ฉ
  3. ์ž๋™ํ™”๋œ ํ”„๋กฌํ”„ํŠธ ํŠœ๋‹: ๊ด€๋ จ ์—ฐ๊ตฌ ๋ฌธํ—Œ ๋ฐ ์ฝ”๋”ฉ ๋งค๋‰ด์–ผ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์˜์—ญ ํŠน์ • ํ”„๋กฌํ”„ํŠธ ์ž๋™ ์ƒ์„ฑ
  4. ํ•ด์„ ๊ฐ€๋Šฅ์„ฑ ๊ฐ•ํ™”: VLM ์˜์‚ฌ๊ฒฐ์ •์˜ ์„ค๋ช… ๋ฐ ํ”ผ๋“œ๋ฐฑ ๋ฉ”์ปค๋‹ˆ์ฆ˜ ์ œ๊ณต
  5. ์˜คํ”ˆ์†Œ์Šค ์ ‘๊ทผ์„ฑ: Google Colab ๋…ธํŠธ๋ถ ์ œ๊ณต์œผ๋กœ ๊ธฐ์ˆ ์  ์ง„์ž… ์žฅ๋ฒฝ ๋‚ฎ์ถค

๋ฐฉ๋ฒ•๋ก  ์ƒ์„ธ ์„ค๋ช…

์ž‘์—… ์ •์˜

์ž…๋ ฅ:

  • ์—ฐ๊ตฌ ์ง€์—ญ ์‚ฌ์–‘
  • ์ฝ”๋”ฉ ๋งค๋‰ด์–ผ ๋ฐ ํ”„๋กœํ† ์ฝœ
  • ๊ด€๋ จ ํ•™์ˆ  ๋…ผ๋ฌธ
  • ์˜ˆ์‹œ ์ฃผ์„
  • ๊ฑฐ๋ฆฌ ๋ทฐ ์ด๋ฏธ์ง€(SVI)

์ถœ๋ ฅ:

  • ๊ตฌ์กฐํ™”๋œ ํ™˜๊ฒฝ ํŠน์„ฑ ํ‰๊ฐ€
  • ๊ฐ๊ด€์  ํŠน์„ฑ(์˜ˆ: ์ž๋™์ฐจ ์ˆ˜)์—์„œ ์ฃผ๊ด€์  ์ธ์‹(์˜ˆ: ํ˜ผ๋ž€๊ฐ)๊นŒ์ง€์˜ ์˜๋ฏธ๋ก ์  ์ฃผ์„
  • ํ‰๊ฐ€ ์„ค๋ช… ๋ฐ ํ”ผ๋“œ๋ฐฑ

์‹œ์Šคํ…œ ์•„ํ‚คํ…์ฒ˜

StreetLens๋Š” ๋„ค ๊ฐ€์ง€ ํ•ต์‹ฌ ๋ชจ๋“ˆ๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค:

M1. ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ๊ธฐ(Data Processor)

  • ๊ธฐ๋Šฅ: ์ž…๋ ฅ ์ž๋ฃŒ ์ˆ˜์ง‘ ๋ฐ ์กฐ์ง
  • ์ž…๋ ฅ ์ฒ˜๋ฆฌ:
    • ์—ฐ๊ตฌ ์ง€์—ญ ์„ ํƒ(๋ฏธ๊ตญ ์ธ๊ตฌ์กฐ์‚ฌ TIGER ๋„๋กœ ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜, 5๋ฏธํ„ฐ ๊ฐ„๊ฒฉ ์ƒ˜ํ”Œ๋ง)
    • ์ž๋ฃŒ ์—…๋กœ๋“œ(์ฝ”๋”ฉ ๋งค๋‰ด์–ผ, ํ”„๋กœํ† ์ฝœ, ๊ด€๋ จ ๋…ผ๋ฌธ, ์˜ˆ์‹œ ์ฃผ์„)
    • Google Street View ์ด๋ฏธ์ง€ ๊ฒ€์ƒ‰
  • ์ถœ๋ ฅ: ๊ตฌ์กฐํ™”๋œ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ์„ธํŠธ

M2. ์ž๋™ํ™”๋œ ํ”„๋กฌํ”„ํŠธ ํŠœ๋‹(Automated Prompt Tuning)

  • ์—ญํ•  ์ƒ์„ฑ: ๊ด€๋ จ ๋…ผ๋ฌธ ์ดˆ๋ก์„ ๊ธฐ๋ฐ˜์œผ๋กœ VLM ์ „๋ฌธ๊ฐ€ ์—ญํ•  ์„ค๋ช… ์ƒ์„ฑ
    ํ”„๋กฌํ”„ํŠธ ํ…œํ”Œ๋ฆฟ:
    "You are an expert in the following fields and the author of the paper abstracts provided here: [๋…ผ๋ฌธ ์ดˆ๋ก]. Based on the expertise demonstrated, generate a general professional role description of yourself in one to two sentences, starting with 'You are' written in the second person."
    
  • ์ž‘์—… ๋ถ„๋ฅ˜: ์ฃผ๊ด€์  ์ธ์‹ ์ž‘์—… vs ๊ฐ๊ด€์  ๊ฒ€์ถœ ์ž‘์—… ๊ตฌ๋ถ„
    ๋ถ„๋ฅ˜ ํ”„๋กฌํ”„ํŠธ:
    "You are a classifier of annotation tasks... If it asks to rate/assess overall condition or quality, label as perception. If it asks to detect, count, or verify specific objects, label as object_detection."
    
  • ์ฝ”๋”ฉ ๋งค๋‰ด์–ผ ์ฒ˜๋ฆฌ: ์งˆ๋ฌธ-๋‹ต๋ณ€ ์Œ์„ ๊ตฌ์กฐํ™”๋œ ํ”„๋กฌํ”„ํŠธ๋กœ ๋ณ€ํ™˜

M3. ์‹œ๊ฐ ์–ธ์–ด ๋ชจ๋ธ ์ฒ˜๋ฆฌ๊ธฐ(VLM Processor)

  • ๋ชจ๋ธ ์„ ํƒ: ์˜คํ”ˆ์†Œ์Šค ๊ฒฝ๋Ÿ‰ VLM InternVL3-2B ์‚ฌ์šฉ
    • ์ด๋ฏธ์ง€ ์ธ์ฝ”๋”: InternViT-300M-448px-V2_5
    • ์–ธ์–ด ๋ชจ๋ธ: Qwen2.5-1.5B
  • ์ฒ˜๋ฆฌ ํ๋ฆ„:
    1. ์ด๋ฏธ์ง€ ์ธ์ฝ”๋”ฉ ๋ฐ ์ž„๋ฒ ๋”ฉ
    2. M2์—์„œ ์ƒ์„ฑ๋œ ํ”„๋กฌํ”„ํŠธ์™€ ๊ฒฐํ•ฉ
    3. ์˜ˆ์‹œ ์ด๋ฏธ์ง€-๋‹ต๋ณ€ ์Œ์„ ํ™œ์šฉํ•œ ์ปจํ…์ŠคํŠธ ํ•™์Šต
    4. ํ™˜๊ฒฝ ํŠน์„ฑ ํ‰๊ฐ€ ์ƒ์„ฑ

M4. ํ”ผ๋“œ๋ฐฑ ์ œ๊ณต๊ธฐ(Feedback Provider)

  • ์„ค๋ช… ์ƒ์„ฑ: VLM ํ‰๊ฐ€์— ๋Œ€ํ•œ ์ถ”๋ก  ์„ค๋ช… ์ œ๊ณต
  • ํ•ด์„ ๊ฐ€๋Šฅ์„ฑ: ์—ฐ๊ตฌ์ž๊ฐ€ AI ์—์ด์ „ํŠธ์˜ ์˜์‚ฌ๊ฒฐ์ • ๊ณผ์ •์„ ์ดํ•ดํ•˜๋„๋ก ์ง€์›
  • ์˜ˆ์‹œ: '์‡ ํ‡ด 1' ์ธก์ •์— ๋Œ€ํ•œ ์„ค๋ช…: "There are only slight cracks, and any potholes present have been fixed or covered"

๊ธฐ์ˆ  ํ˜์‹  ํฌ์ธํŠธ

  1. ์˜์—ญ ์ง€์‹ ํ†ตํ•ฉ: ์—ญํ•  ํ”„๋กฌํ”„ํŒ…์„ ํ†ตํ•ด ์‚ฌํšŒ๊ณผํ•™ ์ „๋ฌธ ์ง€์‹์„ VLM์— ๋‚ด์žฅ
  2. ์ž‘์—… ์ž์ ์‘: ๋‹ค์–‘ํ•œ ํ‰๊ฐ€ ์ž‘์—… ์œ ํ˜•(์ธ์‹ vs ๊ฒ€์ถœ) ์ž๋™ ์‹๋ณ„ ๋ฐ ์ ์‘
  3. ์ปจํ…์ŠคํŠธ ํ•™์Šต: ์ „๋ฌธ๊ฐ€ ์ฃผ์„ ์˜ˆ์‹œ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ชจ๋ธ ์„ฑ๋Šฅ ํ–ฅ์ƒ
  4. ์ธ๊ฐ„-๊ธฐ๊ณ„ ํ˜‘๋ ฅ ์„ค๊ณ„: ์ธ๊ฐ„ ์ฝ”๋” ํ›ˆ๋ จ ๊ณผ์ • ๋ชจ๋ฐฉ, ๋ฌธํ—Œ ํ•™์Šต, ํ”„๋กœํ† ์ฝœ ์—ฐ๊ตฌ, ์˜ˆ์‹œ ๊ฒ€ํ†  ํฌํ•จ

์‚ฌ๋ก€ ์—ฐ๊ตฌ

์—ฐ๊ตฌ ๋ฐฐ๊ฒฝ

Pasco์™€ White (2020)์˜ ๊ฐ€์ • ์‚ฌํšŒ๊ณผํ•™ ์—ฐ๊ตฌ ๊ธฐ๋ฐ˜:

  • ์—ฐ๊ตฌ ๋ชฉํ‘œ: ๊ทผ๋ฆฐ์ง€์—ญ ํ™˜๊ฒฝ๊ณผ ์ฒญ์†Œ๋…„์˜ ์ธ์ข… ๋ผ๋ฒจ ์‚ฌ์šฉ ๊ฐ„์˜ ๊ด€๊ณ„ ํ‰๊ฐ€
  • ๋ฐฉ๋ฒ•: ์ฒด๊ณ„์  ์‚ฌํšŒ ๊ด€์ฐฐ(SSO) ํ”„๋กœํ† ์ฝœ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ธ๊ฐ„ ์ฝ”๋” ํ›ˆ๋ จ
  • ํ‰๊ฐ€ ๋‚ด์šฉ: ๋ฌผ๋ฆฌ์  ์‡ ํ‡ด ์ •๋„, ์‚ฌํšŒ๋ฌธํ™”์  ์ƒ์ง• ๋“ฑ
  • ๊ฒ€์ฆ ๋ฐฉ๋ฒ•: ๊ธ‰๋‚ด ์ƒ๊ด€๊ณ„์ˆ˜(ICC)๋ฅผ ํ†ตํ•ด ์ฝ”๋” ๊ฐ„ ์‹ ๋ขฐ์„ฑ ํ‰๊ฐ€

StreetLens ์‘์šฉ

  • ํ‰๊ฐ€ ๊ณผ์ •์— ์ถ”๊ฐ€ ์ง€๋Šฅํ˜• ์ฝ”๋”๋กœ ์ฐธ์—ฌ
  • ๊ด€๋ จ ์—ฐ๊ตฌ ๋ฌธํ—Œ์„ ์‚ฌ์šฉํ•˜์—ฌ VLM ์—ญํ•  ์ •์˜
  • ์ฝ”๋”ฉ ๋งค๋‰ด์–ผ์˜ ๊ตฌ์ฒด์  ์งˆ๋ฌธ ์ฒ˜๋ฆฌ(์˜ˆ: "ํ˜ผ๋ž€ 3")
  • ํ•ด์„ ๊ฐ€๋Šฅํ•œ ํ‰๊ฐ€ ๊ฒฐ๊ณผ ์ œ๊ณต

์‹คํ—˜ ์„ค์ •

๋ฐ์ดํ„ฐ ์ถœ์ฒ˜

  • ๊ฑฐ๋ฆฌ ๋ทฐ ์ด๋ฏธ์ง€: Google Street View ์ด๋ฏธ์ง€
  • ์ง€๋ฆฌ ๋ฐ์ดํ„ฐ: ๋ฏธ๊ตญ ์ธ๊ตฌ์กฐ์‚ฌ TIGER ๋„๋กœ ๋ฐ์ดํ„ฐ
  • ์ƒ˜ํ”Œ๋ง ์ „๋žต: 5๋ฏธํ„ฐ ๊ฐ„๊ฒฉ ์‚ฌ์ „ ์ •์˜ ํฌ์ธํŠธ ์œ„์น˜
  • ์‚ฌ๋ก€ ๋ฐ์ดํ„ฐ: ์›๋ž˜ ์‚ฌ๋ก€ ์—ฐ๊ตฌ์˜ ์ˆ˜๋™ ์ฃผ์„ ๋ฐ์ดํ„ฐ

๊ธฐ์ˆ  ๊ตฌํ˜„

  • ๋ฐฐํฌ ํ”Œ๋žซํผ: Google Colab ๋…ธํŠธ๋ถ
  • ์„œ๋ฒ„: ๋ฏธ๋„ค์†Œํƒ€ ๋Œ€ํ•™๊ต, Cloudflare๋ฅผ ํ†ตํ•œ ์—ฐ๊ฒฐ
  • ์‚ฌ์šฉ์ž ์ธํ„ฐํŽ˜์ด์Šค: ๋ชจ๋“ˆ์‹ ๋ฒ„ํŠผ ์„ค๊ณ„, ๊ฐ ๋ชจ๋“ˆ ๊ธฐ๋Šฅ์˜ ๋…๋ฆฝ์  ํƒ์ƒ‰ ์ง€์›

๊ด€๋ จ ์—ฐ๊ตฌ

์ „ํ†ต์  ๋ฐฉ๋ฒ•์˜ ์ง„ํ™”

  1. ์ดˆ๊ธฐ ์—ฐ๊ตฌ: Sampson๊ณผ Raudenbush (1999)๊ฐ€ ๋น„๋””์˜ค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‹œ์นด๊ณ  23,000๊ฐœ ๊ฑฐ๋ฆฌ ๊ตฌ๊ฐ„์˜ ๋ฌผ๋ฆฌ์  ํ˜ผ๋ž€ ํ‰๊ฐ€
  2. ๊ฐ€์ƒ ๊ฐ์‚ฌ: ํ›„์† ์—ฐ๊ตฌ์—์„œ Google Earth ๋ฐ Street View๋ฅผ ์‚ฌ์šฉํ•œ ์›๊ฒฉ ํ‰๊ฐ€ ์ฑ„ํƒ
  3. ์ปดํ“จํ„ฐ ๋น„์ „ ๋ฐฉ๋ฒ•: ๋„์‹œ ๋…นํ™”, ๋ณด๋„ ํ’ˆ์งˆ ๋“ฑ ๋ฌผ๋ฆฌ์  ํŠน์„ฑ ๊ฒ€์ถœ

VLM ์‘์šฉ ํ˜„ํ™ฉ

  • ๋ณดํ–‰์„ฑ ํ‰๊ฐ€: VLM์„ ์‚ฌ์šฉํ•˜์—ฌ ๋„์‹œ ๋ณดํ–‰ ์นœํ™”์„ฑ ํ‰๊ฐ€
  • ๊ตฌ์กฐํ™”๋œ ์„ค๋ช…: ๋„์‹œ ํ™˜๊ฒฝ์˜ ๊ตฌ์กฐํ™”๋œ ์„ค๋ช… ์ƒ์„ฑ
  • ๊ฐ์ฒด ๊ฒ€์ถœ: ๊ฐ์‚ฌ ๋ฒ”์ฃผ์—์„œ ํŠน์ • ๊ฐ์ฒด ๊ฒ€์ถœ

StreetLens์˜ ์žฅ์ 

๊ธฐ์กด ์—ฐ๊ตฌ์™€ ๋น„๊ตํ•˜์—ฌ StreetLens๋Š” ๋‹ค์Œ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค:

  • ์—”๋“œ-ํˆฌ-์—”๋“œ ์—ฐ๊ตฌ์ž ์ค‘์‹ฌ ์›Œํฌํ”Œ๋กœ์šฐ
  • ์ธ๊ฐ„ ์ฝ”๋” ํ›ˆ๋ จ ๊ณผ์ • ๋ชจ๋ฐฉ์˜ ์ฒด๊ณ„์  VLM ํ›ˆ๋ จ
  • ์—ฐ๊ตฌ ์„ค๊ณ„ ๋ฐ ์ง€๋ฆฌ์  ๋ฐฐ๊ฒฝ ์ „๋ฐ˜์— ๊ฑธ์นœ ์ ์‘์„ฑ

๊ฒฐ๋ก  ๋ฐ ๋…ผ์˜

์ฃผ์š” ๊ฒฐ๋ก 

  1. ์›Œํฌํ”Œ๋กœ์šฐ ํšจ๊ณผ์„ฑ: StreetLens๋Š” ์ธ๊ฐ„ ์ฝ”๋”์˜ ํ›ˆ๋ จ ๋ฐ ํ‰๊ฐ€ ๊ณผ์ •์„ ์„ฑ๊ณต์ ์œผ๋กœ ๋ชจ๋ฐฉ
  2. ์˜์—ญ ์ง€์‹ ํ†ตํ•ฉ: ์—ญํ•  ํ”„๋กฌํ”„ํŒ…์„ ํ†ตํ•ด ์‚ฌํšŒ๊ณผํ•™ ์ „๋ฌธ ์ง€์‹์„ ํšจ๊ณผ์ ์œผ๋กœ ํ†ตํ•ฉ
  3. ํ™•์žฅ์„ฑ ํ–ฅ์ƒ: ๊ทผ๋ฆฐ์ง€์—ญ ํ™˜๊ฒฝ ํ‰๊ฐ€์˜ ๊ทœ๋ชจ ํ™•๋Œ€ ๋Šฅ๋ ฅ ํ˜„์ €ํžˆ ๊ฐœ์„ 
  4. ์ธ๊ฐ„-๊ธฐ๊ณ„ ํ˜‘๋ ฅ: AI์™€ ์—ฐ๊ตฌ์ž ๊ฐ„์˜ ํšจ๊ณผ์  ํ˜‘๋ ฅ ์‹คํ˜„

ํ•œ๊ณ„

  1. ๋ชจ๋ธ ํŽธํ–ฅ: VLM์ด ๋‹ค์–‘ํ•œ ๊ทผ๋ฆฐ์ง€์—ญ์˜ ์‚ฌํšŒ๋ฌธํ™”์  ๋ฐฐ๊ฒฝ ํ•ด์„ ์‹œ ํŽธํ–ฅ์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์Œ
  2. ํ‰๊ฐ€ ๊ฒ€์ฆ: ์ž๋™ํ™”๋œ ์ฝ”๋”ฉ์˜ ์‹ ๋ขฐ์„ฑ์„ ๊ฒ€์ฆํ•˜๊ธฐ ์œ„ํ•ด ๋” ์ฒด๊ณ„์ ์ธ ํ‰๊ฐ€ ๋ฐฉ๋ฒ•(์˜ˆ: ICC) ํ•„์š”
  3. ํ”ผ๋“œ๋ฐฑ ๋ฉ”์ปค๋‹ˆ์ฆ˜: ํ˜„์žฌ ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„๊ฐ€ ์ œํ•œ์ ์ด๋ฉฐ ๋” ๋งŽ์€ ์ƒํ˜ธ์ž‘์šฉ์‹ ๊ฐœ์„  ๊ธฐ๋Šฅ ํ•„์š”

ํ–ฅํ›„ ๋ฐฉํ–ฅ

  1. ์ธ๊ฐ„-๊ธฐ๊ณ„ ์ƒํ˜ธ์ž‘์šฉ ๊ฐ•ํ™”:
    • ์—ฐ๊ตฌ์ž๊ฐ€ StreetLens ์˜์‚ฌ๊ฒฐ์ •์„ ์„ค๋ช…ํ•˜๊ณ  ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ๋Š” ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„ ์ถ”๊ฐ€
    • ๋‹ค์–‘ํ•œ ์œ ํ˜•์˜ ์ž๋™ํ™” ์ฝ”๋” ํƒ์ƒ‰
    • ์ธ๊ฐ„ ์ฝ”๋”ฉ์— ๋” ๊ฐ€๊นŒ์šด ์ž๋™ํ™” ๋ฐฉ๋ฒ• ๊ฐœ๋ฐœ
  2. ํ‰๊ฐ€ ๋ฐฉ๋ฒ• ๊ฐœ์„ :
    • ๊ธ‰๋‚ด ์ƒ๊ด€๊ณ„์ˆ˜(ICC)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ž๋™ํ™” ์ฝ”๋”๋ฅผ ์ธ๊ฐ„ ์ฃผ์„์ž ์ค‘ ํ•˜๋‚˜๋กœ ์ทจ๊ธ‰
    • ์ถœ๋ ฅ์˜ ํ•ฉ๋ฆฌ์„ฑ ๋ฐ ์‹ ๋ขฐ์„ฑ์„ ๋ชจ๋‹ˆํ„ฐ๋งํ•˜๋Š” ํ”ผ๋“œ๋ฐฑ ๋ฉ”์ปค๋‹ˆ์ฆ˜ ์ œ๊ณต
    • ๊ฒฐ๊ณผ ๊ฒ€ํ†  ๋ฐ ๊ฐœ์„ ์˜ ํŽธ์˜์„ฑ ๊ฐ•ํ™”
  3. ํŽธํ–ฅ ์™„ํ™”:
    • ์ž ์žฌ์  ํŽธํ–ฅ ์ถœ์ฒ˜ ํ‰๊ฐ€
    • ์˜์—ญ ์ „๋ฌธ๊ฐ€์™€์˜ ํ˜‘๋ ฅ์„ ์œ„ํ•ด ์ฐธ์—ฌํ˜• ์„ค๊ณ„ ๋ฐฉ๋ฒ• ์ ์šฉ
    • ๋„๊ตฌ์˜ ์ฑ…์ž„๊ฐ ์žˆ๊ณ  ์ธ๊ฐ„ ์ค‘์‹ฌ์  ํŠน์„ฑ ๋ณด์žฅ

์‹ฌ์ธต ํ‰๊ฐ€

์žฅ์ 

  1. ๋†’์€ ํ˜์‹ ์„ฑ: ์ธ๊ฐ„ ์ฝ”๋” ํ›ˆ๋ จ ๊ณผ์ •์„ ์ฒด๊ณ„์ ์œผ๋กœ ๋ชจ๋ฐฉํ•˜๋Š” VLM ์›Œํฌํ”Œ๋กœ์šฐ ์ตœ์ดˆ ์ œ์•ˆ
  2. ๋†’์€ ์‹ค์šฉ ๊ฐ€์น˜: ๊ทผ๋ฆฐ์ง€์—ญ ์—ฐ๊ตฌ์˜ ์‹ค์ œ ๋ฌธ์ œ์  ํ•ด๊ฒฐ, ๊ด‘๋ฒ”์œ„ํ•œ ์‘์šฉ ์ „๋ง ๋ณด์œ 
  3. ํ•ฉ๋ฆฌ์  ๊ธฐ์ˆ  ๋ฐฉ์•ˆ: 4๊ฐœ ๋ชจ๋“ˆ ์„ค๊ณ„๊ฐ€ ๋ช…ํ™•ํ•˜๊ณ  ๊ธฐ์ˆ  ๊ฒฝ๋กœ๊ฐ€ ์‹คํ–‰ ๊ฐ€๋Šฅ
  4. ์˜คํ”ˆ์†Œ์Šค ์นœํ™”์ : Google Colab ๊ตฌํ˜„ ์ œ๊ณต์œผ๋กœ ์‚ฌ์šฉ ์ง„์ž… ์žฅ๋ฒฝ ๋‚ฎ์ถค
  5. ํ•™์ œ ๊ฐ„ ํ†ตํ•ฉ: AI ๊ธฐ์ˆ ๊ณผ ์‚ฌํšŒ๊ณผํ•™ ๋ฐฉ๋ฒ•๋ก ์„ ํšจ๊ณผ์ ์œผ๋กœ ๊ฒฐํ•ฉ

๋ถ€์กฑํ•œ ์ 

  1. ํ‰๊ฐ€ ๋ถˆ์ถฉ๋ถ„: ์ธ๊ฐ„ ์ฝ”๋”์™€์˜ ์ฒด๊ณ„์  ๋น„๊ต ์‹คํ—˜ ๋ถ€์žฌ
  2. ํŽธํ–ฅ ์œ„ํ—˜: VLM์˜ ์‚ฌํšŒ๋ฌธํ™”์  ํ•ด์„ ํŽธํ–ฅ ๋ฌธ์ œ์— ๋Œ€ํ•œ ๋…ผ์˜ ๋ถ€์กฑ
  3. ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ ๋ฏธ๊ฒ€์ฆ: ๋‹จ์ผ ์‚ฌ๋ก€ ์—ฐ๊ตฌ๋งŒ ๊ธฐ๋ฐ˜ํ•˜๋ฉฐ ๋‹ค์ค‘ ์‹œ๋‚˜๋ฆฌ์˜ค ๊ฒ€์ฆ ๋ถ€์žฌ
  4. ๊ธฐ์ˆ  ์„ธ๋ถ€์‚ฌํ•ญ ๋ถ€์กฑ: ํ”„๋กฌํ”„ํŠธ ์—”์ง€๋‹ˆ์–ด๋ง์˜ ๊ตฌ์ฒด์  ์ „๋žต ๋ฐ ํšจ๊ณผ ๋ถ„์„ ์ œํ•œ์ 

์˜ํ–ฅ๋ ฅ

  1. ํ•™์ˆ  ๊ธฐ์—ฌ: ์ธ๊ฐ„-๊ธฐ๊ณ„ ํ˜‘๋ ฅ์˜ ์ง€๋ฆฌ๊ณต๊ฐ„ ์ปดํ“จํŒ…์— ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„ ์ œ๊ณต
  2. ์‹ค๋ฌด ๊ฐ€์น˜: ๊ทผ๋ฆฐ์ง€์—ญ ์—ฐ๊ตฌ์˜ ํšจ์œจ์„ฑ ๋ฐ ๊ทœ๋ชจ๋ฅผ ํ˜„์ €ํžˆ ํ–ฅ์ƒ ๊ฐ€๋Šฅ
  3. ํ•™์ œ ๊ฐ„ ์˜ํ–ฅ: ๋„์‹œ ๊ณ„ํš, ๊ณต์ค‘๋ณด๊ฑด, ์‚ฌํšŒํ•™ ๋“ฑ ๋ถ„์•ผ์— ์‘์šฉ ๊ฐ€์น˜ ๋ณด์œ 
  4. ๋ฐฉ๋ฒ•๋ก  ํ˜์‹ : VLM์˜ ์˜์—ญ ํŠน์ • ์ž‘์—… ์‘์šฉ์„ ์œ„ํ•œ ์ฐธ๊ณ  ํ”„๋ ˆ์ž„์›Œํฌ ์ œ๊ณต

์ ์šฉ ์‹œ๋‚˜๋ฆฌ์˜ค

  1. ๋„์‹œ ์—ฐ๊ตฌ: ๋Œ€๊ทœ๋ชจ ๊ทผ๋ฆฐ์ง€์—ญ ํ™˜๊ฒฝ ํŠน์„ฑ ํ‰๊ฐ€
  2. ๊ณต์ค‘๋ณด๊ฑด: ํ™˜๊ฒฝ ์š”์ธ์ด ๊ฑด๊ฐ•์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ ์—ฐ๊ตฌ
  3. ์‚ฌํšŒํ•™ ์—ฐ๊ตฌ: ์ง€์—ญ์‚ฌํšŒ ํŠน์„ฑ๊ณผ ์‚ฌํšŒ ํ˜„์ƒ ๊ด€๊ณ„ ๋ถ„์„
  4. ๋„์‹œ ๊ณ„ํš: ์‹œ๊ฐ์  ํŠน์„ฑ ๊ธฐ๋ฐ˜ ๋„์‹œ ํ™˜๊ฒฝ ํ‰๊ฐ€

์œค๋ฆฌ์  ๊ณ ๋ ค์‚ฌํ•ญ

๋…ผ๋ฌธ์€ ๊ธฐ๊ณ„ํ•™์Šต ๋ชจ๋ธ์ด ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋Š” ์‚ฌํšŒ์  ํŽธํ–ฅ ๋ฌธ์ œ, ํŠนํžˆ ๋‹ค์–‘ํ•œ ๊ทผ๋ฆฐ์ง€์—ญ์˜ ์‚ฌํšŒ๋ฌธํ™”์  ๋ฐฐ๊ฒฝ ํ•ด์„ ์‹œ ํŽธํ–ฅ์„ ๋ช…์‹œ์ ์œผ๋กœ ์ธ์ •ํ•ฉ๋‹ˆ๋‹ค. ์ €์ž๋“ค์€ ํ–ฅํ›„ ์—ฐ๊ตฌ์—์„œ ์ž ์žฌ์  ํŽธํ–ฅ ์ถœ์ฒ˜๋ฅผ ํ‰๊ฐ€ํ•˜๊ณ  ์˜์—ญ ์ „๋ฌธ๊ฐ€์™€ ํ˜‘๋ ฅํ•˜์—ฌ ์ฐธ์—ฌํ˜• ์„ค๊ณ„ ๋ฐฉ๋ฒ•์„ ์ ์šฉํ•˜๋ฉฐ, StreetLens๊ฐ€ ์ฑ…์ž„๊ฐ ์žˆ๋Š” ์ธ๊ฐ„ ์ค‘์‹ฌ ๋„๊ตฌ๋กœ ๊ธฐ๋Šฅํ•˜๋„๋ก ํ•  ๊ณ„ํš์ž…๋‹ˆ๋‹ค.

์ฐธ๊ณ ๋ฌธํ—Œ

๋…ผ๋ฌธ์€ ๋‹ค์Œ์„ ํฌํ•จํ•œ ๊ด€๋ จ ๋ถ„์•ผ์˜ ์ค‘์š”ํ•œ ์—ฐ๊ตฌ๋ฅผ ์ธ์šฉํ•ฉ๋‹ˆ๋‹ค:

  • ๊ทผ๋ฆฐ์ง€์—ญ ํ™˜๊ฒฝ ํ‰๊ฐ€์˜ ๊ณ ์ „ ์—ฐ๊ตฌ(Sampson & Raudenbush, 1999)
  • ๊ฐ€์ƒ ๊ฐ์‚ฌ ๋ฐฉ๋ฒ•์˜ ๋ฐœ์ „(Odgers et al., 2012; Clarke et al., 2010)
  • ๋„์‹œ ๋ถ„์„์—์„œ์˜ VLM ์‘์šฉ(Biljecki & Ito, 2021)
  • ํ”„๋กฌํ”„ํŠธ ์—”์ง€๋‹ˆ์–ด๋ง ๊ธฐ์ˆ (Schulhoff et al., 2025)

์š”์•ฝ: StreetLens๋Š” AI์™€ ์‚ฌํšŒ๊ณผํ•™ ์—ฐ๊ตฌ ๋ฐฉ๋ฒ•๋ก  ์œตํ•ฉ์˜ ์ค‘์š”ํ•œ ์ง„์ „์„ ๋‚˜ํƒ€๋‚ด๋ฉฐ, ์ฒด๊ณ„์  ์›Œํฌํ”Œ๋กœ์šฐ ์„ค๊ณ„๋ฅผ ํ†ตํ•ด ๊ทผ๋ฆฐ์ง€์—ญ ํ™˜๊ฒฝ ํ‰๊ฐ€์˜ ์ž๋™ํ™” ๋ฐ ๊ทœ๋ชจ ํ™•๋Œ€๋ฅผ ์‹คํ˜„ํ•ฉ๋‹ˆ๋‹ค. ํ‰๊ฐ€ ๊ฒ€์ฆ ๋ฐ ํŽธํ–ฅ ์ฒ˜๋ฆฌ ์ธก๋ฉด์—์„œ ์ถ”๊ฐ€ ๊ฐœ์„ ์ด ํ•„์š”ํ•˜์ง€๋งŒ, ํ˜์‹ ์ ์ธ ์ธ๊ฐ„-๊ธฐ๊ณ„ ํ˜‘๋ ฅ ๊ฐœ๋…๊ณผ ์‹ค์šฉ์  ๊ธฐ์ˆ  ๋ฐฉ์•ˆ์€ ๊ด€๋ จ ๋ถ„์•ผ ์—ฐ๊ตฌ์— ๊ฐ€์น˜ ์žˆ๋Š” ๋„๊ตฌ ๋ฐ ๋ฐฉ๋ฒ•๋ก  ์ฐธ๊ณ ์ž๋ฃŒ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.