Kimia Afshari

ML Engineer

Software Engineer

Visual Perception

Spatial Video Grounding via Graph Transformers

  • Role: Researcher
  • Organization: University of California, Santa Barbara (UCSB)
  • Date: 2024
  • Focus: Vision-Language Models, Graph Transformers, Video QA

Built a video question-answering system that grounds textual queries in specific spatial regions of video frames.

The model combines transformer architectures with graph neural networks to reason about video content. For each answer, it identifies the supporting visual evidence and highlights the corresponding spatial regions, which makes the system's predictions more transparent and easier to interpret.
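
As a rough illustration of the idea, and not the project's actual implementation, the sketch below shows one way graph-restricted attention over frame regions could feed a grounding score. Everything in it is an assumption made for the example: the function names (`graph_attention`, `ground_query`), the toy 2-dimensional features, the hand-written adjacency matrix, and the use of a plain dot product between the query embedding and each region. A real system would use learned projections, multiple layers, and features from a vision backbone.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_attention(features, adjacency):
    """One round of scaled dot-product attention in which each region
    attends only to itself and to its neighbours in the graph."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    updated = []
    for i, query in enumerate(features):
        neighbours = [j for j in range(len(features)) if j == i or adjacency[i][j]]
        weights = softmax([dot(query, features[j]) / scale for j in neighbours])
        updated.append([
            sum(w * features[j][d] for w, j in zip(weights, neighbours))
            for d in range(dim)
        ])
    return updated


def ground_query(query_vec, boxes, features, adjacency):
    """Score each region against the query and return the best box
    together with the normalised score of every region."""
    contextual = graph_attention(features, adjacency)
    scores = softmax([dot(query_vec, f) for f in contextual])
    best = max(range(len(boxes)), key=lambda i: scores[i])
    return boxes[best], scores


# Hypothetical toy input: three regions in one frame, given as (x, y, w, h).
boxes = [(10, 20, 50, 80), (200, 40, 60, 60), (15, 25, 40, 70)]
features = [[1.0, 0.2], [0.1, 1.0], [0.9, 0.3]]
# Regions 0 and 2 overlap spatially, so they are linked in the graph.
adjacency = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
query_vec = [1.0, 0.0]  # stand-in for an encoded question

best_box, scores = ground_query(query_vec, boxes, features, adjacency)
print(best_box, [round(s, 3) for s in scores])
```

The per-region scores are what make the output inspectable: they can be rendered as a heat map or as a highlighted box over the frame, so a viewer can see which region the answer was based on.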