GAIVA: A Generative AI Framework for Human-AI Collaboration in Educational Video Analysis

Description

Qualitative video analysis is a foundational yet labor-intensive method in the learning sciences, particularly for investigating the multimodal and collaborative dimensions of learning. This paper introduces GAIVA (Generative Artificial Intelligence for Video Analysis), a novel framework that integrates multimodal large language models (MLLMs) and large language models (LLMs) to support scalable, consistent, and theory-informed video analysis. GAIVA comprises four sequential phases: (1) video preprocessing, (2) MLLM-based behavioral transcription, (3) human-AI collaborative codebook development grounded in theoretical constructs from computer-supported collaborative learning (CSCL), and (4) LLM-based automated coding. In a case study of simulation-based collaborative science learning, we demonstrate that GAIVA generates rich behavioral transcriptions, enables the construction of stable and theoretically grounded coding schemes, and achieves inter-rater reliability comparable to human annotation. Notably, the findings illustrate GAIVA's capacity to support both deductive alignment and inductive discovery: the framework surfaces a behavioral construct, Peripheral Engagement, that is consistent across data batches but absent from existing frameworks. We contribute a systematic and replicable framework for integrating generative AI into qualitative video research, enabling large-scale, multimodal analysis of collaborative learning while preserving analytical rigor, theoretical coherence, and interpretive transparency.

Authors

DOI: 10.5281/zenodo.20748283

Publication Date: 2026-06-18
