CVPR 2026 Workshop | AI-assisted Long Video Creation

Workshop Overview

This workshop investigates how far current AI models and systems are from producing long-form, multi-shot videos that truly satisfy real users and creators. While recent generative models excel at short clips, long video creation introduces challenges in narrative structure, temporal consistency, multimodal alignment, and scalable human-in-the-loop editing.

A key motivation for this workshop comes from Bilibili's large-scale creator ecosystem, where millions of creators produce narrative-driven, multi-minute videos spanning animation, education, entertainment, and storytelling. These real-world production settings expose gaps between current academic benchmarks and practical creator needs, particularly in maintaining cross-shot consistency, narrative coherence, and efficient human–AI collaboration over long temporal horizons.

Leveraging production-grade data, creator workflows, and user feedback from the Bilibili platform, this workshop aims to bridge academic research with real-world impact. By bringing together researchers, industry engineers, and content creators, we seek to define technical roadmaps, creator-centered evaluation protocols, and reproducible benchmarks that measure not only visual quality, but also narrative satisfaction, usability, and audience engagement in AI-assisted long video creation.

Call for Papers

We invite submissions that advance methods, datasets, systems, or evaluations for generating multi-shot, multi-minute videos that are coherent, controllable, and ethically responsible.

Topics of Interest

Narrative and temporal consistency in long video generation
Multimodal fusion and semantic alignment (video, audio, text, narration)
Controllability, editing, and creator interfaces
Knowledge-driven and genre-specific video creation
Human-centric evaluation and benchmarking
Human–AI collaboration and co-creation workflows

Submission Platform: All submissions will be handled via OpenReview.

OpenReview Submission Link

Paper Format: Submissions should follow the standard CVPR 2026 workshop paper format and are limited to 4 pages (excluding references).

Important Dates

Event	Date
Call for Papers Opens	December 22, 2025
Submission Deadline	March 1, 2026
Reviews Released	April 4, 2026
Camera-ready Deadline	April 11, 2026

Schedule (June 3rd, Room 712)

Time	Session
14:00 – 14:15	Opening & Motivation
14:15 – 15:00	Invited Talk 1 + Q&A: Yaoyao Liu (UIUC), "Enable Explicit 3D/4D Controls for Pre-trained Generative Models"
15:00 – 15:15	Coffee Break
15:15 – 15:35	Invited Talk 2 + Q&A: Ismini Lourentzou (UIUC), "Long-Form Video Generation Needs World Models"
15:35 – 15:55	Invited Talk 3 + Q&A: Pinar Yanardag (Virginia Tech), "Leveraging Hidden Priors for Training-Free Control: From Long-Horizon Video to Interactive World Models"
15:55 – 16:10	Coffee Break
16:10 – 16:30	Oral Paper Presentations
16:30 – 17:00	Closing Remarks

Invited Speakers

The workshop will feature invited talks from leading researchers and practitioners in video generation, multimedia understanding, and AI-assisted creation.

Yaoyao Liu (University of Illinois Urbana-Champaign) – Confirmed
Yaoyao Liu is an Assistant Professor in the School of Information Sciences and the Coordinated Science Laboratory at the University of Illinois Urbana-Champaign. He is also affiliated with the Siebel School of Computing and Data Science, the Department of Electrical and Computer Engineering, and the National Center for Supercomputing Applications. He received his Ph.D. in Computer Science from the Max Planck Institute for Informatics and his B.S. in Electronic Information Engineering from Tianjin University, and has conducted research at Johns Hopkins University, the University of Oxford (VGG), and the National University of Singapore. His research lies at the intersection of computer vision and machine learning, with an emphasis on building intelligent visual systems that are continual and data-efficient. Topics include continual learning, few-shot learning, semi-supervised learning, generative models, 3D geometric modeling, and medical image analysis.
Dr. Pinar Yanardag (Virginia Tech, Department of Computer Science) – Confirmed
Pinar Yanardag is a tenure-track Assistant Professor in the Department of Computer Science at Virginia Tech, where she leads GEMLAB and is a member of the Sanghani Center for AI and Discovery Analytics. Previously, she was a postdoctoral researcher at MIT. She received her Ph.D. in Computer Science from Purdue University. Her research has appeared at leading venues including CVPR, ICCV, and NeurIPS, and has been featured by outlets such as The Washington Post, BBC, CNN, Motherboard, and Rolling Stone.
Ismini Lourentzou (UIUC, Siebel School of Computing and Data Science) – Confirmed
Ismini Lourentzou is an Assistant Professor in the School of Information Sciences at the University of Illinois Urbana-Champaign (UIUC), with affiliate appointments in the Siebel School of Computing and Data Science (CS) and the Electrical and Computer Engineering (ECE) Department. Her research interests are computer vision and multimodal machine learning, in particular 2D/3D generative modeling and vision-language models, centered on building AI agents that perceive, reason, and interact with the world. Lourentzou earned her Ph.D. in Computer Science from the University of Illinois Urbana-Champaign. Previously, she was an Assistant Professor of Computer Science at Virginia Tech, where she received the 2023 Virginia Tech College of Engineering Dean's Award for Excellence as an Outstanding New Assistant Professor.

Additional speakers may be announced.
