VUA 2024

The 2nd Workshop on Video Understanding and its Applications (VUA 2024)

Video understanding is a popular field in computer vision and AI that aims to learn about and assess the world around us from video footage. It can benefit many real-world applications, such as training and education, patient monitoring, sports assessment, and security systems. Automating these applications through video analysis can not only save their users money and time but also reduce human error. Despite recent advances in other areas of computer vision, such as image analysis, video understanding remains an unsolved and very challenging problem.

The proposed workshop on video understanding aims to address the challenges in this field by making the following contributions:

Bringing together leading experts in video understanding to help propel the field forward. This includes junior and senior researchers, with equal representation and contribution from academia and industry.
The workshop also aims to stimulate and accelerate research progress in the field of video understanding to match the requirements of real-world applications by identifying the challenges and ways to address them.

Potential topics include, but are not limited to:

Applications of video understanding to healthcare and media production
View-invariant and 3D video understanding (e.g. 3D action recognition)
Transformers for video understanding
Generating synthetic data for video understanding tasks
Self-supervised learning for video understanding
Multi-modal video understanding
Active speaker detection
Action/event detection
Video captioning
Video editing and summarisation
Videography/virtual cinematography
Video search and retrieval

Submission

Papers will be limited to 9 pages in the BMVC format (cf. the main conference author guidelines). Accepted papers will be published in the BMVC 2024 workshop proceedings.

All papers should be submitted via the CMT website: https://cmt3.research.microsoft.com/VUABMVC2024.

Important Dates

Extended deadline for submission: August 2nd, 2024 - 23:59 British Summer Time
~~Original deadline: July 19th, 2024~~
Notification of acceptance: August 23rd, 2024
Camera-ready submission deadline: September 16th, 2024
Workshop date: November 28th, 2024

Program

10:00 - 10:05	Welcome and Introduction
10:05 - 10:55	Keynote 1 (online) by Prof. Cees G.M. Snoek. Title: Learning to Generalize in Video Space and Time (40-minute talk followed by 10-minute Q&A)
10:55 - 11:45	Keynote 2 (online) by Dr. Antonino Furnari. Title: Beyond Atomic Actions: Towards Long-Form and Procedural Understanding in Egocentric Videos (40-minute talk followed by 10-minute Q&A)
11:45 - 12:50	Keynote 3 by Dr. Laura Sevilla. Title: Video Understanding with Limited Resources (40-minute talk followed by 10-minute Q&A)
12:50 - 13:30	Break & Lunch
13:30 - 13:45	AI4ME Presentation: How Video Understanding Can Help Media Production by Faegheh Sardari (15 minutes)
13:45 - 14:33	Oral Presentations
13:45 - 13:57 | Davide Berghi: Audio-visual talker localization in video for spatial sound reproduction (10-minute presentation, 2-minute Q&A)
13:57 - 14:09 | Asmar Nadeem: Video Description Generation with a particular focus on Causal-Temporal Narrative (10-minute presentation, 2-minute Q&A)
14:09 - 14:21 | Yaru Chen: Cross-Modal Perception for Interactive-Enhanced Audio-Visual Video Parsing (10-minute presentation, 2-minute Q&A)
14:21 - 14:33 | Keyne Oei: Self-supervised contrastive learning for videos using differentiable local alignment (10-minute presentation, 2-minute Q&A)
14:33 - 14:40	Closing the Workshop

Invited Speakers

Cees G.M. Snoek

University of Amsterdam, Netherlands

Learning to Generalize in Video Space and Time

Cees G.M. Snoek is a full professor in computer science at the University of Amsterdam, where he heads the Video & Image Sense Lab. He is the director of three public-private AI research labs: QUVA Lab with Qualcomm, Atlas Lab with TomTom and AIM Lab with Core42. He is also the director of the ELLIS Amsterdam Unit and scientific director of Amsterdam AI, a collaboration between government, academic, medical and other organisations in Amsterdam to develop and deploy responsible AI.

Antonino Furnari

University of Catania, Italy

Beyond Atomic Actions: Towards Long-Form and Procedural Understanding in Egocentric Videos

Antonino Furnari is a tenure-track Assistant Professor at the University of Catania. His research lies in the field of Egocentric Vision, with a particular focus on video understanding and on building assistive wearable systems that can support and empower humans. He is an active member of the EPIC-KITCHENS, EGO4D, and EGO-EXO4D projects, a Senior Member of IEEE and an ELLIS member.

Laura Sevilla

University of Edinburgh, United Kingdom

Video Understanding with Limited Resources

Laura Sevilla is an Associate Professor at the University of Edinburgh, where she has been since 2019 and where she leads a group focused on Video Understanding. Previously, she was a researcher at Facebook Research in California and a postdoc at the Max Planck Institute in Germany. She obtained her PhD from the University of Massachusetts Amherst in 2015. During her career, she has worked on most aspects of Video Understanding, from Optical Flow to Object Tracking, Video Captioning and Perception for Robotics. Her work has been recognised with a Google Research Scholar Award (2022) and a Google Faculty Award (2020).