In conjunction with BMVC 2023.
Aberdeen, United Kingdom
November 24th 2023
Video understanding is a popular field in computer vision and AI that aims to learn about and assess the world around us from video footage. It can benefit many real-world applications, such as training and education, patient monitoring, sports assessment, and security systems. Automating these applications through video analysis can not only save their users time and money but also reduce human error. Despite recent advances in other areas of computer vision, e.g. image analysis, video understanding remains an unsolved problem and is considered a very challenging task.
The proposed workshop on video understanding aims to address the challenges in this field by making the following contributions:
Papers will be limited to 9 pages according to the BMVC format (see the main conference author guidelines). Papers will be published in the BMVC 2023 workshop proceedings.
All papers should be submitted via the CMT website at https://cmt3.research.microsoft.com/VUABMVC2023.
Robert Gordon University, Sir Ian Wood Building, Garthdee Campus
10:00-10:05 - Welcome and Introduction
10:05-10:55 - Keynote 1 - Dr. Joao Carreira (Google DeepMind) (40-minute talk followed by 10-minute Q&A)
11:00-11:50 - Keynote 2 - Professor Dima Damen (University of Bristol) (40-minute talk followed by 10-minute Q&A)
11:50-12:10 - AI4ME Presentation - Faegheh Sardari & Asmar Nadeem (Video Understanding for personalized media production)
12:10-12:30 - Lunch break
12:30-14:00 - Oral Session (10-min presentations + 2-min Q&A)
1. End-to-end Amodal Video Instance Segmentation - Jasmin Breitenstein (Technische Universität Braunschweig)*; Kangdong Jin (Institute for Communications Technology, Technische Universität Braunschweig); Aziz Hakiri (Technische Universität Braunschweig); Marvin Klingner (Technische Universität Braunschweig); Tim Fingscheidt (Technische Universität Braunschweig)
2. Self-supervised animal detection in constrained environment - Fayaz Rahman (Cochin University of Science and Technology); C B Dev Narayan (Cochin University); Mohib Ullah (NTNU)*; Muhammad Mudassar Yamin (Norwegian University of Science and Technology); Øyvind Nordbø (Norsvin SA, 2317 Hamar, Norway); Christopher Coello (Norsvin SA, 2317 Hamar, Norway); Ali Shariq Imran (NTNU Gjøvik, Norway); Santhosh Kumar Gopalan (Cochin University of Science and Technology); Madhu S Nair (CUSAT); Faouzi Alaya-Chekh (NTNU at Gjøvik, Norway)
3. ZeST-NeRF: Using temporal aggregation for Zero-Shot Temporal NeRFs - Violeta Menéndez González (University of Surrey)*; Andrew Gilbert (University of Surrey); Graeme Phillipson (BBC); Stephen Jolly (BBC); Simon Hadfield (University of Surrey)
4. Pedestrian and Automatic Doors Abnormal Interactions Detection using Multi-Task Self-Supervised Learning - Olivier Laurendin (IRT Railenium)*; Sebastien Ambellouis (Gustave Eiffel University); Ankur V Mahtani (IRT Railenium); Anthony Fleury (IMT Lille Douai)
5. Actor and Context Attentions for Spatio-Temporal Action Localization - Manuel Sarmiento (Apple)*; David Varas (Apple); Elisenda Bou-Balust (Apple)
6. Centre Stage: Centricity-based Audio-Visual Temporal Action Detection - Hanyuan Wang (University of Bristol)*; Majid Mirmehdi (University of Bristol); Dima Damen (University of Bristol); Toby Perrett (University of Bristol)
14:00-14:50 - Keynote 3 - Dr. Fabian Caba (Adobe) (40-minute talk followed by 10-minute Q&A)
João is a senior research scientist at Google DeepMind, and prior to that, he was a postdoctoral researcher at the University of California, Berkeley. He is the first author of the paper 'Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset,' a groundbreaking work in the field of video understanding.
Dima is a full professor in computer vision at the University of Bristol and is also a senior research scientist at Google DeepMind. Dima is currently an EPSRC Fellow (2020-2025), focusing her research on the automatic understanding of object interactions, actions and activities using wearable visual (and depth) sensors. She is the project lead for EPIC-KITCHENS, the largest dataset in egocentric vision, with accompanying open challenges.
Fabian is a Senior Research Scientist at Adobe working at the intersection of video understanding and generation. His main interests center on the development of ML models aligned with creative human intent. He co-organized multiple editions of the ActivityNet and CVEU workshops.
We gratefully acknowledge our reviewers
For additional info please contact us here