The 1st Workshop in Video Understanding and its Applications

In conjunction with BMVC 2023.

Aberdeen, United Kingdom

November 24th, 2023

The 1st Workshop in Video Understanding and its Applications (VUA 2023)

Video understanding is a popular field in computer vision and AI that aims to learn about and assess the world around us from video footage. It can benefit many real-world applications, such as training and education, patient monitoring, sports assessment, and security systems. Automating these applications through video analysis not only saves their users time and money but also reduces human error. Despite recent advances in other areas of computer vision, e.g. image analysis, video understanding remains an unsolved and very challenging problem.

This workshop aims to address the challenges in this field through the following contributions:

  • Bringing together leading experts in video understanding to help propel the field forward. This includes junior and senior researchers, with equal representation and contribution from academia and industry.
  • Stimulating and accelerating research progress in video understanding so that it meets the requirements of real-world applications, by identifying the open challenges and ways to address them.

  • Potential topics include, but are not limited to:
    • Applications of video understanding to healthcare and media production
    • View-invariant and 3D video understanding (e.g. 3D action recognition)
    • Transformers for video understanding
    • Generating synthetic data for video understanding tasks
    • Self-supervised learning for video understanding
    • Multi-modal video understanding
    • Action/event detection
    • Video captioning
    • Video editing and summarization
    • Videography/virtual cinematography
    • Video search and retrieval


    Papers are limited to 9 pages in the BMVC format (cf. the main conference author guidelines). Accepted papers will be published in the BMVC 2023 workshop proceedings.

    All papers should be submitted via the CMT website.

    Important Dates

    • Deadline for submission: August 20th, 2023 - 23:59 British Summer Time
    • Notification of acceptance: September 10th, 2023
    • Camera-ready submission deadline: September 25th, 2023
    • Workshop date: November 24th, 2023


    Robert Gordon University, Sir Ian Wood Building, Garthdee Campus

    10:00-10:05 - Welcome and Introduction

    10:05-10:55 - Keynote 1 - Dr. João Carreira (Google DeepMind) (40-min talk followed by 10-min Q&A)

    11:00-11:50 - Keynote 2 - Professor Dima Damen (University of Bristol) (40-min talk followed by 10-min Q&A)

    11:50-12:10 - AI4ME Presentation - Faegheh Sardari & Asmar Nadeem (Video Understanding for personalized media production)

    12:10-12:30 - Lunch break

    12:30-14:00 - Oral Session (10-min presentations + 2-min Q&A)

    1. End-to-end Amodal Video Instance Segmentation - Jasmin Breitenstein (Technische Universität Braunschweig)*; Kangdong Jin (Institute for Communications Technology, Technische Universität Braunschweig); Aziz Hakiri (Technische Universität Braunschweig); Marvin Klingner (Technische Universität Braunschweig); Tim Fingscheidt (Technische Universität Braunschweig)
    2. Self-supervised animal detection in constrained environment - Fayaz Rahman (Cochin University of Science and Technology); C B Dev Narayan (Cochin University); Mohib Ullah (NTNU)*; Muhammad Mudassar Yamin (Norwegian University of Science and Technology); Øyvind Nordbø (Norsvin SA, 2317 Hamar, Norway); Christopher Coello (Norsvin SA, 2317 Hamar, Norway); Ali Shariq Imran (NTNU Gjøvik, Norway); Santhosh Kumar Gopalan (Cochin University of Science and Technology); Madhu S Nair (CUSAT); Faouzi Alaya-Chekh (NTNU at Gjøvik, Norway)
    3. ZeST-NeRF: Using temporal aggregation for Zero-Shot Temporal NeRFs - Violeta Menéndez González (University of Surrey)*; Andrew Gilbert (University of Surrey); Graeme Phillipson (BBC); Stephen Jolly (BBC); Simon Hadfield (University of Surrey)
    4. Pedestrian and Automatic Doors Abnormal Interactions Detection using Multi-Task Self-Supervised Learning - Olivier Laurendin (IRT Railenium)*; Sebastien Ambellouis (Gustave Eiffel University); Ankur V Mahtani (IRT Railenium); Anthony Fleury (IMT Lille Douai)
    5. Actor and Context Attentions for Spatio-Temporal Action Localization - Manuel Sarmiento (Apple)*; David Varas (Apple); Elisenda Bou-Balust (Apple)
    6. Centre Stage: Centricity-based Audio-Visual Temporal Action Detection - Hanyuan Wang (University of Bristol)*; Majid Mirmehdi (University of Bristol); Dima Damen (University of Bristol); Toby Perrett (University of Bristol)

    14:00-14:50 - Keynote 3 - Dr. Fabian Caba (Adobe) (40-min talk followed by 10-min Q&A)

    Invited Speakers

    João is a senior research scientist at Google DeepMind, and prior to that, he was a postdoctoral researcher at the University of California, Berkeley. He is the first author of the paper 'Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset,' a groundbreaking work in the field of video understanding.

    Dima is a full professor in computer vision at the University of Bristol and a senior research scientist at Google DeepMind. She is currently an EPSRC Fellow (2020-2025), focusing her research on the automatic understanding of object interactions, actions and activities using wearable visual (and depth) sensors. She is the project lead for EPIC-KITCHENS, the largest dataset in egocentric vision, with accompanying open challenges.

    Fabian is a Senior Research Scientist at Adobe working at the intersection of video understanding and generation. His main interests center on the development of ML models aligned with creative human intent. He has co-organized multiple editions of the ActivityNet and CVEU workshops.


    Faegheh Sardari

    University of Surrey, United Kingdom

    Armin Mustafa

    University of Surrey, United Kingdom

    Asmar Nadeem

    University of Surrey, United Kingdom

    Robert Dawes

    BBC R&D, United Kingdom

    Adrian Hilton

    University of Surrey, United Kingdom


    We gratefully acknowledge our reviewers:

      Helge Rhodin (University of British Columbia), Mohammad Sabokrou (Institute for Research in Fundamental Sciences, IPM), Sauradip Nag (University of Surrey), Davide Moltisanti (University of Edinburgh), Ayushi Dutta (University of Surrey), Hanyuan Wang (University of Bristol), Ozge Mercanoglu Sincan (University of Surrey), Mohammad Khalooei (Amirkabir University of Technology), Otto Brookes (University of Bristol), Xinyu Yang (Lancaster University)


    For additional information, please contact the organizers.