To Capture the Highlighting Moments in Video — Video Moment Retrieval and Enabling Datasets
June 24, 2024 (updated 3:09 am)

In recent years, advances in communication and media technologies, together with increasingly common and user-friendly mobile phones and camera-enabled devices, have led to an enormous number of videos being shot and uploaded to the Internet.

Every minute, 500 hours of footage are uploaded to YouTube, and roughly 1,900 hours of footage are live-streamed on Twitch. [1]

Videos are often used to communicate ideas, concepts, experiences and situations. 

As a result, the growing volume of video content makes it challenging for viewers to find the content they are most likely to enjoy. This is why Video Moment Retrieval (VMR), also known as Temporal Sentence Grounding in Videos (TSGV), is of great interest both in academia and in practical applications.

This article will cover the following topics.

  1. What is VMR
  2. Application of VMR for Video Highlights
  3. Applications of VMR for Video Creation
  4. VMR: More Applications
  5. Related open and commercial datasets

1. What is VMR

VMR, or TSGV, is defined as follows: given a temporally untrimmed video and a natural language query, the goal is to determine the start and end times of the described activity inside the video. [2]

For more details on the background, the definition and a comparison with related tasks, please read:
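To make the definition concrete, here is a minimal Python sketch of the task's output and of one common way to score it. It assumes a predicted moment is compared with an annotated one by temporal intersection-over-union (tIoU); the `Moment` class, the `temporal_iou` function and the example query and timestamps are illustrative assumptions, not taken from any particular dataset or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moment:
    """A temporal segment of a video, in seconds (hypothetical helper type)."""
    start: float
    end: float

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("end must not precede start")


def temporal_iou(pred: Moment, truth: Moment) -> float:
    """Intersection-over-union of two segments on the time axis."""
    # Overlap is zero when the segments are disjoint.
    intersection = max(0.0, min(pred.end, truth.end) - max(pred.start, truth.start))
    union = (pred.end - pred.start) + (truth.end - truth.start) - intersection
    return intersection / union if union > 0 else 0.0


# Made-up example: the query "the person opens the fridge" is annotated
# at 10-20 s, and a model predicts 15-25 s.
truth = Moment(10.0, 20.0)
pred = Moment(15.0, 25.0)
score = temporal_iou(pred, truth)  # 5 s overlap over a 15 s union
```

A prediction is then typically counted as correct when its tIoU with the annotation exceeds a chosen threshold; the threshold itself is a design choice and is not specified in this article.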

 Repost | Video Moment Retrieval to Capture the Highlighting Moments in Video (Part.1)

2. Application of VMR for Video Highlights

VMR is widely used in the entertainment and media industries, for example to surface highlight moments in films and TV dramas, in broadcasts such as sports, news and e-sports, and on social media and live-streaming platforms.

In addition, VMR plays a crucial role in creating highlights from personal electronic photo albums.

Because VMR is used in such a wide range of practical applications, video highlights are difficult to define: the term has different meanings in different scenarios.

2.1 Content Video Platforms

Highlight moments in videos can entertain users who have not watched a broadcast or who simply want to view those exciting moments again. In this case, the task of VMR is to find a specific highlight moment from a long video.

For instance, content video platforms such as YouTube and Netflix automatically play some clips when the mouse is moved over the video, and these clips can be considered highlight-moment clips.

On some video sites, viewers can even choose to watch only the clips featuring a particular actor or actress when watching serial dramas.

Video highlight detection is also a popular topic in sports and e-sports (online video games). Using VMR to detect highlight moments can be quite helpful, as it provides precise details of the action that took place at a particular time.

Furthermore, in the news video industry, watching a TV channel’s complete news headlines and bulletins is very time-consuming. With VMR, people can watch the news they are interested in by simply typing in a few keywords.

Photo by Maxim Hopman on Unsplash

2.2 The Livestreaming Industry

In the livestreaming industry, mainstream platforms often provide candidate highlight clips. For e-commerce livestreaming, the highlight clip may be the time period with the highest gross merchandise value (GMV) or the highest number of viewers or gifts.
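As a rough illustration of the metric-based selection described above, the sketch below picks the fixed-length time period with the highest total of a per-minute metric (viewers here, but GMV or gifts would work the same way). The function name `best_window`, the three-minute window length and the viewer counts are all made-up assumptions; real platforms may select candidates quite differently.

```python
from typing import Sequence, Tuple


def best_window(values: Sequence[float], window: int) -> Tuple[int, float]:
    """Return (start_index, total) of the contiguous window with the largest sum.

    Ties are resolved in favour of the earliest window.
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    current = sum(values[:window])
    best_start, best_total = 0, current
    for start in range(1, len(values) - window + 1):
        # Slide the window one step: add the new value, drop the old one.
        current += values[start + window - 1] - values[start - 1]
        if current > best_total:
            best_start, best_total = start, current
    return best_start, best_total


# Made-up per-minute viewer counts for a ten-minute stream.
viewers_per_minute = [120, 150, 400, 900, 850, 300, 200, 180, 160, 140]
start, total = best_window(viewers_per_minute, window=3)
# The selected candidate clip covers minutes start .. start + 2.
```

The sliding sum keeps the scan linear in the stream length, which matters little for ten minutes but helps for multi-hour streams sampled per second.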

Photo by Jametlene Reskp on Unsplash

2.3 Other Use Cases 

VMR is not limited to social media and livestreaming; it is also becoming part of our day-to-day lives.

For instance, people use personal electronic photo albums such as the For You tab on the iPhone, which searches through your photos and videos to find moments that stand out and then presents them in collections called Memories. [2]

 

 

In marketing and advertising, content video platforms want to quickly tag each new video and push it to the right users. Editors want to quickly retrieve relevant video clips as news material or extract highlights from game videos. Advertising and recommendation platforms want to generate more appealing covers for videos to improve conversion rates.

Security management departments also need to audit video content accurately, for example by identifying violations in real time. VMR has become an integral part of our lives.

Moreover, some AI companies provide automated highlight-editing products, such as shrynk and Filmora Video Editor.

3. Applications of VMR for Video Creation

Building on video highlight detection, people can create their own videos with VMR technology.

Major companies have already entered this field.

Make-A-Video from Meta is a tool that allows people to quickly and easily create new video content from just a few words, lines of text, images or existing videos.

Less than a week later, Google announced Imagen Video, a text-to-video AI model capable of producing a video from a written prompt.

Gallery from Imagen Video

Phenaki is another text-to-video AI model, which can create longer videos from detailed prompts. Together with DreamFusion, which can create 3D models from text prompts, it shows that competitive development of diffusion models continues rapidly. [3]

Thanks to the development of VMR, other text-to-video AI generators are emerging, such as Synthesia, Rephrase.ai and InVideo, making video content creation much easier and more convenient.

4. VMR: More Applications

Content video and social media platforms are now the mainstream way for people to communicate with each other and the world. 

However, some people are still unable to watch videos because of disability or network limitations. Automatically telling the story of a video through multi-sentence descriptions would help bridge this gap. [4]

This is where one of VMR’s features, video storytelling, comes in handy. The goal of automatically comprehending and describing such videos in natural language is to provide multi-sentence narrative descriptions that make them accessible.

Moreover, this function is also quite significant for AR/VR and AI assistants.

For instance, the image below shows a person pointing a mobile camera at an object (the watch on their wrist) and asking an AI assistant how to adjust it.

 

The AI assistant then shows a video with step-by-step instructions.

So we have reason to believe that VMR will provide considerable help in AR/VR and AI assistants in the future.

Applications of VMR/TSGV can be classified in other ways as well. maadaa.ai also groups the examples above into four main categories: video information retrieval, video question answering, human-computer interaction and video storytelling. To find out more about these four applications, please read:

Repost | Video Moment Retrieval to Capture the Highlighting Moments in Video (Part.2)

5. Related open and commercial datasets

As mentioned above, what counts as a highlight moment differs considerably between circumstances. The datasets for VMR are therefore derived from different settings and have different characteristics.

Video Moment Retrieval (VMR) Datasets

 


References:

[1] https://www.sciencedirect.com/science/article/pii/S2666827022000469

[2] https://support.apple.com/en-us/HT207368

[3] https://arstechnica.com/information-technology/2022/10/googles-newest-ai-generator-creates-hd-video-from-text-prompts/

[4] https://aclanthology.org/D18-11

For further information, please contact us.
