Video Grounding and Its Generalization

£129.50

From I.D. and Task-specific Models to O.O.D. and Large Foundation Models

Graphical and digital media applications; Natural language and machine translation; Machine learning; Computer vision

Authors: Xin Wang, Xiaohan Lan, Wenwu Zhu

Collection: Professional and Applied Computing

Language: English

Published by: Springer

Published on: 1st January 2026

Format: LCP-protected ePub

ISBN: 9783031948374


Part I: Methodologies for Video Grounding

This part covers basic and advanced methodologies for Video Grounding and compares the task with several representative Vision-Language learning tasks, including multimodal understanding and generation.

Part II: Generalized Video Grounding and Trending Directions

This part presents our insights on Generalized Video Grounding and on the development of Video Grounding in the era of large foundation models, and explores future directions, such as Out-of-Distribution settings, that deserve further investigation.

Discussions on Video Grounding

The discussion covers the Video Grounding task, other Vision-Language tasks, and the relations between them. The basics and advances span Video Grounding from model to benchmark, from supervised learning to unsupervised pre-training, from single-video grounding to video corpus grounding, and from in-distribution to out-of-distribution settings.

Insights on Generalized Video Grounding

We discuss cross-modal grounding, event grounding for multi-modal tasks, the various distribution shifts that arise in out-of-distribution settings, explainable Video Grounding, and the role of large foundation models in Video Grounding.

We hope this book benefits readers in both academia and industry, from junior researchers just starting out to senior practitioners in IT companies.
