Video Understanding

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks

Understanding videos is challenging in computer vision. In particular, the large memory footprint of an untrimmed video makes most tasks infeasible to train endto-end without dropping part of the input data. To cope with the memory limitation of …