Article overview
Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning
Authors: Minghao Zhu; Xiao Lin; Ronghao Dang; Chengju Liu; Qijun Chen
Date: 1 Sep 2023
Abstract: As the most essential property in a video, motion information is critical to
a robust and generalized video representation. To inject motion dynamics,
recent works have adopted frame difference as the source of motion information
in video contrastive learning, considering the trade-off between quality and
cost. However, existing works align motion features at the instance level,
which suffers from spatial and temporal weak alignment across modalities. In
this paper, we present a Fine-grained Motion
Alignment (FIMA) framework, capable of introducing well-aligned and
significant motion information. Specifically, we first develop a dense
contrastive learning framework in the spatiotemporal domain to generate
pixel-level motion supervision. Then, we design a motion decoder and a
foreground sampling strategy to eliminate the weak alignments in terms of time
and space. Moreover, a frame-level motion contrastive loss is presented to
improve the temporal diversity of the motion features. Extensive experiments
demonstrate that the representations learned by FIMA possess great
motion-awareness capabilities and achieve state-of-the-art or competitive
results on downstream tasks across UCF101, HMDB51, and Diving48 datasets. Code
is available at this https URL.
Source: arXiv, 2309.00297
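The abstract names two general ingredients: frame difference as a cheap source of motion information, and a contrastive loss over features. The sketch below illustrates both in their most generic form. It is not the FIMA implementation: the function names `frame_difference`, `cosine`, and `info_nce`, the toy grayscale frames, and the temperature value are all assumptions made for illustration, and the loss is the standard InfoNCE form rather than the paper's dense or frame-level losses.

```python
import math

def frame_difference(frames):
    """Per-pixel difference between consecutive frames.

    frames: list of T grayscale frames, each a list of rows of floats.
    Returns T-1 difference maps with the same spatial size.
    """
    return [
        [[b - a for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor feature (hypothetical helper)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive.
    return log_sum - logits[0]
```

In this toy setting, the loss is small when the anchor matches its positive better than any negative, and large otherwise. A dense variant would apply the same loss per spatiotemporal location instead of per clip.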