Article overview
Co-training Transformer with Videos and Images Improves Action Recognition
Authors: Bowen Zhang; Jiahui Yu; Christopher Fifty; Wei Han; Andrew M. Dai; Ruoming Pang; Fei Sha
Date: 14 Dec 2021
Abstract: In learning action recognition, models are typically pre-trained on object
recognition with images, such as ImageNet, and later fine-tuned on target
action recognition with videos. This approach has achieved good empirical
performance, especially with recent transformer-based video architectures. While
many recent works aim to design more advanced transformer architectures for
action recognition, less effort has gone into how to train video
transformers. In this work, we explore several training paradigms and present
two findings. First, video transformers benefit from joint training on diverse
video datasets and label spaces (e.g., Kinetics is appearance-focused while
SomethingSomething is motion-focused). Second, by further co-training with
images (as single-frame videos), the video transformers learn even better video
representations. We term this approach Co-training Videos and Images for
Action Recognition (CoVeR). In particular, when pretrained on ImageNet-21K
based on the TimeSFormer architecture, CoVeR improves Kinetics-400 Top-1
Accuracy by 2.4%, Kinetics-600 by 2.3%, and SomethingSomething-v2 by 2.3%. When
pretrained on larger-scale image datasets following previous state-of-the-art,
CoVeR achieves the best results on Kinetics-400 (87.2%), Kinetics-600 (87.9%),
Kinetics-700 (79.8%), SomethingSomething-v2 (70.9%), and Moments-in-Time
(46.1%), with a simple spatio-temporal video transformer.
Source: arXiv, 2112.07175
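
The two ideas in the abstract, treating an image as a single-frame video and training jointly on datasets with different label spaces, can be illustrated with a small data-pipeline sketch. This is not the authors' implementation. The `Example` record, the `image_to_clip` and `cotraining_batches` helpers, the round-robin schedule, and the toy data are all assumptions made for illustration; the abstract does not say how batches are sampled or how the classifier heads are organised.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    frames: list   # T frames; T == 1 when the example comes from an image
    label: int
    dataset: str   # names the label space this example belongs to


def image_to_clip(image, label, dataset):
    """Wrap a single image as a one-frame video (hypothetical helper)."""
    return Example(frames=[image], label=label, dataset=dataset)


def cotraining_batches(datasets, batch_size, steps, seed=0):
    """Yield (dataset_name, batch) pairs, cycling over the datasets.

    Assumption: each batch is drawn from one dataset, so every example in
    a batch shares a label space and could be routed to that dataset's
    own classification head. The round-robin order is illustrative only.
    """
    rng = random.Random(seed)
    names = sorted(datasets)
    for step in range(steps):
        name = names[step % len(names)]
        pool = datasets[name]
        yield name, rng.sample(pool, k=min(batch_size, len(pool)))


# Toy stand-ins: strings play the role of frames, and the sizes are arbitrary.
datasets = {
    "kinetics": [
        Example([f"k{i}_f{t}" for t in range(8)], label=i, dataset="kinetics")
        for i in range(4)
    ],
    "ssv2": [
        Example([f"s{i}_f{t}" for t in range(8)], label=i, dataset="ssv2")
        for i in range(4)
    ],
    "imagenet": [image_to_clip(f"img{i}", i, "imagenet") for i in range(4)],
}

for name, batch in cotraining_batches(datasets, batch_size=2, steps=3):
    print(name, [len(ex.frames) for ex in batch])
```

Because image examples carry a frame list of length one, the same model input path can consume them alongside multi-frame clips, which is the sense in which the abstract describes images "as single-frame videos".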