Article overview
Towards Contrastive Learning in Music Video Domain
Authors: Karel Veldkamp; Mariya Hendriksen; Zoltán Szlávik; Alexander Keijser
Date: 1 Sep 2023
Abstract: Contrastive learning is a powerful way of learning multimodal representations
across various domains such as image-caption retrieval and audio-visual
representation learning. In this work, we investigate if these findings
generalize to the domain of music videos. Specifically, we create a dual
encoder for the audio and video modalities and train it using a bidirectional
contrastive loss. For the experiments, we use an industry dataset containing
550 000 music videos as well as the public Million Song Dataset, and evaluate
the quality of learned representations on the downstream tasks of music tagging
and genre classification. Our results indicate that pre-trained networks
without contrastive fine-tuning outperform our contrastive learning approach
when evaluated on both tasks. To gain a better understanding of the reasons
contrastive learning was not successful for music videos, we perform a
qualitative analysis of the learned representations, revealing why contrastive
learning might have difficulties uniting embeddings from two modalities. Based
on these findings, we outline possible directions for future work. To
facilitate the reproducibility of our results, we share our code and the
pre-trained model.
Source: arXiv, 2309.00347
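
The abstract mentions a dual encoder trained with a bidirectional contrastive loss but does not spell the loss out. The sketch below shows one common formulation (a symmetric InfoNCE loss over a batch of paired audio and video embeddings); it is an illustration under assumptions, not the authors' implementation. The function names, the temperature value of 0.07, and the use of cosine similarity are all hypothetical choices made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def bidirectional_contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE loss (assumed formulation, not taken from the paper).

    audio_emb[i] and video_emb[i] are embeddings of the same music video;
    every other pairing in the batch is treated as a negative.
    """
    audio = [l2_normalize(a) for a in audio_emb]
    video = [l2_normalize(v) for v in video_emb]
    # sim[i][j]: temperature-scaled cosine similarity of audio i and video j
    sim = [
        [sum(x * y for x, y in zip(a, v)) / temperature for v in video]
        for a in audio
    ]
    sim_t = [list(col) for col in zip(*sim)]  # video-to-audio direction
    # Average the audio-to-video and video-to-audio losses
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy batch: two perfectly aligned pairs versus the same pairs swapped
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = bidirectional_contrastive_loss(aligned, aligned)
loss_swapped = bidirectional_contrastive_loss(aligned, swapped)
```

In this toy batch the loss is close to zero when each audio embedding matches its own video embedding and large when the pairs are swapped, which is the behaviour the training objective rewards.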