forgot password?
register here
Research articles
  search articles
  reviews guidelines
  articles index
My Pages
my alerts
  my messages
  my reviews
  my favorites
Members: 3652
Articles: 2'545'386
Articles rated: 2609

24 June 2024
  » arxiv » 2302.00286

 Article overview

Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
Kin Wai Cheuk ; Keunwoo Choi ; Qiuqiang Kong ; Bochen Li ; Minz Won ; Ju-Chiang Wang ; Yun-Ning Hung Dorien Herremans ;
Date 1 Feb 2023
AbstractIn this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results. The joint training of the transcription and source separation modules serves to improve the performance of both tasks. The instrument module is optional and can be directly controlled by human users. This makes Jointist a flexible user-controllable framework.
Our challenging problem formulation makes the model highly useful in the real world given that modern popular music typically consists of multiple instruments. Its novelty, however, necessitates a new perspective on how to evaluate such a model. In our experiments, we assess the proposed model from various aspects, providing a new evaluation perspective for multi-instrument transcription. Our subjective listening study shows that Jointist achieves state-of-the-art performance on popular music, outperforming existing multi-instrument transcription models such as MT3. %We also argue that transcription models can be used as a preprocessing module for other music analysis tasks. We conducted experiments on several downstream tasks and found that the proposed method improved transcription by more than 1 percentage points (ppt.), source separation by 5 SDR, downbeat detection by 1.8 ppt., chord recognition by 1.4 ppt., and key estimation by 1.4 ppt., when utilizing transcription results obtained from Jointist.
Source arXiv, 2302.00286
Services Forum | Review | PDF | Favorites   
Visitor rating: did you like this article? no 1   2   3   4   5   yes

No review found.
 Did you like this article?

This article or document is ...
of broad interest:
Global appreciation:

  Note: answers to reviews or questions about the article must be posted in the forum section.
Authors are not allowed to review their own article. They can use the forum section.
» my Online CV
» Free

home  |  contact  |  terms of use  |  sitemap
Copyright © 2005-2024 - Scimetrica