Article overview
Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
Kin Wai Cheuk; Keunwoo Choi; Qiuqiang Kong; Bochen Li; Minz Won; Ju-Chiang Wang; Yun-Ning Hung; Dorien Herremans
Date: 1 Feb 2023
Abstract: In this paper, we introduce Jointist, an instrument-aware multi-instrument
framework that is capable of transcribing, recognizing, and separating multiple
musical instruments from an audio clip. Jointist consists of an instrument
recognition module that conditions the other two modules: a transcription
module that outputs instrument-specific piano rolls, and a source separation
module that utilizes instrument information and transcription results. The
joint training of the transcription and source separation modules serves to
improve the performance of both tasks. The instrument module is optional and
can be directly controlled by human users. This makes Jointist a flexible
user-controllable framework.
Our challenging problem formulation makes the model highly useful in the real
world given that modern popular music typically consists of multiple
instruments. Its novelty, however, necessitates a new perspective on how to
evaluate such a model. In our experiments, we assess the proposed model from
various aspects, providing a new evaluation perspective for multi-instrument
transcription. Our subjective listening study shows that Jointist achieves
state-of-the-art performance on popular music, outperforming existing
multi-instrument transcription models such as MT3. We also argue that
transcription models can be used as a preprocessing module for other music
analysis tasks. We conducted experiments on several downstream tasks and found
that the proposed method improved transcription by more than 1 percentage
point (ppt.), source separation by 5 SDR, downbeat detection by 1.8 ppt.,
chord recognition by 1.4 ppt., and key estimation by 1.4 ppt., when utilizing
transcription results obtained from Jointist.
Source: arXiv, 2302.00286