Article overview
Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Author: Tamás Gábor Csapó

Date: 3 Aug 2020

Abstract: Articulatory-to-acoustic (forward) mapping is a technique to predict speech using various articulatory acquisition techniques (e.g. ultrasound tongue imaging, lip video). Real-time MRI (rtMRI) of the vocal tract has not been used before for this purpose. The advantage of MRI is that it has a high 'relative' spatial resolution: it can capture not only lingual, labial and jaw motion, but also the velum and the pharyngeal region, which is typically not possible with other techniques. In the current paper, we train various DNNs (fully connected, convolutional and recurrent neural networks) for articulatory-to-speech conversion, using rtMRI as input, in a speaker-specific way. We use two male and two female speakers of the USC-TIMIT articulatory database, each of them uttering 460 sentences. We evaluate the results with objective (Normalized MSE and MCD) and subjective measures (perceptual test) and show that CNN-LSTM networks are preferred which take multiple images as input, and achieve MCD scores between 2.8-4.5 dB. In the experiments, we find that the predictions of speaker 'm1' are significantly weaker than other speakers. We show that this is caused by the fact that 74% of the recordings of speaker 'm1' are out of sync.

Source: arXiv, 2008.00889
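
The two objective measures named in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation code. It assumes the commonly used definition of Mel-Cepstral Distortion (per-frame Euclidean distance between mel-cepstral vectors, scaled by 10·√2 / ln 10 and averaged over time-aligned frames) and assumes that "Normalized MSE" means the mean squared error divided by the variance of the reference. The function names, the `skip_c0` option and the frame layout (a list of coefficient lists) are hypothetical choices made for illustration.

```python
import math


def mel_cepstral_distortion(ref_frames, pred_frames, skip_c0=True):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: the energy coefficient c0 is excluded by default, as is
    common practice; the abstract does not say which variant was used.
    """
    if not ref_frames or len(ref_frames) != len(pred_frames):
        raise ValueError("need the same non-zero number of frames")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    start = 1 if skip_c0 else 0
    total = 0.0
    for ref, pred in zip(ref_frames, pred_frames):
        # Squared Euclidean distance between the two cepstral vectors.
        squared = sum((r - p) ** 2 for r, p in zip(ref[start:], pred[start:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


def normalized_mse(ref_frames, pred_frames):
    """MSE divided by the variance of the reference (assumed normalization)."""
    ref = [value for frame in ref_frames for value in frame]
    pred = [value for frame in pred_frames for value in frame]
    if not ref or len(ref) != len(pred):
        raise ValueError("need the same non-zero number of values")
    mean = sum(ref) / len(ref)
    variance = sum((value - mean) ** 2 for value in ref) / len(ref)
    if variance == 0.0:
        raise ValueError("reference has zero variance")
    mse = sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref)
    return mse / variance
```

Both functions expect the predicted and reference frames to be aligned in time. That requirement matters here, since the abstract attributes the weaker results for speaker 'm1' to recordings that are out of sync.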