Article overview
RNN-based speech synthesis using a continuous sinusoidal model
Mohammed Salah Al-Radhi; Tamás Gábor Csapó; Géza Németh
Date: 12 Apr 2019
Abstract: Recently, in statistical parametric speech synthesis, we proposed a continuous
sinusoidal model (CSM) that uses continuous F0 (contF0) in combination with Maximum
Voiced Frequency (MVF) and achieved state-of-the-art vocoder performance
(e.g. similar to STRAIGHT) in synthesized speech. In this paper, we
address the use of sequence-to-sequence modeling with recurrent neural networks
(RNNs). A bidirectional long short-term memory (Bi-LSTM) network is investigated and
applied with our CSM to model contF0, MVF, and Mel-Generalized Cepstrum (MGC)
for more natural-sounding synthesized speech. To refine the output of the
contF0 estimation, post-processing based on a time-warping approach is applied to
reduce the unwanted voiced component of unvoiced speech sounds, resulting
in an enhanced contF0 track. An objective evaluation and a subjective
listening test show that the proposed framework
provides satisfactory results in terms of naturalness and intelligibility, and
is comparable to the high-quality WORLD model based RNNs.
Source: arXiv, 1904.06075
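
The abstract names contF0 and MVF as the two excitation parameters of the sinusoidal model but does not spell out the synthesis step. As a rough illustration only, and not the authors' implementation, the sketch below assumes that the voiced part of a frame is a sum of sinusoids at the harmonics of contF0 that lie at or below MVF. The function names, the equal harmonic amplitudes, the 16 kHz sample rate and the 80-sample frame length are all hypothetical placeholders; in a real vocoder the amplitudes would come from a spectral envelope such as the one described by MGC.

```python
import math


def harmonic_frequencies(f0_hz, mvf_hz):
    """Return the harmonics k * f0 that lie at or below mvf_hz.

    Assumption for this sketch: MVF acts as the upper bound of the
    voiced (harmonic) band for the frame.
    """
    if f0_hz <= 0:
        return []
    count = int(mvf_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


def synthesize_frame(f0_hz, mvf_hz, sample_rate=16000, num_samples=80):
    """Sum equal-amplitude sinusoids at each harmonic below MVF.

    Equal amplitudes are a placeholder; they are scaled so the sum
    stays within [-1, 1].
    """
    freqs = harmonic_frequencies(f0_hz, mvf_hz)
    if not freqs:
        return [0.0] * num_samples
    amp = 1.0 / len(freqs)
    return [
        sum(amp * math.sin(2.0 * math.pi * f * n / sample_rate) for f in freqs)
        for n in range(num_samples)
    ]


# Example: a 100 Hz fundamental with a 450 Hz maximum voiced frequency
# keeps the first four harmonics.
frame = synthesize_frame(100.0, 450.0)
```

With a lower MVF fewer harmonics are kept, so the frame carries less voiced energy. That is one way to picture why the two parameters are modeled jointly, though the paper itself should be consulted for the actual formulation.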